Zulip Chat Archive

Stream: general

Topic: Concurrent access to Lean Server


Stanislas Polu (Aug 25 2020 at 08:23):

Does the Lean Server support receiving concurrent info requests on the same file? Is there a capacity limit? Is it supported to do concurrent interleaved requests across files interleaving info requests and full syncs?

Stanislas Polu (Aug 25 2020 at 08:25):

Also is it preferred to requests positions somewhat sequentially in the file or is the file fully processed and there is no performance gain from location-grouped requests?

Gabriel Ebner (Aug 25 2020 at 08:34):

Recent versions do not have a restriction on the number of concurrent requests. Note however that the results of the info request only reflect the current state of what has been processed. If you don't wait for Lean to finish compiling (i.e. the orange bars in vscode go away), then info will successfully return nothing.

Stanislas Polu (Aug 25 2020 at 08:56):

@Gabriel Ebner thanks! It seems that I can consistently crash the server with 2 concurrent requests going as fast as possible if I go for long enough (~500 requests total) (I'm programmatically calling it, so the workload is definitely more intensive than when used with the extension)

Stanislas Polu (Aug 25 2020 at 08:56):

@Gabriel Ebner there is not output on STDOUT as it crashes it seems... do you know where I should look for an error message?

Gabriel Ebner (Aug 25 2020 at 08:57):

That's disappointing to hear. Your best bet with a crash (segfault?) is to attach a debugger.

Stanislas Polu (Aug 25 2020 at 09:01):

I may be OOMing; let me try on a bigger machine

Stanislas Polu (Aug 25 2020 at 09:52):

@Gabriel Ebner quick question. With the following file:

import data.real.basic

example {a b : } (hab : a  b) (c : ) : a + c  b + c :=
begin
    rw  sub_nonneg,
    have key : b + c - (a + c) = b - a, { ring },
    rw key,
    rw sub_nonneg,
    exact hab,
end

I get the following response from the Lean server on 6,17

InfoResponse(seq_num=129, record=InfoRecord(full_id='has_add.add', text=None, type='Π {α : Type} [c : has_add α], α  α  α', doc=None, source=InfoSource(line=323, column=6, file='/Users/spolu/code/formal/formal/lean/vendor/lean-3.18.4-darwin/lean-3.18.4-darwin/lib/lean/library/init/core.lean'), state='a b : ,\nhab : a  b,\nc : \n ℝ', tactic_param_idx=None, tactic_params=None))

In particular the state is a b : ℝ,\nhab : a ≤ b,\nc : ℝ\n⊢ ℝ which is not the current goal state but rather a type information related to + I presume?

The VSCode extension mark this as an "expected type". How do you know that this is not a goal state?

Gabriel Ebner (Aug 25 2020 at 09:53):

In general there is no way to know. The "expected type" string is rendered by the interactive widget. If you're willing to modify the C++ code, you could add a flag in term_goal_data::report.

Gabriel Ebner (Aug 25 2020 at 09:54):

Note that you will encounter "nested" tactic states pretty often. by exact foo (by simp) 3 is perfectly valid.

Stanislas Polu (Aug 25 2020 at 09:56):

OK I think I'll bite the bullet and will look at the C++ code :) Thanks for the pointer to term_goal_data::report :+1:

Stanislas Polu (Aug 25 2020 at 09:58):

To handle nested states is there a unique goal identifier I can expose/use ?

Stanislas Polu (Aug 25 2020 at 10:49):

Maybe the cache ID?

Jason Rute (Aug 25 2020 at 13:16):

One trick I've found to tracing nested tactics is as follows. This is easiest to do in a clean environment, but can be also be done on a messy environment with other lean versions if you are careful about making a new lean toolchain. For simplicity, I'll assume you have a clean environment where you aren't afraid to break lean.

  • After having your lean project setup, find the lean library paths via lean --path.
  • Modify <lean_library_path>/init/meta/tactic.lean as follows.
  • Change istep to also record the positions and state:
    meta def istep {α : Type u} (line0 col0 : ℕ) (line col : ℕ) (t : tactic α) : tactic unit := do trace (line0, col0, line, col), trace_state, λ s, (@scope_trace _ line col (λ _, step t s)).clamp_pos line0 line col

  • Save the file and recompile the lean library file that you've changed. I've found one way to do this is by making a new lean file in your project which has a tactic in it, and then running lean --make my_file.lean.

  • Now for every tactic lean will print the line and column numbers of the beginning of the tactic as well as the state.

You should test this extensively. Also, note the following:

  • You are modifying lean's source code (but not the C++), so this will effect any lean project using this lean toolchain.
  • Line numbers are 1-indexed. Column numbers are 0-indexed.
  • I don't remember the difference between line0/col0 and line/col. They are usually the same and they are the start of the tactic. Unfortunately, it is not the end, just the start.
  • Depending on parallism, you may or may not need to combine the two trace statements into one (looking at the source for trace_state) to make sure they stay together.
  • If you want the end of the tactic, I think this can be done more complicatedly by also modifying ALL of the interactive tactics to have (loca : parse location) parameter arguments as the first and the last arguments. I haven't tested this much. Then you would have to trace this information.

Jason Rute (Aug 25 2020 at 13:22):

For the lean server, I think these traces will show up in all_messages if I remember correctly.

Thomas Eckl (Aug 25 2020 at 14:55):

Well, the manual only tells me how to produce a leanpkg using mathlib - I don't want to do that. It might be that I have to extend the hott library; I want to port some Lean2 hott files and don't know yet what I need on the way.

Stanislas Polu (Aug 25 2020 at 15:22):

@Jason Rute Thanks! I will try your diff that looks promising. A slightly different approach I'm exploring right now is to split proofs line by line and record the goal state at the beginning of each line only. Then train on the objective of generating a line of code conditioned on the current partial proof and the current goal state. Obviously with term proofs the goal state will remain empty (and these can be quite long (eg pell.lean)) but since they are generally one-liners that's not a big deal since the lemma/theorem statement serves the same purpose when that's the case. This has the major advantage that it is super easy to setup; allows for tree-searching to generate full proofs, works with the vanilla toolchain and yet could be useful if integrated as is in the VSCode plugin.

Stanislas Polu (Aug 25 2020 at 18:18):

Is there an incremental syncing capability to Lean server? Does the VSCode always use full-sync?

Bryan Gin-ge Chen (Aug 25 2020 at 18:21):

As far as I know there's no incremental sync. vscode-lean always does a full sync.

Bryan Gin-ge Chen (Aug 25 2020 at 18:22):

I think Lean is usually smart enough not to reprocess parts of the file which haven't changed though.

Jason Rute (Aug 25 2020 at 21:44):

@Stanislas Polu I think something to keep in mind is that you really have two separate problems. (1) Generating the training data to train your model. (2) Using your model to create proofs. The problem with the first is that parsing Lean tactics is literally Turing complete. (More specifically, parsing is done by a Lean parser which farms out the parsing work to the individual parsers, executing it for each tactic argument. In practice, it is fairly well behaved, but still you would have to basically write another parser to do it perfectly.) But once you have high-quality, reasonably parsed training data, then your model should be more well behaved. If you can train your model to create one tactic command at a time, then you can run it on a goal, generate a tactic command, check if it succeeds, and do a tree search. You shouldn't need to hack Lean to run your tree-search algorithm (if using the Lean server), just maybe to get good training data.

Jason Rute (Aug 25 2020 at 21:45):

As for getting good training data I think it is a tradeoff of how much work you want to put into it (and how much help you get from others in the Lean community). One tactic per line is a reasonable starting guess, but I don't know how well it holds up in practice. Some other heuristics might also help. For one, (baring comments) tactic commands must have matching parentheses, brackets, braces, etc. This would allow you to find some nested tactics and tactics which span multiple lines. Tactic commands are usually separated by commas in begin...end blocks and semicolons in by blocks, but this is not always true (and commas have other uses as well). If you are willing to use the lean server goals or the hacked istep traces, then you can get even better data.

Jason Rute (Aug 25 2020 at 21:45):

I hope Lean 4 will solve this problem by providing a way to parse Lean into tactic statements and goal pairs, but I don't think we should hold out hope.

Jason Rute (Aug 25 2020 at 21:49):

Also, if it helps, here is some code I wrote to extract all the rewrite tactic applications in Lean. It works pretty well (but it is only for a specific set of tactics. I had to modify all the rewrite tactics manually. (The code is more complicated than it would be for other tactics, since I also broke up simultaneous rewrites, into separate rewrite calls: https://github.com/leanprover-community/lean/compare/master...jasonrute:v3.9.0-rw-recording (Unfortunately, I haven't yet published this data anywhere.)

Stanislas Polu (Aug 26 2020 at 09:31):

@Jason Rute Completely agreed that the two problems are separate. I think that staying close to the interface humans use is desirable (at least in our opinion), that's the main reason why I'm not too keen on exploring instrumentation of Lean at the tactic code level. I hypothesize that our models will be capable of efficiently leveraging even partial goal state information and I'll work on verifying that hypothesis. (But I also think Lean code alone is not enough as humans do leverage the goal state when constructing proofs).

Concerning the replay of proofs, while I see the work you've done very close to what HOList did with HOL-light, I plan to explore replaying proofs by simply injecting them in their context Lean file. This solves most issues with universes, local variables, imports etc... we'll see how it goes.

Unrelated, but very nice thing about Lean is that we have a very natural evaluation task available with Patrick Massot's tutorials. Would be really excited to see a model achieve that!

Stanislas Polu (Aug 26 2020 at 09:32):

@Bryan Gin-ge Chen thanks! good to know :thank_you:


Last updated: Dec 20 2023 at 11:08 UTC