Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Interacting with Lean 4
Mert Ünsal (Oct 12 2024 at 18:57):
Hi Zulip Community,
I am looking forward to your input on the best way to programmatically interact with Lean 4 - I want to get compilation feedback and tactic-state information to try out some ML stuff.
I am currently using the Lean REPL (https://github.com/leanprover-community/repl), and it takes roughly half a second to get compilation feedback for a machine-generated autoformalization of a theorem statement (of the form theorem theorem_statement := by sorry). Is this the best speed one can get?
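For reference, each check is a single JSON exchange with the REPL over stdin/stdout; a sketch based on the REPL's README, with the output abridged:

{"cmd": "theorem test (p q : Prop) (hp : p) : p ∨ q := by sorry"}
{"sorries": [{"proofState": 0, "goal": "p q : Prop\nhp : p\n⊢ p ∨ q"}], "env": 0}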
Thanks!
Jason Rute (Oct 12 2024 at 19:36):
I feel the design space is so large here that it doesn't make sense to ask for the "best way", since each way comes with its own advantages and disadvantages. Also, some ways of running Lean just don't let you do certain things (like interact with the environment, or see what code is in the current file). In short, I know of four general paradigms for interacting with Lean:
- Running a Lean file from the command line. Simplest, but probably most limited and slowest.
- Running from inside Lean's metaprogramming using a tactic or command.
- Using Lean's language server
- Using the fact that Lean is written in Lean, so you can reuse parts of Lean's parser/kernel/interpreter.
Jason Rute (Oct 12 2024 at 19:36):
I know since LLMs are often the bottleneck, one doesn't usually optimize the interaction (and you should ask yourself if this is slowing you down), but I think you could get things to go a lot faster.
Jason Rute (Oct 12 2024 at 19:36):
Also, researchers aren't very open about what they do and the Lean organization hasn't put a lot of effort into standardizing this. You could also try LeanDojo or LeanAide. Or just try directly calling Lean from the command line or from the language server. (Note the language server is hard to use and it has a hard-coded 100ms delay, at least it did in Lean 3.)
Jason Rute (Oct 12 2024 at 19:36):
Also, you could probably get a good upper bound on the time for autoformalization by just generating 500, 1000, 2000 statements in the same file (duplicates might be okay) and timing how long it takes to run that file in Lean from the command line.
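For instance, a quick sketch of such a timing harness (assuming a Lake project with Mathlib on the path; the file name and placeholder statement are made up):

import subprocess, time

N = 1000
with open("Batch.lean", "w") as f:
    f.write("import Mathlib\n\n")
    for i in range(N):
        # Unique names, since duplicate declarations would error out.
        f.write(f"theorem thm{i} (n : Nat) : n + 0 = n := by sorry\n")

t0 = time.time()
subprocess.run(["lake", "env", "lean", "Batch.lean"])
elapsed = time.time() - t0
print(f"{N} statements in {elapsed:.1f}s ({elapsed / N * 1000:.1f} ms/statement)")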
Jason Rute (Oct 12 2024 at 19:37):
Let us know what eventually works for you. I wish more people shared what they practically do here, and I wish more of this was upstreamed to common tools that other people use.
Mert Ünsal (Oct 12 2024 at 19:48):
I have experimented a bit with LeanDojo, but it seems to be extremely slow at initializing a theorem for me - it needs to trace everything and really doesn't work as fast as the standard Infoview environment.
I think this is a very important aspect of enabling machine assistance for Lean 4 - a model that interacts even a little with the environment can, at the very least, ensure that its recommendations don't result in compilation errors. I guess it would be very valuable if the Lean FRO worked on a standardized interaction with an 'environment'.
Mert Ünsal (Oct 12 2024 at 19:51):
Jason Rute said:
Also, you could probably get a good upper bound on the time for autoformalization by just generating 500, 1000, 2000 statements in the same file (duplicates might be okay) and timing how long it takes to run that file in Lean from the command line.
Thanks - I will try this. I am wondering how they did it in AlphaProof, but I am assuming they just parallelized across many CPUs with Google's compute.
Mert Ünsal (Oct 12 2024 at 19:52):
Jason Rute said:
I know since LLMs are often the bottleneck, one doesn't usually optimize the interaction (and you should ask yourself if this is slowing you down), but I think you could get things to go a lot faster.
When I am running a 7B model I can get 2000 tokens/sec, so generating the next Lean proof step (~20 tokens) takes around 10ms - so in my case I believe it's actually the compiler feedback that is the bottleneck.
Jason Rute (Oct 12 2024 at 19:55):
As for AlphaProof, they had enough Lean experts on that team that I wouldn't be surprised if they forked Lean. If not, they certainly had the metaprogramming knowledge to do whatever they needed (and enough compute that running Lean in parallel likely wasn't a huge bottleneck). (Meta similarly had @Gabriel Ebner on their team, and my understanding is that for the HyperTree Proof Search paper, he wrote a custom Lean parser for running tactics. I could be mistaken on the details.)
Mert Ünsal (Oct 12 2024 at 19:57):
There is also LeanGym but it only works in Lean 3 (from what I understand).
Jason Rute (Oct 12 2024 at 19:58):
I generally recommend that machine learning researchers partner with Lean metaprogramming experts, but this field is moving so fast that this advice might be more and more unrealistic.
Mert Ünsal (Oct 12 2024 at 20:02):
If there's anyone willing to help with this, I am more than happy to collaborate.
Notification Bot (Oct 16 2024 at 11:28):
8 messages were moved from this topic to #Machine Learning for Theorem Proving > ABEL by Jason Rute.
Notification Bot (Dec 02 2024 at 01:02):
2 messages were moved from this topic to #Machine Learning for Theorem Proving > Pantograph by Jason Rute.
Jason Rute (Dec 02 2024 at 01:04):
I made a separate topic :up: for Pantograph but the one sentence summary is that it seems to be another way to communicate with Lean (it seems through the language server) for research (and maybe practical tools?).
Leni Aniva (Dec 03 2024 at 18:23):
Jason Rute said:
I made a separate topic :up: for Pantograph but the one sentence summary is that it seems to be another way to communicate with Lean (it seems through the language server) for research (and maybe practical tools?).
Pantograph doesn't invoke Lean through the LSP. It bypasses it.
Jason Rute (Dec 04 2024 at 00:10):
So in my list above, is the interaction model closest to option 2?
Running from inside Lean's metaprogramming using a tactic or command.
In other words, your AI won't be able to see the rest of the file or library (except the symbolic representations stored in the Lean environment)? But maybe I'm mistaken and it is closer to option 4:
Using the fact that Lean is written in Lean, so you can reuse parts of Lean's parser/kernel/interpreter.
I see you are doing something with Lean's parser. I'm not sure if that is just for parsing an LLM-written proof of the current theorem, or also for parsing other parts of the current Lean file and project.
Leni Aniva (Dec 04 2024 at 02:17):
Jason Rute said:
So in my list above, is the interaction model closest to option 2?
Running from inside Lean's metaprogramming using a tactic or command.
In other words, your AI won't be able to see the rest of the file or library (except the symbolic representations stored in the Lean environment)? But maybe I'm mistaken and it is closer to option 4:
Using the fact that Lean is written in Lean, so you can reuse parts of Lean's parser/kernel/interpreter.
I see you are doing something with Lean's parser. I'm not sure if that is just for parsing an LLM-written proof of the current theorem, or also for parsing other parts of the current Lean file and project.
It's closer to option 2, with some features implemented via option 4.
Kaiyu Yang (Dec 26 2024 at 16:38):
Note the language server is hard to use and it has a hard-coded 100ms delay, at least it did in Lean 3.
@Jason Rute I'm wondering if that's still the case for Lean 4.
Jason Rute (Dec 26 2024 at 18:31):
@Kaiyu Yang I assume it is. The two best ways to check this are either (1) to ask someone who made it, or (2) to test it by repeatedly calling the server. The latter could be done by borrowing some code from https://github.com/leanprover-community/lean-client-python.
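A rough sketch of such a latency test, speaking raw LSP framing to the Lean 4 server over stdio (lake serve is what the VS Code extension launches; a real client would also send initialized and didOpen, and match response ids):

import json, subprocess, time

# Must be run inside a Lake project directory.
proc = subprocess.Popen(["lake", "serve"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

def send(msg):
    body = json.dumps(msg).encode()
    proc.stdin.write(f"Content-Length: {len(body)}\r\n\r\n".encode() + body)
    proc.stdin.flush()

def recv():
    # Read one LSP-framed message: headers, blank line, then the JSON body.
    length = 0
    while (line := proc.stdout.readline().strip()):
        if line.startswith(b"Content-Length:"):
            length = int(line.split(b":")[1])
    return json.loads(proc.stdout.read(length))

send({"jsonrpc": "2.0", "id": 1, "method": "initialize",
      "params": {"processId": None, "rootUri": None, "capabilities": {}}})
t0 = time.time()
recv()  # in real code, loop until the message with "id": 1 arrives
print(f"initialize round-trip: {time.time() - t0:.3f}s")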
Sebastian Ullrich (Dec 26 2024 at 18:33):
That shouldn't be the case, but I don't see any reason to use it for ML purposes over specialized interfaces like Pantograph.
Jason Rute (Dec 26 2024 at 18:39):
@Sebastian Ullrich How easy would it be to make a tool (like a plugin) which can see both the current goal state, so it can suggest tactics or whole proofs, and the rest of the file (in text) or project, so that it can suggest proofs based on other tactic proofs in the file or project (not the term proofs, but the tactic proofs)?
Jason Rute (Dec 26 2024 at 18:40):
Tactics can’t see the tactic proofs, even the proof you are currently in.
Jason Rute (Dec 26 2024 at 18:42):
General plugins like GitHub Copilot, Continue, Cursor can’t see proof states.
Eric Wieser (Dec 26 2024 at 18:51):
Jason Rute said:
General plugins like GitHub Copilot, Continue, Cursor can’t see proof states.
Can these systems not see error messages?
Jason Rute (Dec 26 2024 at 18:54):
Maybe, I’d have to double check, but where is the error message coming from? The language server?
Jason Rute (Dec 26 2024 at 19:07):
I guess Cursor (which I haven't used yet) has a "fix with AI" feature for fixing errors. It isn't the use case I envisioned. (I envisioned something more like Tactician for Coq, or what I think @Sean Welleck envisioned with miniCTX.) But if that feature already exists in current tools, how does it apply to Lean? What data does it need to be trained on? Does it already work for Lean in Cursor + Sonnet 3.5?
Sebastian Ullrich (Dec 26 2024 at 19:21):
I see - for an editor plugin, interaction with the language server makes more sense. There shouldn't be any artificial delays, as I said, but if any existing request is insufficient, installing a new RPC handler on the server side could be a valid option.
Oliver Dressler (Jan 02 2025 at 15:15):
I've been working on a thin Python wrapper around the language server.
So I can provide some benchmarks on the interaction:
First a file needs to be opened in the language server.
Loading random .lean files from /mathlib/Mathlib/:
Loaded 10 files: 0.18 files/s, 72.28 lines/s, 2817.45 chars/s
Loaded 10 files: 0.20 files/s, 57.63 lines/s, 2239.09 chars/s
Loaded 10 files: 0.13 files/s, 39.00 lines/s, 1660.48 chars/s
Loaded 10 files: 0.21 files/s, 56.99 lines/s, 2341.68 chars/s
Thankfully, the LSP allows multiple files to be opened in parallel:
Loaded 10 files: 1.34 files/s, 219.91 lines/s, 8685.79 chars/s
Loaded 10 files: 1.27 files/s, 183.02 lines/s, 7649.99 chars/s
Loaded 10 files: 0.32 files/s, 95.10 lines/s, 3629.08 chars/s
Loaded 10 files: 0.23 files/s, 72.02 lines/s, 3173.30 chars/s
Loaded 10 files: 1.07 files/s, 308.99 lines/s, 13099.81 chars/s
Loaded 10 files: 0.63 files/s, 197.90 lines/s, 7633.16 chars/s
The imports seem to influence the time it takes to load a file.
An artificial delay on file opening in the LSP seems unlikely?
Querying an opened file is quite fast:
Running 50 repeats of each request:
plain_goal: 179.82 requests/s
plain_term_goal: 159.10 requests/s
completion: 18.67 requests/s
definition: 334.63 requests/s
hover: 881.99 requests/s
declaration: 250.87 requests/s
type_definition: 1071.57 requests/s
document_highlight: 26.48 requests/s
document_symbol: 261.13 requests/s
semantic_tokens_full: 42.73 requests/s
semantic_tokens_range: 55.14 requests/s
folding_range: 551.44 requests/s
Again, individual request times vary a lot, depending on the file and position.
The wrapper is not ready for use yet but the benchmarking code is here:
https://github.com/oOo0oOo/leanclient/blob/main/tests/test_client_benchmark.py
EDIT: Fixed url
I use the LSP for parallelizable and query-heavy tasks like data extraction.
It has worked very well so far!
Using the LSP as a REPL (like Pantograph or LeanDojo) would require some speedup on the file opening:
- Use textDocument/didChange for incremental updates.
- Some interactions might be possible before waitForDiagnostics returns?
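For reference, the incremental update mentioned above is the standard LSP textDocument/didChange notification; a sketch of the payload (the URI, version, and positions are placeholders):

change = {
    "jsonrpc": "2.0",
    "method": "textDocument/didChange",
    "params": {
        "textDocument": {"uri": "file:///path/to/Proof.lean", "version": 2},
        "contentChanges": [{
            # Insert one new tactic line instead of reopening the whole file.
            "range": {"start": {"line": 3, "character": 0},
                      "end": {"line": 3, "character": 0}},
            "text": "  simp\n",
        }],
    },
}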
John Smith (Jan 14 2025 at 08:41):
Jason Rute said:
I feel the design space is so large here that it doesn't make sense to ask for the "best way", since each way comes with its own advantages and disadvantages. Also, some ways of running Lean just don't let you do certain things (like interact with the environment, or see what code is in the current file). In short, I know of four general paradigms for interacting with Lean:
- Running a Lean file from the command line. Simplest, but probably most limited and slowest.
- Running from inside Lean's metaprogramming using a tactic or command.
- Using Lean's language server
- Using the fact that Lean is written in Lean, so you can reuse parts of Lean's parser/kernel/interpreter.
@Jason Rute
The "four general paradigms for interacting with Lean" you mentioned are very interesting, but they are quite abstract. Where can I find some examples or cases about these four paradigms?
Additionally, to which paradigm does the method of "writing code in VSCode and seeing information in the InfoView" belong? :melting_face:
Jason Rute (Jan 14 2025 at 12:27):
John Smith said:
Additionally, to which paradigm does the method of "writing code in VSCode and seeing information in the InfoView" belong? :melting_face:
To be clear, I was talking about machine-interaction with Lean, specifically in AI applications. But nonetheless, the Lean VSCode plugin works through the language server (approach 3).
Jason Rute (Jan 14 2025 at 12:57):
As for examples:
- I think a lot of AI papers using very large LLMs run Lean on a file (approach 1) for their benchmarks. For example, I thought the COPRA evals ran this way. If the LLM generation time is much greater than the compile time, this isn't an issue for benchmarks.
- All the tactic-based tools like Lean Copilot, LLMLean, and Aesop use metaprogramming (approach 2) to write their tactics, but the ones connected to an LLM at least use the parser as well (which is sort of approach 4).
- Maybe approaches 2 and 4 are blurry and could be combined. It is all a form of metaprogramming. I was just trying to express 4 as something more drastic, where you, say, make your own language server or forked Lean version. I don't know if anyone does this (maybe DeepMind?).
- There are a number of applications which you can run with lean exe _, like the Lean REPL, Improver, and various proof-recording libraries. I don't know the details. They are clearly metaprogramming, so in the 2 or 4 category, but again maybe those categories should be fleshed out more. I think that to build such apps you likely need to know Lean metaprogramming as in the metaprogramming book, but not the inner workings of Lean.
- Last, the editor plugins communicate via LSP (this includes Lean's widgets). Also, @Oliver Dressler just released a Python library wrapping the LSP, which might be useful for those not wanting to learn Lean metaprogramming.
RexWang (Mar 03 2025 at 15:04):
A simple example using approach 1 :)
import json
import subprocess as sp
import tempfile

class State:
    def __init__(self, goals: str, code: str):
        self.goals, self.code = goals, code

    def __repr__(self):
        return self.goals or "No goals"

def get_state(code):
    # Write the code to a temporary file, elaborate it with Lean, and
    # collect the "unsolved goals" messages from the JSON diagnostics.
    with tempfile.NamedTemporaryFile(mode='w', suffix='.lean') as temp:
        temp.write(code)
        temp.flush()
        result = sp.run(["lake", "env", "lean", "--json", temp.name],
                        capture_output=True, text=True)
    states = []
    for line in result.stdout.split('\n'):
        if not line.strip():
            continue
        data = json.loads(line)
        goals = data.get('data', '')
        if goals.startswith('unsolved goals'):
            # Strip the "unsolved goals\n" prefix, keeping only the goal text.
            states.append(State(goals[15:], code))
    return states or [State("", code)]

thm = """theorem demo (P Q R : Prop) : (P → Q) → (Q → R) → (P → R) := by"""
tactics = [
    'intro h1 h2 p',
    '''have q : Q := by
  apply h1
  exact p''',
    'apply h2',
    'exact q',
]

state = get_state(thm)[0]
print(state)
states = []
for tac in tactics:
    # Re-run Lean on the growing proof script and record the new goal state.
    state = get_state(f"{state.code}\n{tac}")[0]
    print(state, end='\n\n')
    states.append(state)
Auguste Poiroux (Apr 11 2025 at 12:15):
Hello everyone!
I'm excited to share LeanInteract, a Python package built on top of the Lean REPL. (Yay, yet another tool to interact with Lean 4 ^^)
Overall, it aims to be simple to set up and use for both new and experienced users. I developed it while working on autoformalization research, where I particularly needed version compatibility and efficient type-checking through state management.
Some of the features I find most useful:
- Compatibility with all Lean versions between v4.7.0-rc1 and v4.19.0-rc2, with all the latest features and bug fixes of Lean REPL made available for all Lean versions.
- Simple interaction with Lean projects: start from existing projects, or instantiate temporary ones with custom dependencies. I find the latter option especially helpful for AI benchmarks stored in datasets (JSON, ...).
- Based on Lean REPL, so you can interleave executing Lean commands with tactics-based proof search.
Let me know what you think :)
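For anyone curious, a minimal usage sketch based on my reading of the README at the time (class and parameter names may have changed since):

from lean_interact import LeanREPLConfig, LeanServer, Command

# Downloads and builds the REPL (and a matching Lean toolchain) on first use.
config = LeanREPLConfig()
server = LeanServer(config)

# Type-check a candidate statement; the response carries messages and sorries.
response = server.run(Command(cmd="theorem ex (n : Nat) : n = 5 → n = 5 := by sorry"))
print(response)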
Auguste Poiroux (Apr 11 2025 at 12:15):
Bonus: I personally like the recent addition of the rootGoals option in the Lean REPL (thus available in LeanInteract). It extracts initial goals from declarations in a Lean file, allowing you to work with theorems in existing projects without tracing the whole project.
Eric Wieser (Apr 11 2025 at 12:17):
Do I read correctly that you have backported the rootGoals feature to all 12 intermediate Lean releases?
Auguste Poiroux (Apr 11 2025 at 12:22):
Yes indeed - it is tested here and here for Lean v4.7.0-rc1, for example. There are only two tests for it though, so please tell me if you find a bug ;)
The backport is also done for the intermediate rc versions.
Auguste Poiroux (Apr 11 2025 at 12:24):
Most importantly, I think, the proof validation in tactic mode that was added to the REPL today has also been backported.
J. Simon Richard (Apr 13 2025 at 19:38):
Hi @Auguste Poiroux, I'm a bit late on this, but it looks cool! I've been working on a similar tool supporting some autoformalization research, called lean-sdk... it's very similar, but I wanted the flexibility of a REPL fork rather than the original project, and I wanted to use LSP-style JSON-RPC instead of the REPL's current format to improve reliability... but maybe you've solved that problem without that change.
My project is pre-pre-alpha, so I don't expect it to really be useful to anyone (except other developers) yet, but I wonder if we should have a chat to figure out whether we should combine forces or keep our projects separate. Let me know what you think!
Auguste Poiroux (Apr 14 2025 at 14:43):
Hi @J. Simon Richard! I am happy to hear that you like my project ^^
With LeanInteract, my initial goal was to make it flexible. There are several options that make it possible to switch the REPL backend: in LeanREPLConfig you have the repl_git and repl_rev parameters, and in LeanServer there is a run_dict method which communicates with the REPL in raw JSON format. However, I must admit that it is not very flexible anymore, especially because of the constraints regarding Lean versioning that LeanInteract now imposes on the REPL backend.
But I would be very happy to make some changes to make this less strict. That could be particularly helpful for people developing new features for the Lean REPL, like you. I also welcome all kinds of contributions to LeanInteract, so feel free to make a PR ;)
Auguste Poiroux (Apr 14 2025 at 18:21):
I was reading the Kimina report (great work, congrats to everyone who worked on this project), and found their description of the Numina Lean Server, explaining why it is efficient: it uses the Lean REPL, reuses preloaded environments based on import headers, and parallelizes across CPUs.
There is an example script doing exactly that in LeanInteract.
One difference from the described Numina server is their use of an LRU-based caching mechanism for the headers. Our example script is not tailored for use during training, but for evaluation, hence no need for LRU-based caching (I believe - please tell me if I misunderstood the role of the LRU cache).
If you want to efficiently evaluate the Kimina models, the example script should work almost out-of-the-box, minus adding the model config and updating the prompting :+1:
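Concretely, the environment reuse comes down to the REPL's env field: load the heavy imports once, then check every candidate against the cached environment (a sketch of the JSON exchange, with outputs abridged):

{"cmd": "import Mathlib"}
{"env": 0}
{"cmd": "theorem t1 (n : Nat) : n + 0 = n := by simp", "env": 0}
{"env": 1}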
Last updated: May 02 2025 at 03:31 UTC