Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: leanclient: Interact with Lean from Python via the LSP

Oliver Dressler (Jan 14 2025 at 12:09):

I am happy to announce the initial release of leanclient, a Python client to the Lean language server.

This module is a thin wrapper around the Lean language server, allowing similar interactions as the VSCode extension, but from Python.
You can point leanclient at a lean project to query and update files. E.g:

Get goals or term goal at a specific position.
Update a file and get new diagnostic information (warnings, errors, ...).
Extract locations of symbols (theorems/definitions) in a file.
Find all callers of a theorem in mathlib.
Get hover information at a specific position.

See the documentation for a full list of implemented features.

leanclient also enables easy parallel processing of files using multiple language servers to get your fans spinning.

This tool is designed to abstract away the complexity of interacting via the LSP and allow you to focus on more interesting tasks, such as automatic proof rewrites, information extraction, writing dynamic prompts, ...

I hope you find this tool useful, and I look forward to any questions, feedback and contributions!

John Smith (Jan 17 2025 at 08:59):

This is a very useful tool. I'm using it to construct a dataset where all instances of simp are translated to the corresponding term using the show_term tactic.

However, simp has many variations, such as simp * at [*] and simp only, making it difficult to determine the complete scope of simp. Consequently, replacing simp ... with the term can be challenging.

I believe this might be the job of a parser, but I'm not very familiar with the Lean parser. Do you know of any methods or tools that could accomplish this task? :thinking:

Joachim Breitner (Jan 17 2025 at 09:13):

Nice! Can this tool be used to perform all code actions on a file? (Thinking of updating #guard_message strings in particular)

Oliver Dressler (Jan 17 2025 at 12:47):

John Smith said:

I believe this might be the job of a parser, but I'm not very familiar with the Lean parser. Do you know of any methods or tools that could accomplish this task? :thinking:

I don't think there are any tools directly in the language server.
Maybe in Metaprogramming but I'm still a novice.

In Python I would use multiple regex or similar to capture all cases.

I'm thinking about adding support for conditional rewrites, akin to:
"Perform this transformation at every location unless it causes certain diagnostics (errors / warnings)."

Would this be useful in your case?

Oliver Dressler (Jan 17 2025 at 12:48):

Joachim Breitner said:

Nice! Can this tool be used to perform all code actions on a file? (Thinking of updating #guard_message strings in particular)

I can add support for code actions next. If you can shortly explain your use case I can use it as a test case :wink:

Eric Wieser (Jan 17 2025 at 13:08):

If you're trying to build a dataset, I don't think the LSP is a sensible tool

John Smith (Jan 21 2025 at 09:00):

Oliver Dressler said:

I'm thinking about adding support for conditional rewrites, akin to:
"Perform this transformation at every location unless it causes certain diagnostics (errors / warnings)."

Yes, I think this would be very useful. Perhaps we could start with implementing a basic version of this functionality, similar to the "apply suggestion" feature in the Lean Client.

I've noticed that in the VSCode Lean infoview, there's a section called "Suggestions" where you can click on blue text to apply a suggestion. This could be seen as a subset or a stepping stone towards the ultimate goal you're describing.
image.png

John Smith (Jan 21 2025 at 09:02):

Eric Wieser said:

If you're trying to build a dataset, I don't think the LSP is a sensible tool

Thank you for your feedback. Could you elaborate on why you don't think the Language Server Protocol (LSP) is a suitable tool for building a dataset?

And what alternative approaches or tools would you recommend for dataset construction in this context?

THX!

Patrick Massot (Jan 21 2025 at 09:30):

The LSP is really meant for human beings interacting with Lean through an editor. Did you read the discussion at #Machine Learning for Theorem Proving > Interacting with Lean 4 @ 💬 for intance?

Oliver Dressler (Jan 27 2025 at 09:09):

I think, @Eric Wieser and @Patrick Massot are correct. The LSP is not a fitting tool for dataset creation.

For that something like lean-training-data is much better suited!
Speed is key as mathlib scales up & the lsp is slow at opening files.

In general, I've been planning a page in the docs about the limitations of the lsp.
Would love some feedback and additions:

Limited interface made for humans
The LSP only allows for a few requests, many require post-processing to be useful (e.g. the highlight range).
LSP is part of the core lean4 repo and only extensible there.
New parametrization: Line, character
Usually the user provides these parameters. You will have to write a program that does this.
Bonus: If the LSP does not provide the position you need, you start parsing files manually...
Slow to open files.

These limitations make it challening to do library-wide programmatic interaction with the LSP.

On the other hand, I do believe that precisely due to its design for human interaction,
the LSP could have potential for (neural) agentic interaction.

Patrick Massot (Jan 27 2025 at 09:42):

“LSP is part of the core lean4 repo and only extensible there” is not fully true. There are things you can do in user space like implementing a new LSP handler.

Oliver Dressler (Jan 27 2025 at 10:08):

Patrick Massot said:

“LSP is part of the core lean4 repo and only extensible there” is not fully true. There are things you can do in user space like implementing a new LSP handler.

This is great to know. I'll investigate and change the sentence accordingly. Thanks!

Jason Rute (Jan 27 2025 at 12:22):

One consideration: if the LSP is how you will setup AI to interact with Lean, you should at least spot-check that the data you use at interaction-time agrees with the sort of data data you extract, and ideal check it more thoroughly. I found lots of weird idiosyncrasies in the Lean 3 language server. Hopefully those have been properly addressed in Lean 4, but it is good to do this investigation before training a model. :)

Oliver Dressler (Jan 27 2025 at 14:19):

Jason Rute said:

One consideration: if the LSP is how you will setup AI to interact with Lean, you should at least spot-check that the data you use at interaction-time agrees with the sort of data data you extract, and ideal check it more thoroughly. I found lots of weird idiosyncrasies in the Lean 3 language server. Hopefully those have been properly addressed in Lean 4, but it is good to do this investigation before training a model. :)

Good point, I'll include a section in the limitations:
Verify LSP response ...

I cannot comment on lean3 but during development and testing of leanclient I have only found 1-2 minor issues.
I'm going to submit these but I first want to check if they are relevant in the vscode extension.
Otherwise they are probably not even worth fixing.
Overall the experience has been stable even if confusing at times.

Oliver Dressler (Jan 27 2025 at 14:20):

Patrick Massot said:

“LSP is part of the core lean4 repo and only extensible there” is not fully true. There are things you can do in user space like implementing a new LSP handler.

Just to clarify, does this mean that a user could extend the LSP Handler and include a custom method e.g. $/lean/infoTree?

Patrick Massot (Jan 27 2025 at 14:34):

Yes, see the discussion at #lean4 > User defined LSP extensions . Unfortunately I had zero time to work on this since last summer, but the possibility is real and I had a very early experiment at https://github.com/PatrickMassot/LspExperiments/

Oliver Dressler (Jan 27 2025 at 14:42):

Patrick Massot said:

Yes, see the discussion at #lean4 > User defined LSP extensions . Unfortunately I had zero time to work on this since last summer, but the possibility is real and I had a very early experiment at https://github.com/PatrickMassot/LspExperiments/

This is very helpful, thank you!

Last updated: May 02 2025 at 03:31 UTC