Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: RFC: Thoughts on a Lean4 model context protocol (MCP) SDK?
Daniel Lebedinsky (Oct 27 2025 at 19:38):
For those who don't know, an MCP server is a way for LLMs to talk to external systems so they can get more context for code generation. The external system can be documentation for a language or library like Lean 4/Mathlib, or an API that does something else entirely. According to this video, MCPs are especially useful for writing code in new languages/frameworks that the AI training data didn't have enough examples of, such as Svelte. This repository has several examples of MCP SDKs for various languages. Someone already made an MCP for Lean, but there are just a few people working on it and it doesn't look like it's supported by the official org or community.
- User Experience: How does this feature improve the user experience?
  This feature will make Lean 4 code generation more reliable with tools like Claude Code, which will lead to progress in the field of automatic theorem proving.
- Beneficiaries: Which Lean users and projects benefit most from this feature/change?
  Mathematicians who want to experiment with prompt engineering, and anyone who is interested in Lean's applications to AI or likes vibe coding in general.
- Maintainability: Will this change streamline code maintenance or simplify its structure?
  I'm not sure exactly what impact it will have on the official lean4 project's code maintenance. If the maintainers want to use AI coding agents, it will be a big plus, as it will help automate at least some of their work; if not, I'm sure it will still help the community-maintained repos, i.e. more people will want to add theorems to Mathlib if they get to play around with AI while doing so.
I'm curious to hear your thoughts on this; if there seems to be enthusiasm, I'll open an RFC in the GitHub issues.
Notification Bot (Oct 27 2025 at 19:59):
This topic was moved here from #general > RFC: Thoughts on a Lean4 model context protocol (MCP) SDK? by Patrick Massot.
Gavin Zhao (Oct 27 2025 at 20:15):
lean-lsp-mcp works perfectly for me, so I'm not sure if there's the need for another Lean MCP server. Are you suggesting that Lean should have an official MCP server?
Gavin Zhao (Oct 27 2025 at 20:16):
I could be wrong but I think lean-lsp-mcp is the go-to for people who want to use a Lean MCP server, so it's not like it's niche or not used actively by the community.
Alfredo Moreira-Rosa (Oct 27 2025 at 20:22):
Yes, lean-lsp-mcp is already ultra powerful; I don't think we need another one. It has:
- infoview feedback
- lean compiler feedback
- local search
- external search lemmas (pattern search, semantic search, ...)
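For anyone who wants to try it, the wiring is just an MCP server entry in the client config. A hypothetical Claude Code `.mcp.json` entry might look like the following (the `uvx` invocation is an assumption; check the lean-lsp-mcp README for the exact command):

```json
{
  "mcpServers": {
    "lean-lsp": {
      "command": "uvx",
      "args": ["lean-lsp-mcp"]
    }
  }
}
```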
Daniel Lebedinsky (Oct 27 2025 at 21:37):
Gavin Zhao said:
lean-lsp-mcp works perfectly for me, so I'm not sure if there's the need for another Lean MCP server. Are you suggesting that Lean should have an official MCP server?
I wasn't necessarily saying that there should be another one made from scratch. If that MCP is good then I guess my only suggestion would be to add it to the documentation.
Justin Asher (Oct 28 2025 at 13:36):
There's probably a decent amount of room for improvement.
For instance, LeanSearch does not live index Mathlib, meaning AI assistants do not have access to the most recent data, making collaboration difficult. They could technically read files and grep for things that they are looking for, but I do not think that this is a good way to do things (this is what ripgrep is doing in lean-lsp-mcp for local files). I am in the process of updating LeanExplore to fix this issue.
I also think that using the LSP might be the wrong approach to doing things. A lot of projects simply compile files repeatedly and use the compiler feedback to fix proofs instead of needing to submit individual MCP tool requests to examine each error that pops up.
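That compile-and-read-feedback loop can be sketched in a few lines. This is a hypothetical illustration, not lean-lsp-mcp's implementation; it assumes the usual `file:line:col: severity: message` shape of Lean diagnostics in `lake build` output, and the function names are made up:

```python
import re
import subprocess

# Assumed diagnostic shape: "./Foo/Bar.lean:12:4: error: unknown identifier 'baz'"
DIAG_RE = re.compile(
    r"^(?P<file>\S+?\.lean):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>error|warning): (?P<msg>.*)$"
)

def parse_lake_output(output: str) -> list[dict]:
    """Extract structured diagnostics from raw `lake build` output.

    Only the first line of each (possibly multi-line) message is kept.
    """
    diags = []
    for raw in output.splitlines():
        m = DIAG_RE.match(raw.strip())
        if m:
            d = m.groupdict()
            d["line"] = int(d["line"])
            d["col"] = int(d["col"])
            diags.append(d)
    return diags

def build_and_report(project_dir: str) -> list[dict]:
    """Run `lake build` and return parsed diagnostics for the agent."""
    proc = subprocess.run(
        ["lake", "build"], cwd=project_dir,
        capture_output=True, text=True,
    )
    # Collect from both streams; lake interleaves its own status output.
    return parse_lake_output(proc.stdout + proc.stderr)
```

The agent then edits the file and calls `build_and_report` again until the list comes back empty, with no per-error MCP tool round-trips.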
In either case, @Oliver Dressler has done a wonderful job with this package, and I am super happy to see progress being made in this area.
Daniel Lebedinsky (Oct 28 2025 at 15:10):
Justin Asher said:
I also think that using the LSP might be the wrong approach to doing things.
Interesting, what possible alternative approach(es) could an MCP take?
Kim Morrison (Oct 29 2025 at 01:20):
Daniel said:
Justin Asher said:
I also think that using the LSP might be the wrong approach to doing things.
Interesting, what possible alternative approach(es) could an MCP take?
I think Justin answers this in his next sentence: run lake build.
For small projects I've found this more reliable than having Claude read diagnostics.
Justin Asher (Oct 29 2025 at 03:08):
I think Justin answers this in his next sentence: run lake build.
Yes, thanks, @Kim Morrison! :smile:
The only stipulation is that you would want to break the proof down into small enough steps that not too many diagnostics pop up in each individual proof. Meaning: use lots of lemmas! This is what handwritten blueprints do already, and why Math.inc has had the most autoformalization success.
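A toy Lean illustration of that style (the declaration names are invented; the point is that each small declaration gets its own local compiler feedback):

```lean
-- Each step is its own declaration, so an error in one of them
-- produces one local diagnostic instead of a cascade.
theorem step_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b

theorem step_mul_one (a : Nat) : a * 1 = a := Nat.mul_one a

-- The main theorem just assembles the already-verified pieces.
theorem main (a b : Nat) : a + b = b + a ∧ a * 1 = a :=
  ⟨step_comm a b, step_mul_one a⟩
```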
You could also have an MCP tool that targets getting diagnostics from specific theorems, meaning a method get_diagnostics_from_declaration(name: str), so that we do not need to get diagnostics from the entire file each time for very long files.
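A sketch of what such a tool could do under the hood, assuming the server already has full-file diagnostics plus a symbol table mapping declaration names to line ranges (e.g. from an LSP documentSymbol request); all names here are hypothetical:

```python
def get_diagnostics_from_declaration(
    name: str,
    decl_ranges: dict[str, tuple[int, int]],
    file_diagnostics: list[dict],
) -> list[dict]:
    """Return only the diagnostics that fall inside one declaration.

    decl_ranges maps declaration name -> (start_line, end_line);
    file_diagnostics are full-file diagnostics with a "line" key.
    """
    if name not in decl_ranges:
        raise KeyError(f"unknown declaration: {name}")
    start, end = decl_ranges[name]
    return [d for d in file_diagnostics if start <= d["line"] <= end]
```

The filtering itself is trivial; the real work is keeping `decl_ranges` in sync with edits to the file.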
Oliver Dressler (Oct 29 2025 at 09:10):
Justin Asher said:
You could also have an MCP tool that targets getting diagnostics from specific theorems, meaning a method get_diagnostics_from_declaration(name: str), so that we do not need to get diagnostics from the entire file each time for very long files.
Agreed, I have been rewriting the LSP interaction to be more async. One of the goals is to have optional line-range parameters on get_diagnostics. This is not straightforward: unlike most other LSP requests, diagnostic messages are not actively requested but just come in whenever a line is done processing.
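A rough sketch of that push-vs-pull mismatch: diagnostics arrive as notifications, so a ranged query has to wait until the server's reported progress has passed the end of the range. The notification names in the comments follow the LSP and the Lean server; the class itself is hypothetical:

```python
import asyncio

class DiagnosticsCollector:
    """Accumulates push-style diagnostics and serves ranged queries."""

    def __init__(self):
        self.diags = []            # (line, message) pairs received so far
        self.processed_up_to = 0   # last line the server reported as done
        self._changed = asyncio.Event()

    def on_publish_diagnostics(self, diags):
        # Callback for textDocument/publishDiagnostics notifications.
        self.diags = list(diags)
        self._changed.set()

    def on_file_progress(self, line):
        # Callback for the Lean-specific $/lean/fileProgress notification.
        self.processed_up_to = line
        self._changed.set()

    async def get_diagnostics(self, start_line, end_line):
        # Block until elaboration has passed end_line, then filter.
        while self.processed_up_to < end_line:
            self._changed.clear()
            await self._changed.wait()
        return [(l, m) for l, m in self.diags
                if start_line <= l <= end_line]
```

The caller simply awaits `get_diagnostics(start, end)`; the notification callbacks wake it up as the file elaborates.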
Justin Asher said:
For instance, LeanSearch does not live index Mathlib, meaning AI assistants do not have access to the most recent data, making collaboration difficult. They could technically read files and grep for things that they are looking for, but I do not think that this is a good way to do things (this is what ripgrep is doing in lean-lsp-mcp for local files). I am in the process of updating LeanExplore to fix this issue.
Excited to see what solution you are cooking up! Maybe some kind of local encoding?
I think it should be mentioned here that the reason lean-lsp-mcp does not include a tool to interact with leanexplore is that they provide their own MCP which works well.
Kim Morrison (Oct 29 2025 at 10:46):
Justin Asher said:
The only stipulation is that you would want to break the proof down into small enough steps that not too many diagnostics pop up in each individual proof.
I think, to the contrary, one wants to instruct the agent to only ever write one step at a time, so that there is only ever one diagnostic (on the most recently added step of the proof).
I have never seen an agent successfully write a long one-shot proof that had an error early on, and then sensibly recover.
Kelly Davis (Oct 29 2025 at 13:09):
Kim Morrison said:
I have never seen an agent successfully write a long one-shot proof that had an error early on, and then sensibly recover.
For examples, see the Goedel-Prover-V2 article; Table 4 and Figure 7 show how Goedel-Prover-V2 performance improves with self-correction. However, this might not fit your understanding of "long" or "early on".
To ground "long" and "early on" with an example, just this morning I had Goedel-Prover-V2 on its first round of self-correction fix a 263 line proof of Putnam 1962 A6.
Cameron Freer (Oct 29 2025 at 16:04):
@Kim Morrison said:
I have never seen an agent successfully write a long one-shot proof that had an error early on, and then sensibly recover.
I've also seen Sonnet 4.5 (via Claude Code with my skill) recover many times on 100+ line proofs in this situation -- it often works best for it to fix errors at the top first (leading to further cascading errors lower down in the proof, but it eventually eliminates them all in the course of working downwards).
Justin Asher (Oct 29 2025 at 18:28):
Oliver Dressler said:
Excited to see what solution you are cooking up! Maybe some kind of local encoding?
I am planning to start to wrap doc-gen4, so I can build on top of the wonderful work that the Lean FRO is doing already, and just handle the semantic search side of things myself. I want to do nightly updates.
Kim Morrison said:
I think, to the contrary, one wants to instruct the agent to only ever write one step at a time, so that there is only ever one diagnostic (on the most recently added step of the proof).
This does seem ideal to me, too. However, I think the current SOTA is doing whole-proof rollout with proof-repair (Seed-Prover, Delta Prover, Hilbert). I think there is a good middle ground here by keeping the proofs on the shorter side of things (instead of encouraging the agent to produce 100+ line proofs). This allows the models to focus on a few errors at once, then use the completed lemmas in subsequent proofs without needing to worry about them.
Cameron Freer said:
leading to further cascading errors lower down in the proof, but it eventually eliminates them all in the course of working downwards
Yes, we should test whether it is indeed faster to have one long proof or to break things into lemmas. My gut tells me that encouraging the LLM to structure its output in a way that mimics the way Mathlib is written would be ideal, but I could be wrong.
Daniel Lebedinsky (Oct 29 2025 at 20:19):
Kim Morrison said:
I think Justin answers this in his next sentence: run lake build. For small projects I've found this more reliable than having Claude read diagnostics.
Oliver Dressler said:
I think it should be mentioned here that the reason lean-lsp-mcp does not include a tool to interact with leanexplore is that they provide their own MCP which works well.
The keyword is "small projects". I've only started testing out lean-lsp-mcp and it seems very good so far; I haven't tried the leanexplore MCP yet. From my job experience, I've found that in general genAI is a great force multiplier for large software projects, and MCPs make it more accurate.
What I meant by my last question, though, is: is the LSP the standard for designing MCPs?
Justin Asher (Oct 29 2025 at 20:23):
Daniel said:
What I meant by my last question, though, is: is the LSP the standard for designing MCPs?
That's an excellent question, and slightly outside my area of expertise. It seems like there are some open source projects in this direction (e.g., mcp-language-server), but I am not aware how much of an advantage this provides. I do not believe Claude Code or Codex use this, and instead just run the files each time to see any errors. Any thoughts?
Daniel Lebedinsky (Oct 29 2025 at 20:25):
Cameron Freer said:
@Kim Morrison said:
I have never seen an agent successfully write a long one-shot proof that had an error early on, and then sensibly recover.
I've also seen Sonnet 4.5 (via Claude Code with my skill) recover many times on 100+ line proofs in this situation -- it often works best for it to fix errors at the top first (leading to further cascading errors lower down in the proof, but it eventually eliminates them all in the course of working downwards).
I also rarely see agents successfully write non-trivial Python programs with a one-shot prompt. In my work we use techniques like Tree of Thoughts prompting and prompt graphs to generate large bodies of text with greater accuracy. I'd be very interested to see research into these kinds of techniques being used to generate proofs.
Eric Wieser (Oct 30 2025 at 21:07):
One downside of a lake build agentic workflow is that the agent can't benefit from Lean's incremental compilation.
Last updated: Dec 20 2025 at 21:32 UTC