Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: VSCode plugin that connects an agent to the tactic state

Geoffrey Irving (Feb 24 2025 at 20:41):

My general sense is that most of the projects discussed in this channel are custom, open-source models tuned for Lean specifically. However, the closed-source frontier LLMs are getting pretty good as agents these days. So:

I'd be curious to see the best current attempt at making a VSCode plugin that iteratively asks asks an LLM to rewrite a proof, then shows it the tactic state, then iterates. In particular, I'm curious for the version that tries to use an unmodified, untuned frontier LLM via API, both for simplicity and so that it can easily be tried on new models without any update step.

Ideally, this would involve stuffing most/all of the other files of the project into context-cached context. My main claim is that this kind of setup would often be useful out of the box, even without familiarity with the relevant other parts of Mathlib, because most projects have similar reasoning in other proofs that the model can attend over. If there is a particular theorem that the agent doesn't know how to use, you can put it in a prompt or in a comment above, and the system prompt could explain to the LLM how to use #check to see full types of such lemmas.

Has anyone built such a VSCode plugin?

GasStationManager (Feb 24 2025 at 20:51):

LeanTool does part of what you are asking. It does the LLM-Lean feedback loop. One of the interfaces it exports is an openAI-compatible proxy server, which you can then plug into a VS Code plugin coding assistant like Cline & Continue, or a command line one like Aider.
As for managing the context of the rest of the project, presumably these coding assistants can do some form of that, but I have not tested how well it works together with LeanTool.

𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Feb 24 2025 at 20:52):

A while ago we built Sagredo. It mostly did what you suggest, iterating on errors, except that I don't think we were including other files in the context (this was back when contexts were much shorter). LeanCopilot also seems to have the capacity to run off-the-shelf LLMs.

Geoffrey Irving (Feb 24 2025 at 20:52):

@GasStationManager: To clarify, it could be plugged into VSCode, but isn’t yet?

GasStationManager (Feb 24 2025 at 20:55):

Geoffrey Irving said:

To clarify, it could be plugged into VSCode, but isn’t yet?

I have tested the set up to confirm that it works in Cline and Continue and Aider, at least the chat agent part. Have not tested more sophisticated code action features that these assistants have; more testing are welcome!

Geoffrey Irving (Feb 24 2025 at 20:58):

LeanCopilot seems like it is trying to do something nontrivially different than direct agent iteration, including aesop-like iteration rather than whole proof rewriting. I think it is plausible that whole proof rewrites work better in some cases: certainly I often learn something from one part of the tactic tree that is helpful in some other part.

Oliver Dressler (Feb 24 2025 at 21:01):

I have built an extension, Lean Scribe, that fits some of your requirements.

It allows you to define and render context-rich prompts. To include for example the tactic state at the cursor position, all diagnostic messages in the current file, or the the result of running a tactic at the cursor position.

Lean Scribe also allows for easy use and configuration of models (API or ollama) and supports all state-of-the-art models. I have used it today with Claude 3.7, it performs OK on the simple problems I have tried.

However, it is a manual tool for now. The iterative loop is planned but I want to get the human-in-control mode right first.

Andy Jiang (Feb 26 2025 at 10:12):

Could be interesting to compare with how Claude is playing pokemon (live on twitch) . I think there they provided Claude with some scratch pad to log working knowledge/long term plan + some scaffolding to help it navigate & not get stuck

Btw how good is Claude computer control at doing lean proofs--have anyone tried?

Oliver Dressler (Mar 31 2025 at 14:48):

VSCode Insiders (the bleeding edge version of VSCode) has released a native agent mode end of Feb. This feature should arrive "soon" in stable VSCode.

It allows VSCode to iteratively edit files, check diagnostics, and interact with MCP servers, similar to how Cursor works.

Since LeanTool by @GasStationManager (see above) has an MCP mode it should be relatively straight forward to integrate with this new agent mode.

I have also recently made an MCP server that exposes various tools for the LLM agent: E.g. Query the tactic state or term goal at any position. With the new agent mode this became essentially a one-click install for VSCode Insiders.

GasStationManager (Mar 31 2025 at 15:28):

Very cool! I think tools that expose Lean's features to LLMs are greatly needed. And MCP seems to have emerged as a standard protocol for this, being adopted by multiple vendors.

Bas Spitters (Apr 01 2025 at 14:51):

@Oliver Dressler That's very cool! I had expected a similar functionality to be available for rust, but I cannot find an MCP interface for rust-analyzer yet...

Last updated: May 02 2025 at 03:31 UTC