Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Claude Code
Geoffrey Irving (Feb 24 2025 at 20:29):
My sense is that Claude Code isn’t that useful yet in a Lean context because without custom engineering the agent doesn’t know the tactic state. Is that right? Curious if anyone tries it and has a positive experience.
https://www.anthropic.com/news/claude-3-7-sonnet
GasStationManager (Feb 25 2025 at 18:52):
I tried Claude Code briefly. After a bit of prompting it was able to invoke lake env lean <file> and try to fix its own syntax errors, but it ultimately couldn't fix its syntax and got stuck in a loop. (The task prompt was "prove that 3 is not even".)
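For reference, the task itself is a one-liner once the syntax is right; a minimal sketch, assuming Mathlib is imported so that the Even predicate and its Decidable instance on ℕ are available:
```lean
import Mathlib

-- The task prompt from above: "prove that 3 is not even".
-- `decide` closes the goal because Mathlib provides a Decidable
-- instance for `Even` on natural numbers.
example : ¬ Even 3 := by decide
```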
What worked better for me was writing a simple wrapper around LeanTool to export its tool as a Model Context Protocol (MCP) server, which can then be added to Claude Code. This provides somewhat more information than just calling Lean, as it also prints the goal states at sorrys (using Pantograph).
See here for an example setup.
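A wrapper of this kind can be quite small. Here is a rough sketch, not the actual LeanTool wrapper, just an illustration using the MCP Python SDK's FastMCP helper, of a server that runs lake env lean on a snippet and returns the compiler output (the tool and server names are made up):
```python
# Sketch of an MCP server exposing a Lean checker to Claude Code.
# Assumes the `mcp` Python SDK is installed and that the server is
# started from inside a Lake project with Mathlib already built.
import subprocess
import tempfile
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("lean-checker")

@mcp.tool()
def check_lean(code: str) -> str:
    """Write `code` to a temporary .lean file, run `lake env lean` on it,
    and return the compiler's errors and warnings (including sorry warnings)."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(code)
        path = Path(f.name)
    try:
        result = subprocess.run(
            ["lake", "env", "lean", str(path)],
            capture_output=True, text=True, timeout=300,
        )
        return result.stdout + result.stderr
    finally:
        path.unlink(missing_ok=True)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```
Once running, the server can be registered in Claude Code's MCP configuration, and the model can call the tool instead of guessing at shell invocations.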
GasStationManager (Feb 25 2025 at 19:01):
This does make me wonder whether Claude could make use of more interactive Lean features; e.g. if one made an MCP server wrapper for @Oliver Dressler's leanclient, which exposes the Lean Language Server features. Which features would be useful to an LLM?
GasStationManager (Feb 25 2025 at 19:25):
Similarly for Pantograph and other REPL-like interfaces: would Claude be able to use these for proof search?
Geoffrey Irving (Feb 26 2025 at 11:01):
Recent LLMs are pretty good at tool use if prompted appropriately, so likely yes.
Quinn (Feb 26 2025 at 15:59):
Doing the FVAPPS baselines, the simplest scaffold (a loop) elicited nontrivial ability to learn syntax that the core LLM was wrong about, with just error messages and no LSP wire-up.
Quinn (Feb 26 2025 at 16:26):
I'm actually a little skeptical that LSP / goal state beats error messages by a lot overall, but one choice over another might save a lot of tokens/dollars.
Last updated: May 02 2025 at 03:31 UTC