Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Claude Code


Geoffrey Irving (Feb 24 2025 at 20:29):

My sense is that Claude Code isn’t that useful yet in a Lean context because without custom engineering the agent doesn’t know the tactic state. Is that right? Curious if anyone tries it and has a positive experience.

https://www.anthropic.com/news/claude-3-7-sonnet

GasStationManager (Feb 25 2025 at 18:52):

I tried Claude Code briefly. After a bit of prompting it was able to invoke lake env lean <file> and attempt to fix its own syntax errors, but it ended up unable to fix them and got stuck in a loop. (The task prompt was "prove that 3 is not even".)
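For reference, the task itself is a one-liner in Lean 4 with Mathlib; something like the following would close it (the exact import path and whether decide fires depend on the Mathlib version and which decidability instances are in scope; simp or omega are common alternatives):

```lean
import Mathlib.Algebra.Group.Even

-- The prompt given to Claude Code: prove that 3 is not even.
example : ¬ Even 3 := by decide
```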
What worked better for me is when I wrote a simple wrapper of LeanTool to export its tool as a Model Context Protocol (MCP) server, which can then be added to Claude Code. This provides somewhat more information than just calling Lean, as it also prints goal states from sorrys (using Pantograph).
See here for an example setup.
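A minimal sketch of the kind of tool such a server exposes: a function that runs the Lean compiler on a file and hands the diagnostics back to the agent. The names here are illustrative, not LeanTool's actual API, and the command is parameterized so the sketch can be exercised without a Lean toolchain; the real LeanTool/Pantograph integration additionally reports goal states at sorrys.

```python
import subprocess

def check_lean_file(path: str,
                    cmd: tuple[str, ...] = ("lake", "env", "lean")) -> tuple[bool, str]:
    """Run a compiler command on `path`; return (success, combined output).

    An MCP server would register a function like this as a tool, so the
    agent can compile a file and read back the raw error messages.
    """
    proc = subprocess.run([*cmd, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr
```

Registering this function as an MCP tool (e.g. with a Python MCP server library) is then what lets Claude Code call it directly instead of shelling out on its own.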

GasStationManager (Feb 25 2025 at 19:01):

This does make me wonder whether Claude can make use of more interactive Lean features; e.g. if one makes an MCP server wrapper for @Oliver Dressler's leanclient, which exposes the Lean Language Server features. Which features would be useful to an LLM?

GasStationManager (Feb 25 2025 at 19:25):

Similarly for Pantograph and other REPL-like interfaces. Would Claude be able to use these for proof search?
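One shape such proof search could take: a breadth-first search over proof states through a REPL-like step function. Everything here is a hypothetical sketch, not Pantograph's actual interface; apply_tactic is assumed to return the list of remaining goals ([] meaning the goal is closed) or None on failure, which is roughly the shape of a goal-state stepping API.

```python
from collections import deque

def bfs_proof_search(initial_goal, apply_tactic, tactics, max_steps=1000):
    """Breadth-first search for a tactic script closing `initial_goal`.

    `apply_tactic(state, tactic)` -> list of remaining goal states,
    or None if the tactic fails on that state.
    Returns the list of tactics used, or None if no proof was found.
    As a simplification, only the first remaining goal is pursued.
    """
    queue = deque([(initial_goal, [])])
    seen = {initial_goal}
    while queue and max_steps > 0:
        state, script = queue.popleft()
        for tac in tactics:
            max_steps -= 1
            result = apply_tactic(state, tac)
            if result is None:
                continue                  # tactic failed on this state
            if not result:
                return script + [tac]     # no goals left: proof found
            nxt = result[0]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, script + [tac]))
    return None
```

An LLM would slot in as the policy choosing which tactics to try per state, rather than enumerating a fixed list.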

Geoffrey Irving (Feb 26 2025 at 11:01):

Recent LLMs are pretty good at tool use if prompted appropriately, so likely yes.

Quinn (Feb 26 2025 at 15:59):

doing the fvapps baselines, the simplest scaffold (a loop) elicited nontrivial ability to learn syntax that the core llm was wrong about, with just error messages and no LSP wire-up
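The loop scaffold described here can be sketched in a few lines: propose code, compile it, feed the raw error messages back, repeat. The model and compiler are stubbed as callables, and the names are illustrative rather than the fvapps harness's actual interface.

```python
def repair_loop(prompt, generate, compile_check, max_attempts=5):
    """Minimal agent loop: ask the model for code, compile it, and
    feed any error messages back until it compiles or we give up.

    `generate(prompt, feedback)` -> source string
    `compile_check(source)`      -> (ok: bool, errors: str)
    Returns (source, attempts_used) on success, (None, max_attempts) otherwise.
    """
    feedback = ""
    for attempt in range(max_attempts):
        source = generate(prompt, feedback)
        ok, errors = compile_check(source)
        if ok:
            return source, attempt + 1
        feedback = errors  # the only signal fed back: raw error messages
    return None, max_attempts
```

The point of the observation above is that even this scaffold, with compiler errors as the sole feedback channel, corrects syntax the base model gets wrong.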

Quinn (Feb 26 2025 at 16:26):

i'm actually a little skeptical that LSP / goal state beats error messages by a lot overall, but one choice over the other might save a lot of tokens/dollars


Last updated: May 02 2025 at 03:31 UTC