Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Lean Skill for Claude Code


Cameron Freer (Oct 18 2025 at 19:19):

In case it's useful to anyone else, I thought I'd share this Claude Skill that I've been finding useful for formalizing math in Lean using Claude Code (with Sonnet 4.5):
https://github.com/cameronfreer/lean4-skills/tree/main/plugins/lean4-theorem-proving
[updated link as the repo/marketplace now has two skills]

I've found that it generally helps by reducing Claude's tendency to declare victory too soon (without recompiling to confirm) or to think it's done even when it has introduced axioms or still has sorries remaining. It also encourages incremental, well-documented progress on one thing at a time (rather than piecemeal work on many things), reminds it to look in mathlib first before trying to prove something on its own, etc.
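To give a concrete sense of why the recompile-and-check step matters, here is a minimal illustration (my own toy example, not taken from the skill): a proof left as sorry still builds with only a warning, and the dependence only shows up if you explicitly ask for it.

```lean
-- Toy example: this file builds with only a warning,
-- even though nothing has actually been proved.
theorem my_add_comm (a b : Nat) : a + b = b + a := by
  sorry

-- An explicit check is what reveals the gap:
#print axioms my_add_comm
-- 'my_add_comm' depends on axioms: [sorryAx]
```

Getting Claude to actually rerun the build and checks like this before claiming success is a big part of what the skill nudges it to do.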

(deleted) (Oct 18 2025 at 23:36):

How powerful is this compared to GPT-5?

(deleted) (Oct 18 2025 at 23:45):

And how is the cost efficiency? :eyes: Even though I have a ChatGPT Pro plan, I use up the weekly limit pretty quickly.

Cameron Freer (Oct 19 2025 at 00:13):

@Huỳnh Trần Khanh I've found the GPT-5-Codex model (via Codex) somewhat better for planning proof structure and GPT-5 Pro the best for getting unstuck on tricky issues, but this Claude setup (Sonnet 4.5 via Claude Code with this skill) by far the best for filling in proof details, fixing errors, looking up mathlib results, and refactoring existing Lean code.

Eric Taucher (Oct 19 2025 at 12:41):

(deleted)

(deleted) (Oct 19 2025 at 12:45):

Honestly it's fascinating to see how quickly AI models become so good at Lean

(deleted) (Oct 19 2025 at 12:45):

To the point where in practice I don't see any gap between informal and formal math capabilities of LLMs

(deleted) (Oct 19 2025 at 12:49):

It's a solved problem at this point

Eric Taucher (Oct 19 2025 at 12:58):

(deleted)

Cameron Freer (Oct 19 2025 at 14:04):

@Eric Taucher I created it by asking Claude Code to write it, based on patterns that seemed to have worked well over ~1000 commits of a repo of mine consisting mostly of Claude Code development in Lean. It made use of Anthropic's various documentation on writing skills.

I've also been using superpowers, which has this skill for writing skills: https://github.com/obra/superpowers/blob/7fc125e5e9d864e082de89e3f72ffa4b95d5cc92/skills/writing-skills/SKILL.md

I'm in the process of improving it using the skill-creator skill of https://github.com/anthropics/skills

Eric Taucher (Oct 19 2025 at 14:44):

@Cameron Freer

For those who work with prompts that are an amalgamation of other prompts, as in this case with Claude Skills (progressive disclosure), do you see the effectiveness of a conversation with skills diminishing as the conversation grows?

The reasoning behind the question is that many AI chatbots will summarize the info from the starting prompts and the user prompt to create the first reply, then summarize the initial prompt and reply along with the next user prompt to form the actual prompt sent for the next LLM completion. So as the conversation continues, the generated portion of the prompt loses the initial instructions, and one needs to start an entirely new conversation. I'm wondering whether using skills has reduced or remedied this problem. I didn't see it noted in the documentation, but I'm still reading.

Cameron Freer (Oct 19 2025 at 14:55):

@Eric Taucher
My understanding is that once a skill is invoked, the skill's metadata (title/description) remains in context, so the model remembers to reread sections of the skill as needed.

I think the progressive disclosure of the skill itself means that individual instructions only load incrementally:
https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview#how-skills-work

It's also efficient about loading from hierarchies as needed, and about running scripts without loading them into context:
https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices#progressive-disclosure-patterns

I've restructured the Lean Skill to make better use of these patterns:
https://github.com/cameronfreer/lean4-skills/tree/main/plugins/lean4-theorem-proving/skills/lean4-theorem-proving/references
https://github.com/cameronfreer/lean4-skills/tree/main/plugins/lean4-theorem-proving/skills/lean4-theorem-proving/scripts
[updated links as the repo/marketplace now has two skills, each in a separate plugin]

Eric Taucher (Oct 19 2025 at 15:06):

Cameron Freer said:

I think the progressive disclosure of the skill itself means that individual instructions only load incrementally

Yes, correct.

Eric Taucher (Oct 19 2025 at 15:10):

@Terence Tao

Terry created a Google document that you or others might find valuable to incorporate into a Claude Lean skill.

Not sure if the list is still maintained.


https://mathstodon.xyz/@tao/111360298114925842#:~:text=Google%20DocsLean%20phrasebookPhrasebook,X%20is%20equal%20to%20...

https://docs.google.com/spreadsheets/d/1Gsn5al4hlpNc_xKoXdU6XGmMyLiX4q-LFesFVsMlANo/edit?pli=1&gid=0#gid=0
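For those who haven't seen it: the phrasebook pairs informal mathematical phrases with their Lean renderings. A rough illustration of the flavor (my own made-up entries, not quoted from the spreadsheet):

```lean
import Mathlib

-- "x is equal to y"                          x = y
-- "for every natural number n, P n holds"    ∀ n : ℕ, P n
-- "there exists an x in S with Q x"          ∃ x ∈ S, Q x
-- "A implies B"                              A → B
example (x y : ℕ) (h : x = y) : y = x := h.symm
```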

Eric Taucher (Oct 19 2025 at 15:41):

@Cameron Freer

Hope you don't mind all of these replies; what you are doing is a good thing!

In creating MCPs, I learned about adding memory, e.g.

https://github.com/modelcontextprotocol/servers/tree/main/src/memory

Memory might be of value with skills. With my other LLM chatbots I had to turn off the memory feature because the memory was not segregated for when it was actually needed; with skills this might work as intended.

Cameron Freer (Oct 20 2025 at 01:18):

@Eric Taucher no worries at all -- thanks for all the feedback and interesting ideas!

I've made an experimental skill that uses memory over MCP, indexed by project/file, as you suggested. It seems to be working, though I'm not yet sure how much (or whether) it is helping:
https://github.com/cameronfreer/lean4-skills/tree/main/plugins/lean4-memories

(I've renamed the repo/marketplace as it now contains two plugins with separate skills -- the main Lean one is independent of the memory one)

[edited link to reflect new directory structure]

Cameron Freer (Oct 20 2025 at 01:35):

@Eric Taucher also I've added a reference that the skill can load as needed, based largely on @Terence Tao's phrasebook -- thanks for the suggestion!

https://github.com/cameronfreer/lean4-skills/blob/main/plugins/lean4-theorem-proving/skills/lean4-theorem-proving/references/lean-phrasebook.md

[edited link to reflect new directory structure]

Eric Taucher (Oct 20 2025 at 19:18):

(deleted)

Cameron Freer (Oct 20 2025 at 19:28):

@Eric Taucher thanks for these details. I'll update the README installation instructions and see if I can make it any easier for Windows. (I already made the various scripts compatible with Bash 3.2 since many Macs will have this older version.)

Eric Taucher (Oct 20 2025 at 20:52):

(deleted)

Nehal Patel (Oct 24 2025 at 11:38):

@Cameron Freer Thanks very much for creating this! I found using Claude Code + skills a very nice way to use AI for doing things besides closing sorries. I posted about this elsewhere: #Machine Learning for Theorem Proving > Claude Code @ 💬

Olonbayar Temuulen (Oct 24 2025 at 14:31):

Nehal Patel said:

Cameron Freer Thanks very much for creating this! I found using Claude Code + skills a very nice way to use AI for doing things besides closing sorries. I posted about this elsewhere: #Machine Learning for Theorem Proving > Claude Code @ 💬

Do you find "codex" as an VS extension version worser than this set-up?

Olonbayar Temuulen (Oct 24 2025 at 14:32):

@Eric Taucher @Nehal Patel @Cameron Freer

Cameron Freer (Oct 24 2025 at 14:51):

@Olonbayar Temuulen I've personally found the Codex extension helpful for structural planning of a big proof or library of results, and also GPT-5 with lots of thinking is good for getting unstuck on tricky mathematical or type issues. But I've found Claude Code with this skill much better and faster at writing proof details, fixing errors, filling sorries, and refactoring large proofs.

Eric Taucher (Oct 24 2025 at 15:54):

(deleted)

Eric Taucher (Oct 24 2025 at 17:24):

(deleted)

Cameron Freer (Oct 24 2025 at 21:54):

I've also learned a bunch by asking Claude to elaborate on some of its thoughts -- here are two examples:

image-18.png

image-16.png

Nehal Patel (Oct 24 2025 at 23:30):

@Eric Taucher FYI, Codex does also save a conversation history.
And I would echo @Cameron Freer's comments -- Codex seems to be a very good mathematician, but the Claude Code CLI seems effective at moving stuff along. And the web chat interfaces for GPT-5 Pro with deep research, and the equivalents from Anthropic and Google, all seem quite good at providing natural-language scaffolding (so if possible, start with the powerful tools to create a scaffold for Claude Code to follow).
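To make the "scaffold" idea concrete, here is a tiny made-up sketch (hypothetical names, not from anyone's actual project) of the kind of skeleton a planning model might hand off, with statements pinned down and each proof left as a sorry for Claude Code to fill in one at a time:

```lean
import Mathlib

-- Hypothetical scaffold: the statements are fixed up front;
-- each `sorry` is a separate, small task for Claude Code.
lemma step1 (n : ℕ) : 0 ≤ n := by
  sorry

lemma step2 (n : ℕ) (h : 0 ≤ n) : 0 ≤ n + 1 := by
  sorry

theorem main_bound (n : ℕ) : 0 ≤ n + 1 :=
  step2 n (step1 n)
```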


Last updated: Dec 20 2025 at 21:32 UTC