Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: AI/ML tools for Lean
Anthony Bordg (Aug 05 2024 at 05:07):
Among the AI/ML tools that can possibly help with the formalization of mathematics in Lean, is there one that stands out?
I heard about Lean Copilot and GitHub Copilot, but I haven't tried them yet. I tested ChatGPT in the context of formalizing with Lean (to help with declaring and managing explicit universe levels), but I wasn't impressed; maybe I should give it another try once I figure out for which tasks it can be useful and for which it can't.
Does one of these tools go beyond mathematical Olympiads to help with the formalization of higher mathematics?
Which tools can help with the formalization of definitions? Which ones can help with filling proofs? What are the best use cases?
Jason Rute (Aug 05 2024 at 14:58):
Unfortunately, the situation isn't very clear right now. I think many of us agree there is a lot of potential, but there aren't many practical tools that work well at the moment. Tools typically fall into two categories: general-purpose tools and Lean-specific tools.
Jason Rute (Aug 05 2024 at 14:58):
As for general-purpose tools, the one I've heard the best about is GitHub Copilot. I've heard from Kevin that some of his students like it. As for GPT-4 and ChatGPT, I've heard a large problem it has is that it mixes up Lean 3 and Lean 4 syntax. I think some have said Claude is better, but I haven't tried it. Also, with general-purpose tools it mostly comes down to how you use them. Better prompting and being willing to generate multiple solutions instead of just one would likely give better answers.
Jason Rute (Aug 05 2024 at 14:58):
As for Lean-specific tools, there is Lean copilot. I haven't heard of many people finding that to be a useful tool, but also I don't think many people have even tried it. There is also the new LLMLean. I don't think it has even been mentioned on this Zulip, so you might be the first to try it. (@Sean Welleck is LLMLean ready for people to use?) There are some other Lean-specific tools coming down the line like a Lean Hammer, and I think some of the more research-oriented projects like DeepSeek-Prover will eventually find their way into Lean.
Jason Rute (Aug 05 2024 at 14:58):
Does one of these tools go beyond mathematical Olympiads to help with the formalization of higher mathematics?
I think a few projects have been tested beyond competition problems, but not many of them have been made into user-facing tools. Lean Copilot certainly is designed for working with all of Mathlib.
Jason Rute (Aug 05 2024 at 14:58):
Which tools can help with the formalization of definitions?
I think for formalizing definitions you have to use a general-purpose tool like Github copilot or some LLM. How you prompt the model will make a lot of difference, and your mileage may vary. I think there is so much we could do right now to make this work, but the research is not there yet (nor has a hobbyist made a working wrapper around an LLM to do this sort of thing).
Jason Rute (Aug 05 2024 at 14:59):
Which ones can help with filling proofs?
I think LLMLean and Lean Copilot are the only ones I know that are user-facing proof solvers. But there are others that are research projects. Some are just LLMs, so if they are released they could possibly be turned into proof solvers, but the problem is that you typically have to call the LLM hundreds of times until it gives the correct proof. This can get slow and expensive and doesn't fit into the typical code-LLM interaction paradigm of tools like Github copilot.
Jason Rute (Aug 05 2024 at 15:05):
I think the delta between what we know we can do in experimental setups and what we can actually make useful user-facing tools for in practice is huge and there is a lot of room for people to fill that gap. (The challenge of course is making it into a valuable research project. One AI researcher told me he would never encourage his PhD students to spend time on making practical tools. It just isn't helpful for their advancement.)
Luigi Massacci (Aug 05 2024 at 16:11):
While it might not quite help directly with formalisation, one ML-based tool that I think nearly everyone will agree has been pretty damn useful is Moogle. I haven't yet tried the two new contenders, i.e. LeanSearch and mathlib-search, but I'm optimistic about them too, as opposed to the umpteenth latest "sota" on MiniF2F...
Jason Rute (Aug 05 2024 at 16:15):
Links:
Shreyas Srinivas (Aug 05 2024 at 16:17):
Given the volume of ML papers published every year on the academic side (2500 at this ICML), I would have hoped that those who hire would see the value in tool building. But then again, we live in an academia that has become a living example of Goodhart's law.
About Lean Copilot: I have tried it a few times, but it causes my system to heat up a lot. It runs for a while. The one or two times I used it, the suggested proofs worked on some API lemmas. However, on a practical level, the heating issues are a concern for me.
Anthony Bordg (Aug 05 2024 at 19:46):
@Jason Rute Thanks a lot for your comprehensive answer, I really appreciate it! :folded_hands:
Anthony Bordg (Aug 06 2024 at 10:34):
Somewhat different, but probably worth mentioning too, is the Aesop proof search tactic for Lean 4. Apparently, it doesn't work out of the box, but requires installing a package as a dependency.
Any feedback on Aesop?
Kim Morrison (Aug 06 2024 at 11:52):
`aesop` is used pretty extensively in Mathlib. It is quite useful for finishing off proofs that work via `ext`, `simp`, `intros`, and "taking apart and putting back together" tactic steps. But it is not AI in any sense. It is also not always the most performant (mostly because it calls `simp` a lot internally, including during backtracking search, and this can get really slow).
Kim Morrison (Aug 06 2024 at 11:53):
It is mostly used in Mathlib in the form of "auto params", filling in omitted arguments to functions or structures. E.g. all the "boring" proofs in category theory are done via `aesop`.
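For illustration (a sketch assuming Mathlib's category theory library; the definition name is made up), the identity functor can be written without spelling out the functor laws, because the omitted proof fields have auto-params that run the `aesop_cat` variant of `aesop`:

```lean
import Mathlib.CategoryTheory.Functor.Basic

open CategoryTheory

-- The `map_id` and `map_comp` proof fields are omitted;
-- their auto-params invoke `aesop_cat` to discharge them.
def myIdFunctor (C : Type*) [Category C] : C ⥤ C where
  obj X := X
  map f := f
```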
Anthony Bordg (Aug 06 2024 at 12:19):
Kim Morrison said:
But it is not AI in any sense.
It falls under the symbolic AI/GOFAI umbrella, doesn't it?
Anthony Bordg (Aug 06 2024 at 12:22):
Your feedback fits well with what I read, especially regarding performance issues. It seems that `aesop` can be used to get proof scripts that can be polished, and some calls to `aesop` can then be removed.
Jason Rute (Aug 06 2024 at 12:22):
Aesop is also the backend of Lean Copilot. It handles the tree search.
Anthony Bordg (Aug 06 2024 at 12:24):
Interesting!
Sean Welleck (Aug 06 2024 at 12:59):
Jason Rute said:
As for Lean-specific tools, there is Lean copilot. I haven't heard of many people finding that to be a useful tool, but also I don't think many people have even tried it. There is also the new LLMLean. I don't think it has even been mentioned on this Zulip, so you might be the first to try it. (Sean Welleck is LLMLean ready for people to use?) There are some other Lean-specific tools coming down the line like a Lean Hammer, and I think some of the more research-oriented projects like DeepSeek-Prover will eventually find their way into Lean.
Yes! Feedback and/or contributions are welcome
Anthony Bordg (Aug 06 2024 at 15:15):
Jason Rute said:
As for GPT-4 and ChatGPT, I've heard a large problem it has it that it mixes up Lean 3 and Lean 4 syntax. I think some have said Claude is better, but I haven't tried it. Also, with general-purpose tools it mostly comes down to how you use them. Better prompting and being willing to generate multiple solutions instead of just one would likely give better answers.
Something that doesn't seem possible: using LLMs to search the Mathlib documentation, by asking whether a notion/result has been formalized and, if so, requesting a link. Even with recently released models, e.g. Claude 3.5 Sonnet, this isn't possible, so it's not a use case for LLMs. You could say there is Moogle for that, I agree, but I was curious to try.
Jason Rute (Aug 07 2024 at 11:44):
@Anthony Bordg It depends what you mean by "LLM". It is true that if you asked ChatGPT etc. to find something and provide a link, it would likely hallucinate. If you specifically fine-tuned on Mathlib's documentation, it would likely do better; that is basically how many LLM-based theorem provers work: they just memorize the theorems in the library. But a better approach is often to use a retrieval model, which could be based on an LLM. You have an encoder which turns text into vectors; then you can use k-nearest neighbors (or a locality-sensitive-hashing version of k-nearest neighbors) to find relevant results. Indeed, the lemma selection in MagnusHammer, ReProver, and Lean Copilot all uses LLM-like transformers for this purpose. I believe Moogle does the same. And tools like the new SearchGPT and Google's AI-generated answers also combine search with an LLM-generated summary. (One problem is that I don't think Google and other search engines index this Zulip, so they have trouble finding relevant Lean stuff.)
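To make the retrieval step concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the lemma names and embedding vectors are made up, and a real system would produce the vectors with a trained encoder rather than hard-coding them. Only the k-nearest-neighbor ranking step is real.

```python
import math

# Hypothetical embeddings: a real encoder would map each lemma
# statement to a high-dimensional vector; these 3-d vectors are toys.
lemma_embeddings = {
    "Nat.add_comm": [0.9, 0.1, 0.0],
    "Nat.mul_comm": [0.8, 0.2, 0.1],
    "List.length_append": [0.1, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_lemmas(query_vec, k=2):
    """Return the k lemma names whose embeddings are most similar
    to the (encoded) query — the k-nearest-neighbor retrieval step."""
    ranked = sorted(lemma_embeddings.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# A query vector that happens to sit near the arithmetic lemmas:
print(nearest_lemmas([1.0, 0.0, 0.0]))
# → ['Nat.add_comm', 'Nat.mul_comm']
```

A locality-sensitive-hashing index would replace the exhaustive `sorted` scan when the library has hundreds of thousands of lemmas.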
Ralf Stephan (Aug 07 2024 at 13:38):
https://en.wikipedia.org/wiki/Retrieval-augmented_generation
Jason Rute (Aug 07 2024 at 14:07):
Just to be clear, retrieval-augmented generation is the combination of retrieval and generation. So Moogle is just retrieval, while Lean Copilot is retrieval-augmented generation, since after retrieving the lemmas it has to turn them into tactics.
Last updated: May 02 2025 at 03:31 UTC