Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Autoformalisation
Cris Salvi (Aug 11 2024 at 08:39):
Are there any recent surveys on autoformalisation that include a description of the techniques used and a link to the relevant code? Honest accounts of both successes and failures would be particularly appreciated.
Jason Rute (Aug 11 2024 at 21:03):
I don’t think there is such a survey. Autoformalization is pretty recent, and I don’t think we have hit a stable point yet; any survey would quickly be out of date.
Jason Rute (Aug 11 2024 at 21:03):
Here are some key points that might help:
- Qingxiang Wang has a few papers trying out autoformalization using machine translation
- Christian Szegedy wrote A Promising Path Towards Autoformalization and General Artificial Intelligence, which is a manifesto of sorts on the topic. The idea is that autoformalization and automated theorem proving can strengthen each other in a virtuous reinforcement-learning loop (see the schematic sketch after this list).
- Christian Szegedy gave a talk at AITP 2020 showcasing how hard this topic is and the brick walls they were running into.
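A schematic of that virtuous loop; this is an illustrative sketch, not code from any of the papers above, and every component name (`autoformalize`, `attempt_proof`, `finetune`) is a hypothetical placeholder:

```python
# Illustrative only: autoformalized statements feed a prover, and whatever the
# prover manages to prove feeds back as training data for both components.
def virtuous_loop(informal_corpus, translator, prover, rounds=3):
    training_data = []
    for _ in range(rounds):
        # 1. Translate informal statements into candidate formal statements.
        candidates = [translator.autoformalize(s) for s in informal_corpus]
        # 2. Attempt each candidate; the proof checker filters out junk,
        #    since only machine-checked proofs survive this step.
        proved = []
        for c in candidates:
            proof = prover.attempt_proof(c)
            if proof is not None:
                proved.append((c, proof))
        # 3. Successful (statement, proof) pairs become new training data
        #    for both the translator and the prover.
        training_data.extend(proved)
        translator.finetune(training_data)
        prover.finetune(training_data)
    return training_data
```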
Jason Rute (Aug 11 2024 at 21:03):
Then everything changed in 2022 with the paper Autoformalization with Large Language Models. The main idea was that code LLMs (Codex at the time) could do autoformalization at a sufficiently useful level (about 25% correct for competition-problem theorem statements, IIRC). That was good enough for generating additional data for reinforcement learning of automated theorem provers.
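To make the idea concrete, here is a minimal sketch of few-shot statement autoformalization; the prompt text and `query_llm` are hypothetical placeholders, not the actual prompt or API from the paper:

```python
# Minimal few-shot autoformalization of a theorem *statement* (not a proof).
# `query_llm` stands in for whatever LLM completion API is being used.
FEW_SHOT_PROMPT = """Translate the natural language statement into a Lean theorem statement.

Natural language: The sum of two even integers is even.
Lean: theorem sum_even (a b : ℤ) (ha : Even a) (hb : Even b) : Even (a + b) := sorry

Natural language: {statement}
Lean:"""

def autoformalize_statement(statement: str, query_llm) -> str:
    """Return the model's candidate Lean statement for an informal statement."""
    prompt = FEW_SHOT_PROMPT.format(statement=statement)
    return query_llm(prompt).strip()
```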
Jason Rute (Aug 11 2024 at 21:04):
Shortly after that paper, a number of works came out (many talked about on this Zulip at length):
- LeanChat (a pre-ChatGPT chat tool for autoformalization into Lean)
- LeanAide: Autoformalization connected to Lean metaprogramming, so it can better find and fix translation mistakes
- Draft-Sketch-Prove: Draft informal proofs, formalize them into proof sketches, and fill in the remaining details with a hammer; many later papers on automated theorem proving for Isabelle extend Draft-Sketch-Prove (see the pipeline sketch after this list).
- ProofNet: A benchmark for Lean made by autoformalizing textbook exercises
- Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization: I think the idea is to check informal LLM-generated answers by using autoformalization.
- Multilingual Mathematical Autoformalization: Using autoformalization in reverse (formal to informal) to generate training data for LLMs.
- DeepSeek-Prover and AlphaProof both use autoformalized competition problems at large scale (millions of problems) to train automated theorem provers.
(I’m sure I’m missing a lot of works.)
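As a concrete reading of the Draft-Sketch-Prove idea above, here is a rough pipeline sketch; `draft_llm`, `sketch_llm`, and `hammer` are hypothetical stand-ins for the paper's actual components:

```python
# Illustrative Draft-Sketch-Prove pipeline: draft an informal proof, turn it
# into a formal skeleton with open goals, then try to close each goal with a
# hammer-style automation call.
def draft_sketch_prove(theorem_statement, draft_llm, sketch_llm, hammer):
    # 1. Draft: produce an informal (natural language) proof of the statement.
    informal_proof = draft_llm(theorem_statement)
    # 2. Sketch: map the informal proof to a formal skeleton whose low-level
    #    steps are left as open goals (e.g. `sorry` placeholders).
    formal_sketch, open_goals = sketch_llm(theorem_statement, informal_proof)
    # 3. Prove: ask the hammer to discharge each remaining goal.
    closed = {goal: hammer(goal) for goal in open_goals}
    if all(proof is not None for proof in closed.values()):
        return formal_sketch, closed  # complete, machine-checkable proof
    return None  # some gap could not be closed
```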
Jason Rute (Aug 11 2024 at 21:05):
Also, many people have been converting Lean theorems into informal statements, either for better documentation (automatic documentation), for semantic search (Moogle, LeanSearch), or to make training data for automated theorem proving.
Jason Rute (Aug 11 2024 at 21:05):
I’m less aware of work pursuing autoformalization for larger projects (e.g. formalizing a book or a paper), but I think that would be the next big goal. Another big goal would be more systematic autoformalization, like formalizing a whole database or an encyclopedic reference of some sort, or trying to scrape and formalize all theorems of a particular form in some area of mathematics.
Jason Rute (Aug 11 2024 at 21:12):
Also just found this benchmark on autoformalization in Lean: An Evaluation Benchmark for Autoformalization in Lean4
Jason Rute (Aug 11 2024 at 21:14):
And this: Process-Driven Autoformalization in Lean 4
Jason Rute (Aug 11 2024 at 21:16):
And: Lean Workbook: A large-scale Lean problem set formalized from natural language math problems (discussed in #Machine Learning for Theorem Proving > Lean Workbook: A large-scale Lean formalized problem set)
Cris Salvi (Aug 12 2024 at 08:23):
This is great @Jason Rute thanks a lot for the response.
Cris Salvi (Jul 16 2025 at 07:57):
@Jason Rute — Would you say that approaches based entirely on LLMs are currently the default methods for autoformalisation? Also, as far as you know, has the combination of MCTS + LLMs + RL with Lean feedback been considered/explored?
Shashank Kirtania (Aug 26 2025 at 05:03):
I see a lot of effort (like some of the papers mentioned in this thread) on translating informal theorems to formal theorems, but not so much on informal proofs to formal proofs.
Why is that? Is it because formal and informal proof writing are quite different, because it’s a much harder problem to solve, or something else?
Phillip Harris (Aug 26 2025 at 19:27):
I think it's just a lot harder than merely translating a theorem statement, but there's a fair number of people working on this.
Jason Rute (Aug 28 2025 at 11:53):
Phillip Harris said:
there's a fair number of people working on this.
Who are you referring to? I think the Trinity stuff by Morph is the only thing of that sort released, especially focusing on the harder case where the proof has intermediate definitions and lemmas.
Jason Rute (Aug 28 2025 at 11:53):
(Although as theorem proving gets more advanced, there is a fair amount of proof formalization going on already in automated theorem proving, since usually the model first comes up with a separate natural language proof, either in its chain of thought or through another model. It seems easier for a model to do complex math reasoning in natural language than in formal language.)
Bas Spitters (Aug 28 2025 at 13:29):
@Jason Rute Only one proof has been published, hasn't it? Not the tool itself. At least, I could not find it here:
https://github.com/morph-labs
(deleted) (Aug 28 2025 at 14:13):
That's right. The tool is not publicly available
Jason Rute (Aug 28 2025 at 14:16):
Yeah, by “release” I mean releasing results, not tools.
Phillip Harris (Sep 02 2025 at 15:32):
Jason Rute said:
Phillip Harris said:
there's a fair number of people working on this.
Who are you referring to? I think the Trinity stuff by Morph is the only thing of that sort released, especially focusing on the harder case where the proof has intermediate definitions and lemmas.
Yeah, Morph and some other people who haven't released anything yet.
Yutong Wu (Sep 03 2025 at 05:43):
We evaluated common language models on the autoformalization task (just statement translation) and provided additional training focused on two key capabilities required for autoformalization models: formal domain knowledge and informal-to-formal reasoning.
Our paper is available at: https://arxiv.org/abs/2508.04440, and the model can be found at: https://huggingface.co/stepfun-ai/StepFun-Formalizer-32B
Although this work is still in its early stages, we hope it will be helpful to everyone.
Kelly Davis (Sep 03 2025 at 06:04):
Thanks for the evaluation. It is of help. Any plans on adding Goedel-Formalizer-V2 to the comparison?
PS: I know Goedel-Formalizer-V2 was released one day before your article, but maybe in 2508.04440v2?
Yutong Wu (Sep 03 2025 at 06:12):
Kelly Davis said:
Thanks for the evaluation. It is of help. Any plans on adding Goedel-Formalizer-V2 to the comparison?
PS: I know Goedel-Formalizer-V2 was released one day before your article, but maybe in 2508.04440v2?
Thank you for the reminder. We will add a comparison with Goedel-Formalizer-V2 in the revised version of the paper.
Last updated: Dec 20 2025 at 21:32 UTC