Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Autoformalization using Megalodon


Timothy Chow (Jan 08 2026 at 14:05):

Josef Urban, who has been working on machine learning for theorem proving for decades, just posted an arXiv preprint about autoformalizing much of Munkres's topology textbook. Although he did not use Lean, I think his work is still on-topic here, and I'm curious about what others think of this project. Is there anything the Lean community can learn from Urban's experience? What did he do right and what (if anything) did he do wrong?

Auguste Poiroux (Jan 08 2026 at 16:00):

Really impressive @Josef Urban

@Timothy Chow Codex and Claude Code seem to work surprisingly well out of the box on Megalodon. OpenAI and Anthropic probably don't have an RL pipeline for Megalodon, and yet here we have a 130k-line formalization produced with these models. As @Josef Urban says in the introduction:

Based on the fast progress, low cost, virtually unknown ITP/library, and the simple setup available to everyone, we believe that (auto)formalization may become quite easy and ubiquitous in 2026, regardless of which proof assistant is used.

I am speculating here and sharing some early thoughts:
Here we see that LLMs can now be used to autoformalize large bodies of theorems. So we are probably not far from a time when a freshly designed proof assistant can be bootstrapped with a Mathlib-scale library using LLMs. The quality will probably not be great in the beginning, but I still think it is worth thinking about the consequences. Maybe new kinds of proof assistants, designed for the LLM era, will appear...

Lua Viana Reis (Jan 08 2026 at 17:14):

Maybe new kinds of proof assistants designed for the LLM era will appear...

I think this misses a point: a well-designed library is useful for many things beyond checking that some statements are correct. Autoformalizing an introductory book is quite far from engineering a library like Mathlib in this respect.

Auguste Poiroux (Jan 08 2026 at 17:31):

Yes, I agree; otherwise, already in February we could get a 2-million-line "library" by having 10 people work for a month like Josef Urban did.
I am just speculating about a future possibility, and that's also why I mention new kinds of proof assistants. Maybe, if designed in certain ways, they could help LLMs build nice libraries.

Junyan Xu (Jan 08 2026 at 17:56):

Random data:

import Mathlib.Topology.UrysohnsLemma
import Mathlib.Topology.Metrizable.Urysohn
import Mathlib.Topology.TietzeExtension
-- count the Mathlib modules transitively imported by these three files
#trans_imports "Mathlib." -- 1541

Julian Berman (Jan 08 2026 at 19:22):

(Impressive as it is) the output of the autoformalized textbook is not likely to be useful to the textbook's intended audience, is it? The textbook is a pedagogical tool for learning topology, but the output proofs sound like they're long and possibly roundabout (judging by the paper calling out their lengths). To be useful to students, the autoformalization has to be as easy to follow as the prose is; and to be useful to a mathematician, it has to be able to deal with texts that are less careful about being digestible to their reader than a carefully authored textbook, right? So the oversimplified takeaway is that this is a step somewhere in between those two goals?

Jason Rute (Jan 08 2026 at 20:16):

I don’t think the goal of this project is pedagogy.

Miguel Marco (Jan 08 2026 at 21:28):

I would like to see whether this kind of approach can work with an LLM that can be run locally (like Llama or Mistral).

km (Jan 15 2026 at 06:55):

This is my first time posting here. Please forgive me if my comment is off-base as I am not familiar with how things work here.

I read Professor Urban's paper and was also very impressed. I had a feeling that the automated formalization of mathematics textbooks was likely possible, but I am very excited to see a concrete example of it.

Now, like Professor Urban, I am also experimenting with automated proving using coding agents (mainly Cursor and Codex CLI) by providing natural language proofs as context (although I use Lean). These general-purpose coding agents solve Putnam and IMO problems truly magnificently, just like other specialized AI agents. (Of course, I am not solving hundreds of problems exhaustively, so there is a possibility that I am seeing biased results...)

And while they sometimes solve these competition math problems almost entirely on their own, without user intervention, it is only since GPT-5.2 that they have become this good. Both GPT-5 and 5.1 suffered from terrible so-called "laziness" and frequently required user intervention (although I could still manage to solve the problems), but in my experience this has improved dramatically.

By the way, I would like to ask a small question: has anyone already carried out a Lean proof of IMO 2025 P6? I think I was probably able to solve it with my setup (although it required quite a bit of user intervention and the help of ChatGPT 5.2 Pro).

Franz Huschenbeth (Jan 15 2026 at 09:18):

km said:


By the way, I would like to ask a small question: has anyone already carried out a Lean proof of IMO 2025 P6? I think I was probably able to solve it with my setup (although it required quite a bit of user intervention and the help of ChatGPT 5.2 Pro).

There are likely already formal proofs of lots of seemingly open problems, which with very high certainty includes a six-month-old IMO problem. Also, a new topic would be more appropriate, as this is not directly related to Josef Urban's paper.


Last updated: Feb 28 2026 at 14:05 UTC