Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: MerLean: autoformalization in quantum computation
Jinzheng Li (Feb 10 2026 at 01:52):
We introduce MerLean, a fully automated agentic framework for autoformalization in quantum computation; see our webpage.
It’s a promising demonstration of how autoformalization can be applied to frontier research areas. We’d love to get feedback from the community on our approach!
TongKe Xue (Feb 14 2026 at 07:57):
Skimming the webpage, I did not find any mention of foundation models. Does your technique use external foundation models, or does it go directly from LaTeX -> Lean4?
Jinzheng Li (Feb 14 2026 at 11:05):
Thank you for your interest in MerLean. Our agent is based on Claude-opus-4.5/4.6. For more details, please refer to https://arthurmerlean.com/assets/MerLean_arxiv.pdf
Shreyas Srinivas (Feb 14 2026 at 11:08):
Has this been reviewed by someone who knows both Lean and quantum computing?
Jinzheng Li (Feb 14 2026 at 11:16):
We have sent it to Alex. Alex pointed out several crucial problems with the initial version of MerLean. We have made substantial updates to it during the past few days, and we are planning to have a meeting about it next week.
Shreyas Srinivas (Feb 14 2026 at 13:02):
So it’s not ready yet? Or at least this suggests that autoformalization is not yet reliable.
Shreyas Srinivas (Feb 14 2026 at 13:04):
Another question. How is this novel? I am aware of 8 projects (2 of which I have been allowed to check) where GPT-5.2 codex autoformalized and even discovered solutions to previously unsolved (though not necessarily “open”) problems.
Shreyas Srinivas (Feb 14 2026 at 13:06):
Some of these formalisations produce 100,000 lines of code, but either:
- The core definitions and theorems are human-produced, so they can be run through safeverify and checked.
- The definitions and top-level theorems are auto-generated, rendering the formalisation hard to review and therefore pointless.
Shreyas Srinivas (Feb 14 2026 at 15:58):
You also say, and I quote:
MerLean handles this through explicit axiom declarations,
clearly distinguishing intentional assumptions from incomplete proofs (sorry)
sorry simply uses a sorryAx axiom. So at best you get a long list of axioms. Worse, if there is an inconsistency among the axioms, Lean can't tell you, and the entire formalisation could be in jeopardy.
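For readers less familiar with this point, a minimal Lean 4 sketch (with made-up statement names) of both halves of the argument — `sorry` shows up as the `sorryAx` axiom under `#print axioms`, and a jointly inconsistent pair of axioms lets you derive `False` without Lean ever complaining:

```lean
-- A theorem "proved" with sorry silently depends on the sorryAx axiom.
theorem gap : 1 + 1 = 2 := sorry

#print axioms gap   -- should report that 'gap' depends on sorryAx

-- Two axioms that each look plausible in isolation can be jointly
-- inconsistent; Lean accepts both and everything becomes provable.
axiom h1 : ∀ n : Nat, n < 5
axiom h2 : ∃ n : Nat, 5 < n

theorem boom : False := by
  obtain ⟨n, hn⟩ := h2
  -- n < 5 and 5 < n give n < n, contradicting irreflexivity.
  exact absurd (Nat.lt_trans (h1 n) hn) (Nat.lt_irrefl n)
```

Running `#print axioms boom` would list `h1` and `h2`, but nothing forces a reviewer to notice that the pair is inconsistent.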
Yidi Qi (Feb 14 2026 at 23:04):
Thanks for the feedback, @Shreyas Srinivas! We have been continuously updating MerLean and have seen significant improvement in code quality as both the agent framework and the base model have evolved. We are not yet at the point of a general-purpose, fully reliable autoformalization tool, but the main point of this work is to show that it is already becoming useful for downstream tasks in applied research areas like quantum computing.
In our case, we adopted the second approach, where definitions, top-level theorems, and axioms are auto-generated. Most of the axioms the agent introduced in our experiments are actually known to be correct in mathematics but just difficult to prove formally due to the lack of prerequisite theorems in Mathlib. I think this is inevitable in formalizing frontier research in these applied fields. Our pipeline has multiple agents cross-checking them carefully, and human review is still necessary to ensure correctness, but it would still be more efficient than checking proofs from scratch.
Shreyas Srinivas (Feb 14 2026 at 23:32):
Yidi Qi said:
In our case, we adopted the second approach, where definitions, top-level theorems, and axioms are auto-generated. Most of the axioms the agent introduced in our experiments are actually known to be correct in mathematics
- Then it is better to write them as sorried out theorems.
- "actually known to be correct in mathematics" is not the same as "mathematically correct in lean + mathlib"
but just difficult to prove formally due to the lack of prerequisite theorems in Mathlib. I think this is inevitable in formalizing frontier research in these applied fields. Our pipeline has multiple agents cross-checking them carefully, and human review is still necessary to ensure correctness, but it would still be more efficient than checking proofs from scratch.
This still doesn't explain how it is better than 10 codex agents on GPT-5.3
Yidi Qi (Feb 15 2026 at 05:14):
Shreyas Srinivas said:
- Then it is better to write them as sorried out theorems.
The reason we have this unconventional usage of axioms is that during the agent’s loops we want to distinguish between two situations: “there is a gap in the proof that the agent should keep trying to fill” and “this is probably too hard for current AI models, and they’d better move on for now” (after a certain number of failed attempts). In the final output it would be easy to convert any axioms into sorried-out theorems, and we would be happy to adapt to community conventions on this.
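The conversion being described is mechanical; a minimal Lean 4 sketch, with a hypothetical statement standing in for whatever the agent failed to prove:

```lean
-- Agent-loop form: a statement assumed as an axiom after repeated
-- failed proof attempts, marking it "move on for now".
axiom hard_fact : ∀ n : Nat, 2 ≤ n → n < 2 ^ n

-- Final-output form: the same statement as a sorried-out theorem,
-- so that #print axioms reports it uniformly via sorryAx like
-- every other remaining gap.
theorem hard_fact' : ∀ n : Nat, 2 ≤ n → n < 2 ^ n := sorry
```

The practical difference is that sorried theorems are all surfaced by a single `#print axioms` check on the top-level result, whereas bespoke axioms each have to be audited by name.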
- "actually known to be correct in mathematics" is not the same as "mathematically correct in lean + mathlib"
This is a fair point. The ultimate goal is of course to have everything built from mathlib and fully verified, but often in physics, Lean can still be very useful even before reaching this level of verification, as long as the trust boundary is clear. We will phrase this more carefully in the next version of our manuscript.
This still doesn't explain how it is better than 10 codex agents on GPT-5.3
MerLean is essentially an orchestration layer built on top of general-purpose agents like Claude Code or GPT-codex, specifically tailored for the task of autoformalization. It regulates what these agents should and shouldn’t do, aiming to improve quality and reduce “cheating”. The current version is powered by Claude Code due to its better integration with tools like Lean-MCP, but one could imagine building a similar workflow on top of GPT-codex as well.
Jinzheng Li (Feb 15 2026 at 09:43):
Shreyas Srinivas said:
Another question. How is this novel? I am aware of 8 projects (2 of which I have been allowed to check up) where GPT-5.2 codex autoformalized and even discovered solutions to previously problems (but not necessarily “open”) problems.
Our objective is to achieve a translation from LaTeX to Lean in some form. It is a rather ambitious goal, and we are just beginning this process. As you mentioned, auto-generated definitions and top-level theorems are not 100% reliable, which represents one of the most challenging issues we are trying to address. However, I believe we have made good progress (although not perfect), as shown in our example.
On a different note, we created MerLean to accelerate research in applied mathematics, particularly in areas like quantum computing. Formalization is not our ultimate goal. Our future plan is to develop a conjecturer (Arthur) that will work alongside MerLean. Therefore, as long as the results are acceptable to physicists, we will consider our objective achieved.
Last updated: Feb 28 2026 at 14:05 UTC