Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: best open weight lean4 autoformalization tool ?

TongKe Xue (Dec 07 2025 at 10:26):

What are the best open weight lean4 autoformalization tools ?

Are we close to the point where we can dump the Springer undergrad/grad math textx, or all of Arxiv math/cs-theory, and have it auto formalize non trivial sections of it?

I'm looking primarily for open weight systems that I can run locally. Though of closed-weight, api-only systems perform significantly better, I'm curious about learning them too.

Kevin Buzzard (Dec 07 2025 at 17:24):

Answer to your second question is "no such tool exists yet"

Gregory Constantine (Dec 07 2025 at 18:00):

afaik the only autoformalization-specific model releases have been in very small model sizes, e.g. kimina autoformalizer, herald etc, and quickly lose coherence for graduate-level math, or creating new definitions. Maybe something like Kimi K2 would perform, which is usually intractable to run locally. I think API-gated stuff is the way to go basically.

Deleted User 968128 (Dec 08 2025 at 01:17):

I think you have to define "non trivial" more precisely. Perhaps provide some example problems via git?

It's very possible we are more capable currently than people are aware. gpt-oss-120b is a fairly big leap in mathematical reasoning. There are some very good qwen models. Using that with a strong open weight autoformalizer model and appropriate tool calling/agentic flow could be a powerful 1-2-3 punch (likely Aristotle secret sauce).

Depending on what you're willing to host, latest DeepSeek is openweight and SOTA at various math benchmarks, though at 650B parameters very costly to host.

If you're looking for a magical 1 shot openweight model, no, unlikely to work. But I think that perspective has lead to some confusion about capabilities.

I can't say for sure, but my guess is Aristotle is potentially based largely on openweight models (with some finetuning). You will not be able to get superior results, but you could probably come relatively close to them with the correct flow and minimal effort.

So, a good rule of thumb might be, wherever Aristotle (or an average of similar efforts) is and multiple by about 70%-80%. That is probably where we are with basic open weight flows.

Last updated: Feb 28 2026 at 14:05 UTC