Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Kaggle, 38/50 already via open models, $2M prize fund
Deleted User 968128 (Dec 01 2025 at 18:43):
https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-3/leaderboard
Some impressive results here, especially considering the limits on model sizes (roughly gpt-oss-120b, mxfp4-quantized, though). Tool calls are likely being used, and the problems solved so far are probably the low-hanging fruit.
$270K for 1st, and $1,589,248 for 47/50. It's possible, however, that they made the last 5-10 problems very hard.
Kevin Buzzard (Dec 01 2025 at 18:52):
This has nothing to do with Lean though. Tim, this is not a generic forum for AI discussions. With my AIMO advisor hat on I would love to promote discussions about AIMO here, but I do not do so precisely because it is not on topic for this forum.
Deleted User 968128 (Dec 01 2025 at 20:21):
Informal reasoning is likely the future of autoformalization, at least for now. LLMs are about next-token prediction, which doesn't lend itself well to mathematical reasoning, especially over long ranges, and there is little indication that pure RLVR is an effective way forward at this time.
The results in this competition are astounding, indicating a breakthrough has possibly occurred, likely empowered by tool calling. Tool calling is a technique for solving math problems that is sadly being overlooked, and the recent SAT article in Quanta was trying to rectify that.
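For anyone who hasn't dug into the notebooks: "tool calling" in these competitions usually means the model writes small Python programs that get executed in a sandbox, with the output fed back into the context, rather than doing the arithmetic token-by-token. A minimal sketch of that loop (the `query_model` stub and its canned replies are purely illustrative, not taken from any actual Kaggle solution):

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # literal ``` built indirectly so this example nests cleanly

# Canned replies standing in for a real model. An actual notebook would hit
# an inference endpoint (e.g. a locally served gpt-oss-120b) here instead.
_CANNED = [
    "I'll compute this exactly rather than token-by-token:\n"
    f"{FENCE}python\nprint(pow(7, 2025, 1000))\n{FENCE}",
    "The tool output above is the final answer.",
]

def query_model(transcript: str) -> str:
    """Placeholder LLM call: pops the next canned reply."""
    return _CANNED.pop(0) if _CANNED else "No further steps."

def run_tool_call(code: str, timeout: float = 10.0) -> str:
    """Execute model-written Python in a fresh subprocess and capture output."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return (proc.stdout or proc.stderr).strip()

def solve(problem: str, max_rounds: int = 4) -> str:
    transcript = f"Problem: {problem}\nUse Python for any heavy computation."
    pattern = FENCE + r"python\n(.*?)" + FENCE
    for _ in range(max_rounds):
        reply = query_model(transcript)
        blocks = re.findall(pattern, reply, re.DOTALL)
        if not blocks:
            return reply  # model answered in prose; no tool call needed
        result = run_tool_call(blocks[-1])
        print("Tool output:", result)  # 807 for the sample problem
        transcript += f"\n{reply}\nTool output: {result}"
    return transcript

print(solve("What are the last three digits of 7^2025?"))
```

Real notebooks replace the stub with an inference call and add sandboxing, retries, and answer extraction on top, but this execute-and-feed-back loop is the core of the technique.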
A big gap I am finding in the Lean community is that while it has domain experts, this expertise and singular focus has led to a lack of understanding of the tooling, which leads to some misleading evaluations of capability and overlooked opportunities.
Deleted User 968128 (Dec 01 2025 at 20:24):
As you mentioned, the Zulip is about Lean; however, this parent stream is about something else. Kaggle provides a flood of open notebooks and is not just about the competition itself. There is a lot to learn there if someone is interested in "machine learning for theorem proving".
gpt-oss-120b is particularly exciting given the reasoning tokens it provides (which could be used to drive autoformalization), though the Qwen models are still very powerful.
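To make that concrete, here is a rough two-pass sketch of the idea: solve informally, capture the reasoning trace, then prompt for a Lean draft conditioned on it. The endpoint URL, model name, and the `reasoning_content` field are assumptions about the serving stack (some vLLM-style servers surface reasoning under that name); this is not an existing pipeline, and the Lean output would of course still need to be checked by the compiler:

```python
from openai import OpenAI

# Assumed: an OpenAI-compatible local server; adjust URL and model name.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def informal_solve(problem: str):
    """First pass: solve informally and capture the reasoning trace."""
    resp = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": problem}],
    )
    msg = resp.choices[0].message
    # Hypothetical: some servers expose the chain of thought here;
    # the exact field name depends on how the model is served.
    reasoning = getattr(msg, "reasoning_content", None) or ""
    return reasoning, msg.content

def draft_formalization(problem: str, reasoning: str, answer: str) -> str:
    """Second pass: draft Lean conditioned on the informal trace."""
    prompt = (
        "Informal problem:\n" + problem +
        "\n\nInformal reasoning trace:\n" + reasoning +
        "\n\nFinal answer:\n" + answer +
        "\n\nTranslate the statement and proof outline into Lean 4 "
        "(Mathlib), as `theorem ... := by ...`. Output only code."
    )
    resp = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

problem = "Show that sqrt(2) is irrational."
reasoning, answer = informal_solve(problem)
print(draft_formalization(problem, reasoning, answer))
```

The point is simply that the informal trace is a much richer input for the formalizing pass than the final answer alone.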
Kim Morrison (Dec 02 2025 at 12:12):
Reiterating Kevin's comment above, we do not intend this channel to cover everything about machine learning for theorem proving; first and foremost this forum is about Lean, and not all discussions about ML theorem proving need to live here. @Tim Shephard, please be careful in future about your signal/noise ratio; we may move future threads like this to junk.
Deleted User 968128 (Dec 02 2025 at 23:03):
As above, I think this "singular focus has led to a lack of understanding of the tooling". I've seen a lot of statements here and elsewhere that are misleading others.
A good example is this arXiv paper, which I encourage folks to read in depth, specifically the sections on informal reasoning and decomposition. It will give you better context for and appreciation of what I am saying above (and elsewhere), and will help avoid some of the low-hanging errors being repeated in the community, which originally motivated me to post.
https://arxiv.org/html/2510.01346v1
Relevant given the recent Lean advances on the Erdős problems.
Last updated: Dec 20 2025 at 21:32 UTC