Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: openai claiming gold medal results


Andy Jiang (Jul 19 2025 at 09:00):

https://x.com/alexwei_/status/1946477742855532918

Oliver Nash (Jul 19 2025 at 09:54):

See also https://github.com/aw31/openai-imo-2025-proofs/

Junyan Xu (Jul 19 2025 at 10:22):

35/42 ("independently graded by three former IMO medalists"), didn't produce a solution for problem 6

Adam Kurkiewicz (Jul 19 2025 at 18:08):

@Oliver Nash anything to share from GDM?

Oliver Nash (Jul 19 2025 at 19:16):

I understand and appreciate your interest but I'm afraid I have nothing to share @Adam Kurkiewicz.

For one thing I have moved on from GDM so my insider knowledge is not up to date. There are others here who are fully up to date but I don't think they'll share anything.

For another, when I was involved last year, the team graciously committed not to share anything until the competition had finished. After all, this is a kids event and one should allow time to celebrate their contributions before advertising corporate interests. (See also #Machine Learning for Theorem Proving > Blind Speculation about IMO 2025 @ 💬 )

Ad Astra (Jul 21 2025 at 03:15):

@Oliver Nash thanks for being a champion of high integrity. I’m new here myself, and certainly don’t claim any moral authority, but - as someone who is a leery eyed skeptic of everything “Open”AI - I found this behavior to be especially distasteful. So much so, that I did some deeper digging into all this and came across this fantastic, underground community that I never knew existed! Warms my heart to see so many taking the high ground here :)

Junyan Xu (Jul 21 2025 at 22:19):

https://x.com/pfftdontcare/status/1947405104367472948

OpenAI also provided curated corpus: https://x.com/polynoamial/status/1947398534753620147 And from what I've read of both reports, OpenAI spent both 4.5hr sessions while Gemini spent only one 4.5hr session. But that still doesn't say how much compute went into both.

Over the past several months, we made a lot of progress on general reasoning. This involved collecting, curating, and training on high-quality math data, which will also go into future models. In our IMO eval we did not use RAG or any tools.

https://x.com/polynoamial/status/1947398532899738064

~2 months ago, the IMO emailed us about participating in a formal (Lean) version of the IMO. We’ve been focused on general reasoning in natural language without the constraints of Lean, so we declined. We were never approached about a natural language math option.

OpenAI scored 0 on P6: https://x.com/polynoamial/status/1947404940902863246


Last updated: Dec 20 2025 at 21:32 UTC