Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: AI Math Olympiad Prize


David Renshaw (Nov 27 2023 at 13:53):

https://aimoprize.com/

$10 million prize fund! And I appreciate that models must be "publicly-shared".

Adam Topaz (Nov 27 2023 at 15:46):

My university's network is blocking that site as potentially malicious... How serious is this?

David Renshaw (Nov 27 2023 at 15:50):

Looks pretty legit to me. https://aimoprize.com/supporters has blurbs from Terence Tao, Leonardo de Moura, and Kevin Buzzard.

David Renshaw (Nov 27 2023 at 15:50):

A big difference from the IMO Grand Challenge is that this one is framed as informal-to-informal.

Adam Topaz (Nov 27 2023 at 15:51):

Gotcha. Ok, I'll spin up my VPN :)

David Renshaw (Nov 27 2023 at 15:51):

(But I still think the best way to get there is through formal-to-formal.)

Jason Rute (Nov 27 2023 at 16:35):

They are taking this really seriously: $10M, paid roles, and prizes for partial progress. That is great to hear! I think they are missing a lot of details, though, like: what is the "AI-MO public sharing protocol"? What are the partial progress goals? What are the entry criteria? How many resources can one use?

David Renshaw (Nov 27 2023 at 16:36):

presumably the folks in the paid roles will eventually fill in those details

Jason Rute (Nov 27 2023 at 16:38):

Overall I hope the criteria are not too onerous. Otherwise we could have a situation where some tech company produces a working solution that gets a gold medal at the IMO and publishes in Nature, but it doesn't count since it relies on proprietary components, so a bunch of teams rush over the next year to publicly reproduce the results to get the $10M. (Or maybe this is fine too. It is the progress that matters.)

Jason Rute (Nov 27 2023 at 16:38):

As for legitimacy, let's ask @Kevin Buzzard.

David Renshaw (Nov 27 2023 at 16:39):

but it doesn't count since it relies on proprietary components, so a bunch of teams rush over the next year to publicly reproduce the results to get the $10M.

that sounds like a good outcome to me

Jason Rute (Nov 27 2023 at 16:40):

Also, this is more money than the Millennium prizes combined. Are there any other comparably-sized prizes in AI? I do worry that this might be too much money. Hopefully it doesn't end up like the Netflix prize, where the solutions are just a hodgepodge of models and techniques with no overall pattern or generalizability (or so I think I remember hearing, as it was before I was in AI).

Miroslav Olšák (Nov 27 2023 at 16:42):

To me, it is not clear what they are going to do about lengthy calculations / proofs. Human beings are not capable of writing proofs as long as computer programs can in 4.5 hours, and according to standard IMO rules, a solution of any length should be accepted as long as it is correct.

David Renshaw (Nov 27 2023 at 16:42):

I think the Netflix prize was of the form "whoever decreases the loss the most wins". Here we have a concrete endpoint we're aiming for.

David Renshaw (Nov 27 2023 at 16:49):

AI models must consume problems in the same format as human contestants and must produce human readable solutions that can be graded by an expert panel, using standard Olympiad criteria.

A human grader would not accept an enormous proof.

Jason Rute (Nov 27 2023 at 16:58):

I think the Netflix prize was of the form "whoever decreases the loss the most wins". Here we have a concrete endpoint we're aiming for.

I'm not sure what you mean here. Maybe I don't know enough about the Netflix prize. How is this different? (Is it because the test set is hidden?) Couldn't we easily end up in the situation where there are 100 sub-models? Each model does better on a different type of IMO problem. Teams, realizing they are not quite there, start to combine into super-teams that aggregate all their models. There are a lot of hard-coded solutions for particular common types of problems. There is very complicated and ad hoc logic to decide which model to use in each situation and which solution to send to the reviewers. Of course, complexity can be fine. AlphaFold 2 is quite complex. But I think high prize money could lead to a kitchen-sink approach where everything is thrown in.

Jason Rute (Nov 27 2023 at 16:58):

Alternatively, another possible issue with extreme prize money is that teams, not wanting to get scooped on $10M, stop sharing their partial results. (Are we going to stop getting models like LLEMMA now?)

Jason Rute (Nov 27 2023 at 16:58):

Of course I might just be too jaded. The attention will be good, and most AI labs will still be looking to the big picture even if they aren't first (and $10M isn't that much money).

Miroslav Olšák (Nov 27 2023 at 17:02):

A human grader would not accept an enormous proof.

At the IMO, I am pretty confident they would (although a truly enormous proof couldn't appear). If there is an exception in this challenge (which I find pretty reasonable), I would consider it fair to define "enormous".

David Renshaw (Nov 27 2023 at 17:02):

Netflix Prize = "win computer chess tournament"
AI-MO Prize = "beat Garry Kasparov"

David Renshaw (Nov 27 2023 at 17:03):

Like, there are a lot of people saying that it's just never going to happen. Part of the point is to prove those people wrong.

Adam Topaz (Nov 27 2023 at 17:03):

Still can't connect from my university network :-/ Is there an expected timeline for this, or is it completely open-ended?

Jason Rute (Nov 27 2023 at 17:04):

David Renshaw said:

(But I still think the best way to get there is through formal-to-formal.)

I think formal-to-formal could still be an important part of the winning solution, especially if it is the best way to get there. Imagine something like draft-sketch-prove-sketch-draft.

Jason Rute (Nov 27 2023 at 17:05):

Adam Topaz said:

Still can't connect from my university network :-/ Is there an expected timeline for this, or is it completely open-ended?

From the website:

The first AI-MO approved competitions will open to participants in early 2024. There will be a presentation of progress at the 65th IMO, which will be held in Bath, England in July 2024

Jason Rute (Nov 27 2023 at 17:05):

Overall, they don't give many concrete details.

Kevin Buzzard (Nov 27 2023 at 17:12):

yeah this is definitely a thing, XTX Markets are behind it. My understanding from talking to them is that they (i.e. XTX) get to decide what "partial progress" means.

As for this vs. the Netflix prize: here there is a very clear goal. XTX are working in collaboration with the IMO organisers, so they will probably manage to persuade them to mark a couple of computer entries, if there are any this year.

Re sharing: I was talking to a bunch of DeepMind and ex-DeepMind people over the weekend, and they say that this (suppression of results) is already happening a lot anyway.

Jason Rute (Nov 27 2023 at 17:35):

I do especially like the partial progress part. My big complaint on the original IMO Grand Challenge is that there is no mechanism for measuring, rewarding, or acknowledging partial progress.

Jason Rute (Nov 27 2023 at 17:42):

Ok, I now understand @David Renshaw's point. Since one doesn't have to beat other models, just humans, there isn't as much need for extreme optimization as seen in some Kaggle competitions and the Netflix challenge. So it might (hopefully) be that a model of reasonable complexity does fine on all types of problems, or that maybe there is a model for only a few select types of problems, but not the kitchen-sink aggregate approach that I worry is needed to squeeze out that remaining 1% of improvement.

Kevin Buzzard (Nov 27 2023 at 17:43):

My big complaint on the original IMO grand challenge is that if you look at the 2023 IMO day 1, two of the questions aren't "prove this", they are "determine this set", and I have never figured out how to formalise this. This is why I think the natural language approach is a better idea -- it makes it much easier to define the question.

Miroslav Olšák (Nov 27 2023 at 18:55):

My big complaint on the original IMO grand challenge is that if you look at the 2023 IMO day 1, two of the questions aren't "prove this", they are "determine this set", and I have never figured out how to formalise this. This is why I think the natural language approach is a better idea -- it makes it much easier to define the question.

One option already mentioned here (I think) was to have a human-checked solution to the "determine" part but a computer-verified proof, which goes a bit towards the AI-MO Prize, just without people having to check the proofs (which is the more annoying part).

Another (more formal) option is to restrict, per problem, which constructing operations are allowed. For example, in the case of IMO 2023 Problem 1, it would be a set defined by a free variable x, and besides this variable, the allowed functions would be boolean connectives, equality, integer less-than, arithmetic operations, prime decomposition, and basic functions for manipulating the prime decomposition (length of a list, index into a list, ...).
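
A rough Lean sketch of how such a restricted "determine" statement might be packaged (the names isNice and answer are purely illustrative, the Mathlib identifiers Nat.factors and List.get? are assumed, and the sorrys mark the parts a problem author or contestant would fill in; this is not a proposed official format):

```lean
import Mathlib

-- Illustrative only: `isNice` stands for the divisibility property in the
-- statement of IMO 2023 Problem 1; it is left abstract here.
def isNice (n : ℕ) : Prop := sorry

-- A candidate answer built only from (roughly) the whitelisted operations:
-- the prime-factor list of n, list length, list indexing, equality, and ≤.
-- It says: n has at least two prime factors counted with multiplicity
-- (i.e. n is composite) and all of them are equal (i.e. n is a prime power).
def answer (n : ℕ) : Prop :=
  2 ≤ (Nat.factors n).length ∧
  ∀ i < (Nat.factors n).length, (Nat.factors n).get? i = (Nat.factors n).get? 0

-- What a verifier would actually check: the submitted set matches the property.
theorem answer_correct : ∀ n, answer n ↔ isNice n := sorry
```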

Kevin Buzzard (Nov 27 2023 at 19:06):

yeah, like I said, we could either contort ourselves into a convoluted position or just ask for a natural language solution.

Miroslav Olšák (Nov 27 2023 at 19:18):

my problem with human-checking is that I wouldn't be much motivated to check 20 pages of geometry calculations that are likely correct (assuming they came from a computer algebra system and not ChatGPT :laughing:)

Joseph Myers (Nov 28 2023 at 03:35):

After (briefly) discussing this prize with XTX a few weeks ago I noted a few considerations that didn't come up in that discussion but would be relevant for establishing detailed rules, one of which was the matter of establishing a length limit on solutions to avoid coordinators needing to read 1000-page claimed proofs (which have a legitimate place in mathematics, but not in solving IMO problems, since that's far longer than an IMO contestant could reasonably write in 4.5 hours).

Joseph Myers (Nov 28 2023 at 03:39):

If a human IMO contestant produces a solution with 20 pages of algebra (which is within what such a contestant might occasionally write in reality, albeit as 20 handwritten pages), the coordinators need to read it, even if it is boring. I think motivation in the AI-MO Prize case might be provided by paying the coordinators to check the AI solutions. In both cases, coordinators are free to use computer algebra systems themselves to help them in checking.

Andy Jiang (Nov 28 2023 at 05:34):

How exactly is this a problem? If they came from a computer algebra system, can't we just verify them with a computer algebra system?

By the way, can someone correct my understanding on this: I actually thought most plane geometry problems can be efficiently solved by computer algebra. But I gather this isn't true?

Siddhartha Gadgil (Nov 28 2023 at 06:03):

As I understand it, if you trust the computer algebra systems then you can do this, but with proved algorithms it is not so easy. My experiments on this are at https://siddhartha-gadgil.github.io/automating-mathematics/posts/solving-pappus/ (following on from https://siddhartha-gadgil.github.io/automating-mathematics/posts/proving-by-solving/).

If someone knows about checked algorithms that work, I would be grateful for the information.

Joseph Myers (Nov 29 2023 at 03:02):

If the problem reduces to showing some polynomial equations in the coordinates imply another polynomial equation in the coordinates, then it can be solved efficiently (in practice) by Gröbner bases. Otherwise (if there are inequalities involved), you need quantifier elimination, which is much less efficient.

A large proportion of IMO geometry problems do in fact involve configuration or nondegeneracy information that prevents them from reducing to Gröbner bases. For example, IMO 2023 problem 2 asks us to prove that two lines meet on an internal angle bisector (the polynomials in question won't distinguish the internal from the external bisector, and indeed a similar issue applies to the definition of the point S). IMO 2023 problem 6 asks us to prove that circles have two common points, which polynomials won't distinguish from having a single common point.

Sometimes the relevant information in the conclusion, beyond what can be deduced with Gröbner bases, may be geometrically obvious and not need remarking on in a human solution, but that's not always the case. In IMO 2023 problem 6 you could lose points for not justifying various nondegeneracy conditions needed in a solution. In IMO 2002 problem 2 you could lose a point for failing to explain why an angle condition meant you had the incentre rather than an excentre.

Even if you don't have such configuration information in the conclusion, you might need to use such information from the hypotheses. Maybe the polynomials generate an ideal that contains the polynomial PQ, but actually you have to prove something equivalent to P = 0, and excluding the possibility of Q = 0 with P ≠ 0 requires using nondegeneracy conditions from the problem statement, for example. Again, it depends on the problem whether this is something that needs explicit justification in a solution, but if you're doing an algebraic solution you probably need at least to interpret Q = 0 geometrically before saying it can't occur.
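
As a concrete toy illustration of the Gröbner-basis route, here is a minimal sympy sketch of the reduce-to-zero check (sympy and the artificial circumcenter fact are illustrative choices, not anything prescribed by the discussion above):

```python
# Toy fact: if O is equidistant from A and B, and from A and C,
# then O is equidistant from B and C, with A = (0,0), B = (b,0), C = (cx,cy).
from sympy import symbols, groebner, expand

ox, oy, b, cx, cy = symbols('ox oy b cx cy')

# Hypotheses as polynomials that must vanish.
h1 = expand((ox**2 + oy**2) - ((ox - b)**2 + oy**2))          # |OA|^2 = |OB|^2
h2 = expand((ox**2 + oy**2) - ((ox - cx)**2 + (oy - cy)**2))  # |OA|^2 = |OC|^2

# Goal polynomial: |OB|^2 = |OC|^2.
goal = expand(((ox - b)**2 + oy**2) - ((ox - cx)**2 + (oy - cy)**2))

# Reduce the goal modulo a Groebner basis of the hypothesis ideal; a zero
# remainder certifies that the goal lies in the ideal generated by the
# hypotheses, and hence follows from them.
G = groebner([h1, h2], ox, oy, b, cx, cy, order='lex')
quotients, remainder = G.reduce(goal)
print(remainder)  # prints 0
```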

Siddhartha Gadgil (Nov 29 2023 at 05:48):

Joseph Myers said:

If the problem reduces to showing some polynomial equations in the coordinates imply another polynomial equation in the coordinates, then it can be solved efficiently (in practice) by Gröbner bases. Otherwise (if there are inequalities involved), you need quantifier elimination, which is much less efficient.

Are Gröbner bases efficient in practice for Euclidean geometry problems? My experience was that they are not (and I believe they are doubly exponential).

Joseph Myers (Nov 30 2023 at 01:00):

Both Gröbner bases and quantifier elimination are doubly exponential in the worst case, but I think Gröbner bases are much more likely to work with reasonable resources in practice. A previous discussion https://leanprover.zulipchat.com/#narrow/stream/208328-IMO-grand-challenge/topic/geometry.20problems says something about which problems could be proved with Gröbner bases plus handling nondegeneracy conditions manually.

Miroslav Olšák (Dec 01 2023 at 18:09):

Recently, I have noticed that the AIMO Prize website spins up my CPUs. What is it doing? Are they using my hardware to solve IMO problems :-D? Am I the only one, or is it happening to someone else too? I am on Firefox 118.0.2 (Linux).

David Renshaw (Dec 01 2023 at 18:12):

Yeah, me too, with Firefox and Chrome on Linux.

David Renshaw (Dec 01 2023 at 18:13):

For me on Firefox it looks like it spins up like 6 or 7 threads.

Eric Wieser (Dec 01 2023 at 18:24):

It's this SVG file: https://aimoprize.com/pattern-tile.svg

Kevin Buzzard (Dec 01 2023 at 20:19):

I've reported the issue to XTX. Thanks.


Last updated: Dec 20 2023 at 11:08 UTC