Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: F2F


Johan Commelin (Sep 07 2019 at 04:54):

I just read on the website:

To eliminate the need for human judges, we propose the formal-to-formal (F2F) variant of the IMO: the AI receives a formal representation of the problem (in the Lean Theorem Prover), and is required to emit a formal (i.e. machine-checkable) proof. We are working on a proposal for encoding IMO problems in Lean and will seek broad consensus on the protocol.

Doesn't this mean that human "judges" will not be able to say why they should trust Lean's solution, and why it deserves a gold medal?

Johan Commelin (Sep 07 2019 at 04:54):

Of course I still think this is an awesome challenge.

Johan Commelin (Sep 07 2019 at 04:55):

So don't take this complaint too seriously :stuck_out_tongue_wink:

Patrick Massot (Sep 07 2019 at 09:16):

I guess the IMO judges won't like this "eliminate the need for human judges". I'm sure you can explain the idea using a less aggressive phrasing. When talking to colleagues about formal maths, I've noticed there are always some people who feel threatened and react defensively. I think they shouldn't feel like this, but most mathematicians are still human beings, so they don't always act rationally...

Jason Rute (Sep 07 2019 at 11:00):

Yes, I think the motivation is set up in a confusing way. This is more of a computer IMO instead of a computer-judged IMO. As an analogy, think computer go/chess/bridge. The key is not that you are replacing the original IMO competition with computer judges, but making a computer variant which naturally is judged by a computer proof checker.

Jason Rute (Sep 07 2019 at 11:17):

When I first read the webpage, I thought this was humans (probably not high school students) writing proofs in Lean. It took me another read to realize it was computers writing proofs in Lean. (But this might have just been based on my having seen others here talk about solving IMO problems in Lean.)

Daniel Selsam (Sep 07 2019 at 13:28):

Doesn't this mean that human "judges" will not be able to say why they should trust Lean's solution, and why it deserves a gold medal?

@Johan Commelin Hi Johan. First, I'll say that the rules are very much up for debate. The current proposal reflects two competing desiderata. On one hand, we want people to agree on the scoring system ahead of time, and we fear that it might be hard to preempt post-test controversy if we rely on human judges instead of the Lean kernel. On the other hand, we want to honor the spirit of the (traditional) IMO and only accept high-level proofs.

If it happened that a fancy decision procedure could get Gold on the IMO with incomprehensible low-level proofs, I would probably consider that more a quirk of the test than progress in AI. The current rules allow lower-level proofs than humans could possibly provide, but type-checking in 10 minutes given only the acceptable background math does limit how low-level a proof can be. If a team wanted their AI to use e.g. Sturm sequences, the proofs they submit would need to include formally verified versions of all relevant claims, all of which must be type-checkable within the 10-minute time limit. To some extent we can enforce higher-level proofs even in the F2F setting by just decreasing the amount of time allowed for type-checking.
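To make the format concrete, here is a toy stand-in (my own sketch, not an official encoding, and far easier than any real IMO problem). The statement is what the AI would receive; everything after `:=` is what it must emit for the kernel to check:

```lean
-- Toy stand-in for the F2F format: the organizers publish the
-- statement, the AI produces the proof term after `:=`, and the
-- Lean kernel type-checks the result within the time limit.
theorem f2f_toy (a b : ℕ) : a + b = b + a :=
nat.add_comm a b
```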

Daniel Selsam (Sep 07 2019 at 13:45):

I guess the IMO judges won't like this "eliminate the need for human judges". I'm sure you can explain the idea using a less aggressive phrasing. When talking to colleagues about formal maths, I've noticed there are always some people who feel threatened and react defensively. I think they shouldn't feel like this, but most mathematicians are still human beings, so they don't always act rationally...

@Patrick Massot Thanks for mentioning this. It never occurred to me. I will think about how to clarify the motivations. Also, see discussion a few posts up about whether we even want machine-checkable proofs.

Daniel Selsam (Sep 07 2019 at 13:48):

Yes, I think the motivation is set up in a confusing way. This is more of a computer IMO instead of a computer-judged IMO. As an analogy, think computer go/chess/bridge. The key is not that you are replacing the original IMO competition with computer judges, but making a computer variant which naturally is judged by a computer proof checker.

@Jason Rute Hi Jason. Funny, this interpretation never occurred to me. I will add a sentence to preempt it.

Scott Morrison (Sep 07 2019 at 23:43):

Presumably the intention is that solutions can use mathlib; is compiling the needed parts of mathlib included in the 10 minutes? (That seems like a terrible interpretation to me.)

Jason Rute (Sep 09 2019 at 13:02):

@Daniel Selsam I'm curious who the intended audience for this challenge is. For example, do you think you will be able to get major players involved in this competition, from the ITP, ATP, and/or AI communities? Do you imagine the submissions will come from large teams or individuals? Do you think you will get lots of submissions or just a few? Are there other similar competitions that we can compare this to?

Daniel Selsam (Sep 09 2019 at 13:21):

Presumably the intention is that solutions can use mathlib; is compiling the needed parts of mathlib included in the 10 minutes? (That seems like a terrible interpretation to me.)

@Scott Morrison Hi Scott, good question. We imagine there being one Lean library for the background math that will be type-checked once, and AI proofs can use theorems in this library without it counting against the type-check time limit. However, the background math library will not include all of mathlib, and instead will only include (a) theorems that human competitors are allowed to cite without proof and (b) basic things that are too obvious for human competitors to even state. Does this sound reasonable to you?
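For instance (a hypothetical sketch, with mathlib's `nat.prime.two_le` standing in for a background-library fact that would be pre-checked):

```lean
import data.nat.prime -- stands in for the background library,
                      -- type-checked once up front

-- Hypothetical sketch: the submission may cite the background lemma
-- `nat.prime.two_le` without its proof counting against the
-- submission's own type-check time limit.
example (p : ℕ) (hp : nat.prime p) : 2 ≤ p :=
hp.two_le
```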

Daniel Selsam (Sep 09 2019 at 14:51):

@Daniel Selsam I'm curious who the intended audience for this challenge is. For example, do you think you will be able to get major players involved in this competition, from the ITP, ATP, and/or AI communities? Do you imagine the submissions will come from large teams or individuals? Do you think you will get lots of submissions or just a few? Are there other similar competitions that we can compare this to?

@Jason Rute I think the best analogy is chess. It was a north star for decades before Deep Blue. I hope researchers from many different fields think about what it would take to win gold, reflect on the limitations of their existing tools, and develop new approaches over time. I don't expect any submissions until 2021 at the earliest.

Scott Morrison (Sep 09 2019 at 22:50):

Ooof, building a custom library seems like a lot of work. Why not just start with "all-of-mathlib" as the spec? Most of mathlib is completely irrelevant to IMO problems, and its irrelevance saves you the effort of cutting it out by hand. :-)

Mario Carneiro (Sep 10 2019 at 07:09):

It doesn't seem unreasonable to me to allow all of mathlib but retain the privilege to reject a solution that uses some cheating theorem (assuming you didn't think to blacklist it in the first place). TBH it's a lot more interesting if you don't curate the library since library relevance filtering is a useful skill.

Daniel Selsam (Sep 18 2019 at 04:18):

@Scott Morrison @Mario Carneiro To clarify: everything done for the IMO Grand Challenge will be built on mathlib. There may also be some additional miscellaneous material that is needed for the IMO that isn't worth putting into mathlib. Last week I was also concerned about blacklisting parts of mathlib, but my current thinking is that this is not worth it and that any formal proof that can be checked in a "reasonable" amount of time should be counted.

Floris van Doorn (Sep 18 2019 at 04:49):

@Daniel Selsam That's good to hear. I (also) think blacklisting results from mathlib is not necessary. I can't think of any results in mathlib that would be useful for the IMO but that a (human) participant would not be allowed to use (without proof). I'm quite sure that the problems which are chosen do not become much easier when using university-level math.

L.B.M. Jill (May 04 2021 at 10:45):

Daniel Selsam said:

Scott Morrison Mario Carneiro To clarify: everything done for the IMO Grand Challenge will be built on mathlib. There may also be some additional miscellaneous material that is needed for the IMO that isn't worth putting into mathlib. Last week I was also concerned about blacklisting parts of mathlib, but my current thinking is that this is not worth it and that any formal proof that can be checked in a "reasonable" amount of time should be counted.

I saw your more recent presentation about the challenge and I instantly fell in love with the idea. There, you mention Lean 4 as the "War Machine" that makes this possible. I'm just beginning to learn Lean and would prefer to focus only on Lean 4, since it's not backward compatible with Lean 3 and seems to be a better language (duh). However, it seems like most of the formalization work related to the challenge is being done in mathlib (Lean 3?), so I'm a little confused.
How can people working with Lean 4 help you reach your goal? Which part of the project would you prefer them to contribute to? Maybe there are plans to automatically convert Lean 3 code into Lean 4? Should they contribute in that direction instead? @Kevin Buzzard what do you think?

Kevin Buzzard (May 04 2021 at 10:47):

I think that right now Lean 4 is still a work in progress. Ultimately, Lean 3's maths library will be ported over to Lean 4, and people are working on that difficult question right now.

L.B.M. Jill (May 04 2021 at 10:53):

Kevin Buzzard said:

I think that right now Lean 4 is still a work in progress. Ultimately, Lean 3's maths library will be ported over to Lean 4, and people are working on that difficult question right now.

Is mathlib the only factor keeping the project on Lean 3? In your experience, how relevant is mathlib to formalizing/solving IMO problems?

Scott Morrison (May 04 2021 at 10:54):

Quite. You should have a look in the archive/ directory of mathlib for examples of IMO problems formalised by humans.

Scott Morrison (May 04 2021 at 10:54):

They make quite extensive use of results from mathlib.
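For instance, archive/imo/imo1959_q1.lean shows that (21n + 4)/(14n + 3) is irreducible for every n. A rough sketch of my own along those lines (not the archive's actual script) would be:

```lean
import tactic -- mathlib, for `ring` and the divisibility lemmas below

-- Sketch in the spirit of archive/imo/imo1959_q1.lean: the key
-- identity is 3 * (14n + 3) = 2 * (21n + 4) + 1, so any common
-- divisor of the two numbers also divides 1.
theorem imo1959_q1' (n : ℕ) : nat.gcd (21 * n + 4) (14 * n + 3) = 1 :=
begin
  have h1 : nat.gcd (21 * n + 4) (14 * n + 3) ∣ 2 * (21 * n + 4) :=
    dvd_mul_of_dvd_right (nat.gcd_dvd_left _ _) 2,
  have h2 : nat.gcd (21 * n + 4) (14 * n + 3) ∣ 3 * (14 * n + 3) :=
    dvd_mul_of_dvd_right (nat.gcd_dvd_right _ _) 3,
  have key : 3 * (14 * n + 3) = 2 * (21 * n + 4) + 1 := by ring,
  rw key at h2,
  exact nat.dvd_one.mp ((nat.dvd_add_right h1).mp h2),
end
```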

Scott Morrison (May 04 2021 at 10:55):

And no, mathlib is not the only factor keeping the project on Lean 3. Lean 4 is still very much a work in progress.

Scott Morrison (May 04 2021 at 10:56):

It seems there is not yet a huge amount for people to do to help port mathlib to Lean 4, unless you are a Lean 4 hacker. Occasionally we do get requests from the people working on the port to change some feature of mathlib to make it more compatible with the (still gestational) porting effort, although for the most part the people actively involved in mathport seem to be making these changes to mathlib directly.

Scott Morrison (May 04 2021 at 10:57):

(I think there are others of us out here who are very ready to take direction, if there are particular mathlib refactors that seem useful or necessary at this stage.)


Last updated: Dec 20 2023 at 11:08 UTC