Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: formal-imo (discussion)
Joseph Myers (Nov 12 2025 at 17:07):
Assuming this thread also covers the formal-imo benchmark since you didn't create a separate thread for that:
- I don't think the 2012 P3 (liar's guessing game) statement is very useful (I don't know if there are other such cases in the repository):
@[imo_problem_subject combinatorics]
theorem imo_2012_p3 : (sorry : Prop) := by
sorry
- Do you have plans to upstream non-mathlib definitions used in these statements (while noting that some definitions might need to be in much more generality, and need to have much more API, to be upstreamed)? (And what definitions are relevant for a given problem also can depend heavily on how it's stated. I have this theory I've mentioned here before that 2004 P3 should literally be interpreted geometrically rather than assuming the unit squares are aligned with an underlying grid, but to make such a geometrical statement I need to get back to working on AperiodicMonotilesLean, complete the discrete part and get onto the geometrical part that includes lemmas about aligning tilings with an underlying grid.)
- What are your plans for future development of this repository? For example:
  - freezing it as a fixed benchmark set of problems, or continuing to improve it?
  - updating to new Lean versions?
  - fixing any mistakes found?
  - adding more problems?
  - improving consistency of statements with your chosen conventions?
  - adapting statements to use new mathlib definitions that become available?
  - improving consistency of statements with the original informal statements when better versions of those become available in place of paraphrased versions (I'd still like to expand my collection of scans of pre-2006 papers as originally given to contestants)?
Notification Bot (Nov 12 2025 at 17:15):
A message was moved here from #Machine Learning for Theorem Proving > AlphaProof Paper (discussion) by Eric Wieser.
Eric Wieser (Nov 12 2025 at 17:19):
Joseph Myers said:
- I don't think the 2012 P3 (liar's guessing game) statement is very useful (I don't know if there are other such cases in the repository):
Indeed, we populated the repository exhaustively for non-geometry with sorrys (in order to have docstrings). We decided that this problem was not worth the time investment to arrive at a correct formalization. This is marked as "not formalized" in our paper figure.
Eric Wieser (Nov 12 2025 at 17:46):
Freezing it as a fixed benchmark set of problems, or continuing to improve it?
I think we will follow PutnamBench in not attempting to freeze in any way
Updating to new Lean versions?
Yes, at least in sync with formal-conjectures
Fixing any mistakes found?
Yes, definitely.
Adding more problems?
For now we only intend for this to contain non-geometry problems. Any problem already in the repository and stated with sorry is fair game for a PR, though I think there is only one, and formalizations of it would be a lot of work to review.
Improving consistency of statements with your chosen conventions?
Adapting statements to use new mathlib definitions that become available?
Improving consistency of statements with the original informal statements when better versions of those become available in place of paraphrased versions
I think we're unlikely to invest effort into this, but we'd appreciate PRs. We may be slow to review them though.
Joseph Myers (Nov 12 2025 at 20:18):
The "not formalized" marker was how I found that sorry statement of 2012 P3: I saw the "all non geometry historical IMO problems" description in the paper and went looking for where you put the boundary between geometry and other problems. And while you can dispute whether 1995 P3 (not formalized), say, should count as geometry (I think it's no more geometry than other problems that you formalized; the IMO Compendium claims it was on the shortlist under N, which makes no sense whatever for that problem), it's clear that the liar's guessing game is not geometry at all.
Joseph Myers (Nov 12 2025 at 20:23):
I don't think 2012 P3 should be particularly harder to state than other "game" problems (so maybe an hour or two, though quite likely several days to write a formal solution). But that's thinking in terms of stating with free use of auxiliary definitions rather than the style of this repository with putting everything into a single theorem statement. (And I don't expect to actually attempt to write a formal statement for it any time soon, since I tend to give a higher priority to increasing mathlib coverage of definitions needed to state all modern IMO problems, over expanding my collection of statements following my conventions to cover more past years but with only partial coverage for each year because of API gaps in mathlib.)
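A minimal Lean sketch of the contrast between the two styles (purely illustrative: the toy `GuesserWins` game and both theorem names are hypothetical, not from formal-imo or IMOLean, and the mathematical content is deliberately trivial):

```lean
import Mathlib

-- Style 1: an auxiliary definition carries the game's structure,
-- so the theorem statement itself stays short and readable.
/-- Toy winning condition: some set of at most `k` candidates
covers every number below `n`. -/
def GuesserWins (n k : ℕ) : Prop :=
  ∃ S : Finset ℕ, S.card ≤ k ∧ ∀ x < n, x ∈ S

theorem toy_game (n : ℕ) : GuesserWins n n :=
  ⟨Finset.range n, by simp, fun x hx => Finset.mem_range.2 hx⟩

-- Style 2: the repository convention of inlining everything into a
-- single self-contained theorem statement, with no auxiliary defs.
theorem toy_game' (n : ℕ) :
    ∃ S : Finset ℕ, S.card ≤ n ∧ ∀ x < n, x ∈ S :=
  toy_game n  -- definitionally equal to the unfolded `GuesserWins n n`
```

For a real game problem like 2012 P3, Style 1 would introduce definitions for strategies and play sequences, while Style 2 has to quantify over all of that machinery inside one statement, which is what makes such statements expensive to write and review.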
Eric Wieser (Nov 12 2025 at 21:17):
Joseph Myers said:
the IMO Compendium claims it was on the shortlist under
I was unaware that categorized shortlists were available from 1995 onwards, as https://www.imo-official.org/problems.aspx does not list them. I think the "based on the official subject when available, or classification by domain experts" from the paper refers specifically to availability from this source.
Joseph Myers (Nov 12 2025 at 23:13):
The only original shortlist booklets I have pre-2006 are 1994 and 2002, but, yes, where the shortlist booklets put problems in categories, the IMO Compendium may report them.
There is no guarantee that, when applying the Smith problem selection protocol, the IMO Jury respects the categories in the shortlist booklet; they can e.g. decide to consider A1 as an option for the best easy combinatorics problem. And which category the PSC puts a problem in may be a matter of balancing the shortlist as a whole, and of how many good A or N problems were submitted that weren't somewhat combinatorial (A and N tend to be in shorter supply on the longlist than C and G), not just a local objective assessment of each problem. (When we were preparing the IMO 2019 shortlist, the problem that ended up as C6 spent a while under G, for example, before we decided that moving it to C would be better, and that it made sense for G8 to be G but C6 to be C because G8 had more actual geometrical content in the solution.)
I take the view that we should think of competition problems as being part of mathematics as a unified whole, so it's most natural not to try to exclude any problems based on their subject matter (this does of course require adding more definitions to mathlib). On that basis I prepared formal statements for all the problems at IMO 2025 for potential AI use, before it turned out that (a) the AI groups attempting IMO 2025 problems in formal form either didn't know that a Lean version was on offer from the organizers, or wanted to produce their own rather than sharing a single version more comparable between all AIs, and (b) as far as I know, none of them wanted to attempt geometry problems in Lean. (If a geometry problem had been selected involving polygons, areas, angle bisectors, arcs of circles or other missing definitions, I'd have prepared a Lean version working around the missing definitions, but that wouldn't meet my normal criteria for IMOLean.)
Last updated: Dec 20 2025 at 21:32 UTC