Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: Non-F2F format


David Shin (Sep 24 2020 at 15:01):

The currently proposed rules for the IMO Grand Challenge use the F2F (formal-to-formal) format. There are certainly compelling reasons for this, and I think it makes complete sense to restrict the problem in this way.

I wonder, though, how much additional scientific value there would be in considering the more difficult problem of taking the problem as input in natural language (the same format that human contestants use), and outputting a high-level solution in natural language. One downside of this rule set is the subjectivity of a human judge's opinion of whether a given proof is valid - this appears to me to be the main justification for the F2F format.

But it seems to me that there are significant upsides to considering the harder problem.

It forces us to solve the automated NL2F translation problem (natural language to formal language). Solving this problem would be useful anyway for the F2F task, by allowing us to more easily generate a large library of problems (which is currently built by painstaking manual NL2F translation).

It also forces us to solve the F2NL translation problem - not just any NL representation, but, say, a minimal high-level representation that human judges deem acceptable. On the one hand, there are aspects of this problem that are weirdly IMO-domain-specific, reducing scientific value: there are certain theorems that somewhat arbitrarily are not allowed to be cited in an IMO proof (e.g., Muirhead’s Theorem for inequalities is specifically blacklisted). But that is the exception not the rule. To map a series of low-level Lean steps into a library of high level concepts, and to automatically execute proof simplifications (like symmetry arguments or extracting repeatedly performed steps into lemmas) - this seems to me a worthy goal. Besides, the roadmap for the F2F solution, at least as I understood it from Daniel Selsam's AITP talk, entails a solving algorithm that operates on the level of high-level tactics, so the F2F solver might already have an internal solution representation that maps to the high-level concepts entailed in a typical human proof.

These two problems (NL2F translation and F2NL translation) might conceivably each be harder than the currently stated IMO Grand Challenge; I have no idea. So again, I think the F2F format specification makes sense to restrict the problem. But if all 3 components are solved, we would end up with an NL2NL solver that passes a sort of Olympian Turing test. That would be an impressive stunt, and would also more widely open the door to human-interfacing applications.

So, does it make sense to work on these problems in parallel to the F2F problem? They might be problems more suited for, say, the NLP community, but if the goal is to interface those solutions with an F2F solver, then perhaps this community should at least be part of the solution.

Kevin Lacker (Sep 24 2020 at 15:13):

typically if you were treating natural-language-to-formal-mathematics as an AI translation problem you'd want lots of training data. but we have very little training data. so how would you even get started on the "NL2F" problem? with the formal-to-formal problem you can set it up as: Lean will be doing most of the heavy lifting, but it's guided by some AI. which is necessary, I think, because the state of the art of pure AI solving math problems is like, it can't do long division.

Daniel Selsam (Sep 24 2020 at 15:14):

If somebody could win gold in I2I without internally solving it as F2F, that would be amazing, but I don't know anybody who thinks this is plausible. So if we are all attacking it as F2F, I don't see value in throwing on a preprocessing component that has nothing to do with mathematical problem solving. Also, I think F2F is more useful than I2I anyway -- in the limit it is much easier to write a formal statement than to manually verify an informal proof.

Kevin Lacker (Sep 24 2020 at 15:17):

it would certainly be handy to have a nl2f converter - imagine throwing in pdfs and it outputs a pull request into mathlib that formalizes the paper. but it seems like a logically separate problem from solving a math problem in a f2f way

Daniel Selsam (Sep 24 2020 at 15:18):

NL2F needs to be trusted though -- human still needs to manually verify the F

David Shin (Sep 24 2020 at 15:23):

@Daniel Selsam To be clear, I am not suggesting that we try to solve internally in a non-F2F manner. I agree that the core solving needs to be done F2F. Just asking whether someone should work on the automated NL2F-translation problem. At the very least, it would help the F2F effort by helping us automatically generate F2F training data from the large corpus of NL-format problems that exists out there. As Kevin Lacker points out, it is a logically separate problem.

Kevin Lacker (Sep 24 2020 at 15:23):

@David Shin how would you get started working on it?

David Shin (Sep 24 2020 at 15:26):

@Kevin Lacker I really don't have many concrete ideas. It is certainly out of my domain of expertise. But perhaps people at Google/Amazon/etc. that work on technologies like Alexa and Siri might have ideas that could work as starting points.

Mario Carneiro (Sep 24 2020 at 15:30):

IIRC Christian Szegedy at google is looking at NL2F for this kind of thing, partially because there has already been some good progress with GPT3 and also because the corpus of NL mathematics is much larger than all formal libraries combined

David Shin (Sep 24 2020 at 17:07):

@Kevin Lacker Perhaps this is how I might start. I would focus on a problem genre whose problem statements tend to be formulaic. Geometry comes to mind. A sentence in a problem statement like, "Let O be the circumcenter of triangle ABC" - I can imagine a carefully hand-crafted rule-based parser that translates this by testing the sentence against a fixed set of candidate grammatical structures (e.g., "Let ... be ..."). An army of humans could gradually extend this rule-based parser, building up a corpus of simple NL statements with corresponding formal translations. Maybe GPT3 could somehow do this more elegantly, but the space of grammatical constructs used in English geometry problems does not seem to me that large, making the manual approach perhaps feasible.
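A minimal sketch of such a template parser, in Python. Everything here is a hypothetical stand-in: the rule table, the `Point` type, and the `is_circumcenter`/`is_midpoint` predicate names are illustrative, not real mathlib declarations.

```python
import re

# Hypothetical template rules: each pairs a fixed grammatical structure
# ("Let ... be ...") with a builder for a Lean-flavored formal fragment.
# The predicate names and the Point type are placeholders, not mathlib API.
RULES = [
    (re.compile(r"Let (\w+) be the circumcenter of triangle (\w)(\w)(\w)\."),
     lambda m: f"({m[1]} : Point) (h{m[1]} : is_circumcenter {m[1]} {m[2]} {m[3]} {m[4]})"),
    (re.compile(r"Let (\w+) be the midpoint of (\w)(\w)\."),
     lambda m: f"({m[1]} : Point) (h{m[1]} : is_midpoint {m[1]} {m[2]} {m[3]})"),
]

def nl2f(sentence: str) -> str:
    """Translate one NL sentence by testing it against each candidate template."""
    for pattern, build in RULES:
        match = pattern.fullmatch(sentence.strip())
        if match:
            return build(match)
    raise ValueError(f"no template matches: {sentence!r}")

print(nl2f("Let O be the circumcenter of triangle ABC."))
# (O : Point) (hO : is_circumcenter O A B C)
```

An unmatched sentence surfaces as an error, which is exactly the signal telling a human contributor which rule to add next.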

To verify translations, a tool to identify wrong translations would be helpful. To this end, I would similarly develop a reverse translator that aims to reconstruct the NL representation from the Lean representation. With this pair of translators, you have a sort of autoencoder that can be used to do a consistency check: f2nl(nl2f(text)) == text. Of course, there can be multiple valid translations in either direction, causing this equality to fail even for a valid pair of translators. But maybe you can strive to make "idempotent" translators, with the property that for example nl2f(text) == nl2f(f2nl(nl2f(text))). Passing this check does not guarantee translation correctness, but can identify possible incorrectness, which can guide iterative translator development.
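A sketch of that round-trip check in Python, with toy dictionary-based translators standing in for real `nl2f`/`f2nl` implementations (all names and the formal syntax are hypothetical):

```python
def roundtrip_stable(nl2f, f2nl, text):
    """Test the 'idempotence' property nl2f(text) == nl2f(f2nl(nl2f(text))).

    Exact NL equality f2nl(nl2f(text)) == text is too strict, since many
    phrasings are valid; comparing on the formal side tolerates rewording.
    Passing does not guarantee correctness, but a failure flags a likely
    bug in one of the two translators.
    """
    formal = nl2f(text)
    return formal == nl2f(f2nl(formal))

# Toy one-entry translators, just to exercise the check.
NL2F = {"Let O be the circumcenter of triangle ABC.": "is_circumcenter O A B C"}
F2NL = {"is_circumcenter O A B C": "Let O be the circumcenter of triangle ABC."}

print(roundtrip_stable(NL2F.get, F2NL.get,
                       "Let O be the circumcenter of triangle ABC."))  # True
```

Sentences failing this check would go into a review queue, guiding iterative development of both translators.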

This could perhaps generate a corpus of pseudo-validated (NL, F) translation pairs. If this corpus becomes big enough then maybe this can bootstrap other techniques like something driven by GPT3.

Daniel Selsam (Sep 24 2020 at 18:16):

@David Shin Perhaps this thread should migrate to the ML4TP stream https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning.20for.20Theorem.20Proving

Joseph Myers (Sep 24 2020 at 20:38):

@David Shin It's perfectly valid to cite Muirhead's inequality in an IMO proof. But a problem that's too easy with Muirhead's inequality has been unlikely to be selected for the IMO since about 15 years ago when students became too familiar with Muirhead.

David Shin (Sep 24 2020 at 21:21):

@Joseph Myers Good to know! I remember our coaches told us that Muirhead's was off-limits, at least back in 2001 when I participated. This training material dating from 2005 matches the advice I was given: "Muirhead’s theorem...is generally not favorably regarded as part of a formal olympiad solution...whenever possible, you should use Muirhead’s inequality only to deduce the correct relationship and then explicitly write all of the necessary applications of AM-GM..." (https://artofproblemsolving.com/articles/files/MildorfInequalities.pdf)

With that said, I think my point still stands that there is an element of domain-specificity to the problem I pose of constructing a minimal human-judge-acceptable F2NL translation of a formal proof. Such a translation can be thought of as a summary of a formal proof, with steps omitted if a reasonable human judge can be expected to trivially fill in the gaps. An understanding of the judge's expectations can be thought of as a domain-specific detail.

Joseph Myers (Sep 24 2020 at 22:38):

Some people may avoid Muirhead as a matter of good taste, trying to avoid excessively powerful theorems. I think the IMO Grand Challenge should allow the whole of mathlib to be used.

Joseph Myers (Sep 24 2020 at 22:42):

It's acceptable, if not in the best of taste, for olympiad students to use standard results they have no hope of proving. Problems that require use of Dirichlet's theorem might be a bit controversial for the IMO, but are acceptable for the Romanian Master of Mathematics. Sometimes students may use Mihăilescu's theorem (Catalan's conjecture) in an olympiad solution, or for that matter Fermat's last theorem, though olympiad problems are very unlikely to depend on either of those.


Last updated: Aug 05 2021 at 04:14 UTC