Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: fill in the blank


David Renshaw (Sep 30 2023 at 22:57):

it is unfortunately still an open problem to figure out how to actually turn these kinds of "work out this number" question into a formal theorem statement without giving away information

FWIW, here is a Lean formalization of the problem that I banged out just now, with the "missing number" labelled via a comment:

import Mathlib.Data.Nat.Basic
import Mathlib.Tactic


/-!
# International Mathematical Olympiad 1981, Problem 6

Suppose that f : ℕ × ℕ → ℕ satisfies

 1) f (0, y) = y + 1
 2) f (x + 1, 0) = f (x, 1),
 3) f (x + 1, y + 1) = f (x, f (x + 1, y))

for all x y ∈ ℕ.

Determine f (4, 1981).

-/

namespace Imo1981Q6

/- fill_in_the_blank -/
abbrev solution_value :  := sorry

theorem Imo1981Q6 (f :     )
    (h1 :  y, f 0 y = y + 1)
    (h2 :  x, f (x + 1) 0 = f x 1)
    (h3 :  x y, f (x + 1) (y + 1) = f x (f (x + 1) y)) :
    f 4 1981 = solution_value := sorry

David Renshaw (Sep 30 2023 at 22:57):

see https://dwrensha.github.io/compfiles/problems/Compfiles.Imo1981P6.html, generated from https://github.com/dwrensha/math-puzzles-in-lean4/blob/af54de06124c7b294900a878061a02d3b474a406/MathPuzzles/Imo1981Q6.lean

David Renshaw (Oct 01 2023 at 02:52):

And the second one they claim to solve is IMO 1974 P5 but this is not in the dataset as far as I can see (I can't find it here https://github.com/openai/miniF2F/tree/main)

Here it is: https://github.com/openai/miniF2F/blob/4e433ff5cadff23f9911a2bb5bbab2d351ce5554/lean/src/valid.lean#L1602-L1608

David Renshaw (Oct 01 2023 at 02:53):

(and here's my more faithful formalization, requiring both directions of the implication: https://github.com/dwrensha/math-puzzles-in-lean4/blob/18ab1833dcc4c962e8d6191e58ed23b36d1269ac/MathPuzzles/Imo1974Q5.lean#L68-L76 )

Kevin Buzzard (Oct 01 2023 at 06:39):

So in both cases your solution to the problem is not "prove a theorem", it's "come up with some data yourself and then prove a theorem". Am I right thinking that in both cases the data can just be the question statement and the solution to the IMO problem can just basically be rfl though?

Siddhartha Gadgil (Oct 01 2023 at 10:18):

In the code pasted above, it seems as if the hypothesis is not enough to determine f, so the solution value cannot be defined to make this rfl.

David Renshaw (Oct 01 2023 at 11:03):

Am I right thinking that in both cases the data can just be the question statement and the solution to the IMO problem can just basically be rfl though?

In general, yes you are right. A fill_in_the_blank needs a human to judge whether a solution should be accepted or not: https://github.com/dwrensha/math-puzzles-in-lean4/blob/18ab1833dcc4c962e8d6191e58ed23b36d1269ac/MathPuzzles/Meta/ProblemExtraction.lean#L81-L88

David Renshaw (Oct 01 2023 at 11:21):

So, yes, completely automated scoring of IMO problems is still an open problem. But the amount of work that human scorers will need to do to judge fill_in_the_blank solutions should be pretty small, and I don't see it as a barrier to running the IMO Grand Challenge. (It might be more of a barrier to participants in the Grand Challenge, because having a human in the loop for a large training run could be expensive.)

Siddhartha Gadgil (Oct 01 2023 at 13:31):

At least when the answer is an integer, instead of having an arbitrary term to be filled in the fill_in_the_blankone can fill in to an elaborator accepting only numerical literals. Then the value can be checked. This can be extended a bit (for instance parsing rationals/floats as well).

Adam Topaz (Oct 01 2023 at 13:35):

Yeah I was about to say the same thing. I’m not familiar at all with IMO type problems. What other kinds of data appear in IMO problems?

Siddhartha Gadgil (Oct 01 2023 at 13:39):

Here is a crude implementation just to illustrate:

elab "natural:" n:num "only" : term => do
  return Expr.lit (Literal.natVal n.getNat)

#eval natural: 3 only -- 3

def fillIn :  := natural: 3 only

-- #eval natural: 3 + 4 only -- Error

Siddhartha Gadgil (Oct 01 2023 at 13:39):

I mean where 3 is filled in should be blank (or a placeholder).

David Renshaw (Oct 01 2023 at 13:48):

Here's a more interesting fill_in_the_blank from Romania 1998: https://github.com/dwrensha/math-puzzles-in-lean4/blob/18ab1833dcc4c962e8d6191e58ed23b36d1269ac/MathPuzzles/Romania1998Q12.lean#L558-L559

David Renshaw (Oct 01 2023 at 13:51):

instead of having an arbitrary term to be filled in the fill_in_the_blank, one can fill in to an elaborator accepting only numerical literals

Note that it's not actually feasible to write the solution data as a literal in Imo1981P6: it's (2^·)^[1984] 1 - 3, which is too big to evaluate.

David Renshaw (Oct 01 2023 at 13:52):

Also, if we want to have an extended discussion on this topic, we should probably stop hijacking the Lyra thread. :)

Kevin Buzzard (Oct 01 2023 at 15:03):

Adam Topaz said:

Yeah I was about to say the same thing. I’m not familiar at all with IMO type problems. What other kinds of data appear in IMO problems?

In the other example which the paper is claiming to solve, the actual question asks for a subset of the positive reals plus a proof that this is the range of a certain function. The answer is the open interval (1,2)(1,2) in this case. The formalisation gives this open interval explicitly in the question but doesn't even ask to prove that it's the range, it just asks the far far easier question to prove that it contains the range. I agree with David, let's stop hijacking this thread.

Notification Bot (Oct 02 2023 at 06:32):

16 messages were moved here from #Machine Learning for Theorem Proving > Lyra by Johan Commelin.

Johan Commelin (Oct 02 2023 at 06:33):

I think it's interesting to discuss how elaborators can help make these problem statements more precise. So I've moved the discussion to a new thread, so that we stop the hijacking but don't stop the discussion :grinning:

Johan Commelin (Oct 02 2023 at 06:35):

In the case of Nat, Int, Rat, I conjecture that "fill in the blank" can always (= 99% of the cases) be an expression made up of numerals and basic arithmetic operations: +, -, *, /, ^.

Scott Morrison (Oct 02 2023 at 08:50):

Publishing a few of these "find" questions with some clearly specified restriction on the "found" term would be great. Maybe it could even include enough examples to cover the possibilities:

  • "the human markers will check that ..."
  • "you must use this term elaborator ..."
  • "you must given a term of this inductive type representing a class of formulas ..."

Bolton Bailey (Oct 02 2023 at 12:57):

One way to get a sense of what the blanks look like is to look at one of the shortlists (such as this one) and see what form the values in the "Answers" subsections take.

Bolton Bailey (Oct 02 2023 at 13:13):

A quick summary of all such answers for the shortlist I linked:

fun k => k + 4
{ fun x => 1 / x }
{ 2, 3, 4 }
{ ( n + 1 ) / n | n \in \ZZ, n \ne 0 }
506
{ (n,k) : N x N // n \le k \le (3 n + 1) / 2 }
2271380
{ c : Fin n \to N // sum i * c i = n * (n + 1) / 2 (mod n) }
fun n => if exists i, n = 2^i then 1 else 2
3
fun n => 2 * n ^ 2 - 2 * n + 1
(2500, 7500)
1344
7
fun a d => floor (logb a d)
{ (2,2,2), (3,4,3) }

As you can see there are a few numeric literals, but there are also functions, and sets. Besides the basic arithmetic operations, the functions also seem to use if-then-else and inequalities.

Miroslav Olšák (Oct 02 2023 at 13:14):

There is also a collection I made some time ago: http://www.olsak.net/mirek/determine-answers.txt

David Renshaw (Oct 02 2023 at 13:38):

ah, I think I may rename my fill_in_the_blank to determine


Last updated: Dec 20 2023 at 11:08 UTC