Zulip Chat Archive
Stream: IMO-grand-challenge
Topic: proof of concept solution grader
David Renshaw (Nov 07 2023 at 01:05):
Hello!
Today I added a checkSolution
executable to my compfiles repo. (The repo was until recently named "math-puzzles-in-lean-4".)
With this addition, we now have an end-to-end demonstration of what the inputs and outputs of an IMO Grand Challenge competition might look like.
To start, we extract spoiler-free problem files:
$ lake exec extractProblems
Then the contents of the _extracted/problems/
directory are Lean files ready to be passed as input to an IMO Grand Challenge solver.
For example, this is _extracted/problems/Imo2022P2.lean
:
import Mathlib.Data.Real.Basic
import Mathlib.Algebra.Order.Positive.Field
import Mathlib.Tactic
/-!
# International Mathematical Olympiad 2022, Problem 2
Let ℝ+ be the set of positive real numbers.
Determine all functions f: ℝ+ → ℝ+ such that
for each x ∈ ℝ+, there is exactly one y ∈ ℝ+
satisfying
x·f(y) + y·f(x) ≤ 2
-/
namespace Imo2022P2
abbrev PosReal : Type := { x : ℝ // 0 < x }
notation "ℝ+" => PosReal
/- determine -/ abbrev solution_set : Set (ℝ+ → ℝ+) := sorry
theorem imo2022_p2 (f : ℝ+ → ℝ+) :
f ∈ solution_set ↔
∀ x, ∃! y, x * f y + y * f x ≤ ⟨2, two_pos⟩ := sorry
A solver's job is to accept a file like this and to fill in the sorry
s (and possibly add other declarations and Mathlib imports). The declaration marked determine
is special, in that the term filled in there will be judged by a human grader.
The solver gives us back a solution.lean
, and we pass that to checkSolution
like this:
$ lake exe checkSolution Imo2022P2 solution.lean
* compiling problem into olean ...
_check/Imo2022P2.lean:22:23: warning: declaration uses 'sorry'
_check/Imo2022P2.lean:24:8: warning: declaration uses 'sorry'
* compiling solution into olean ...
* verifying types and axioms ...
* replaying environment ...
* collecting any 'determine' declarations ...
determine Imo2022P2.solution_set := {fun x => 1 / x}
* verified!
The checker verifies that each declaration of the problem has a corresponding declaration in the solution, with a definitionally equal type.
After the checker runs, all that's left is for the human grader to verify that the determine
term looks good, and here {fun x => 1 / x}
indeed does look good.
Does this all sound reasonable? I'm interested in any feedback.
Last updated: Dec 20 2023 at 11:08 UTC