## Stream: Machine Learning for Theorem Proving

### Topic: IMO Grand Challenge

#### Jesse Michael Han (Sep 16 2020 at 14:42):

(creating a new topic here since AITP participants might not be subscribed to the IMO-gc channel)

(blog post about building IMO geometry solvers for reference)

Patrick Massot said:

I have a question about geometry in the IMO challenge that neither watching the talk nor skimming through the blog post answers. Do you actually have Lean code formalizing geometry problem? Did you formalize Euclidean geometry in Lean 4?

our current solvers are written in Haskell, and operate in a fragment of Tarski's elementary geometry (conservatively expanded with definable function symbols). they are proof producing, but do not target Lean4 proof terms.

in principle, everything we have done could be implemented in Lean4, but Haskell was a convenient choice for rapid prototyping

#### Patrick Massot (Sep 16 2020 at 14:47):

I see. Let's hope your last sentence will soon sound very strange :four_leaf_clover: .

#### Daniel Selsam (Sep 16 2020 at 14:51):

our current solvers are written in Haskell, and operate in a fragment of Tarski's elementary geometry (conservatively expanded with definable function symbols). they are proof producing, but do not target Lean4 proof terms.

I'll add that we used many high-level lemmas without proof, and relied on numerical diagrams for NDGs such as distinctness of points and non-collinearity. The goal was to experiment with different solver architectures efficiently.

#### Patrick Massot (Sep 16 2020 at 14:52):

What does NDGs means?

#### Daniel Selsam (Sep 16 2020 at 14:53):

It stands for "non-degeneracy conditions".

Ok, thanks.

#### Jesse Michael Han (Sep 16 2020 at 15:00):

to elaborate a little: often, NDGs like exactly which points are distinct, what circles are not trivial, etc. are not explicit in the official problem statements, and are sometimes not even provable from the rest of the problem; we approximated human intuition for these kinds of assumptions by synthesizing a diagram for the problem and reading NDGs off of them

#### Johan Commelin (Sep 16 2020 at 16:23):

@Jesse Michael Han

(it took a team of mathematicians months to formally construct a single perfectoid space in Lean)
well... we actually only have the empty perfectoid space. And I think that it took Kevin < 3hrs (but of course it took us months to define its type).

#### Kevin Buzzard (Sep 16 2020 at 16:34):

I was expecting it to be rfl

#### Kevin Buzzard (Sep 16 2020 at 16:35):

but it turned out I had to prove that an arbitrary projective limit of zero topological rings was bi-continuously isomorphic to the zero topological ring

#### Joseph Myers (Sep 16 2020 at 19:56):

In an IMO problem, "triangle" implies nondegenerate (and the mathlib definition also requires that); "circle" implies positive radius (but I'm less sure whether that should be included in a bundled euclidean_geometry.sphere definition, as more lemmas may well work without that condition and it may also be useful to construct such a bundled object from centre plus radius without proving the radius positive at that point).

#### Joseph Myers (Sep 16 2020 at 19:59):

Some nondegeneracy conditions are the appropriate formal interpretation of words such as "between" or "again" in the problem statement. Other conditions that don't come from such explicit words in the problem statement aren't generally safe to assume. For example, IMO 2019 problem 2 has some solutions that involve looking that the point of intersection of two lines that might be parallel, and the mark scheme deducted marks if you did that without handling the special case where those lines were parallel.

#### Joseph Myers (Sep 16 2020 at 20:03):

It wouldn't surprise me if some older problems (from before leaders could readily play with moving points around in geometry software to see what configurations can occur), especially older shortlist problems, do fail to mention such conditions at all, however. (Cf. IMO 1994 shortlist problem G2, which was false.)

#### Johan Commelin (Sep 17 2020 at 07:40):

@Jesse Michael Han You should build a web interface for that diagram generator! It looks awesome. (Could it run in WASM on a modern laptop, or is that asking too much?) Bonus points if you can add the tikz code next to the output.

#### Johan Commelin (Sep 17 2020 at 07:41):

Your blogpost is very interesting, btw

#### Joseph Myers (Sep 17 2020 at 11:40):

Daniel Selsam said:

our current solvers are written in Haskell, and operate in a fragment of Tarski's elementary geometry (conservatively expanded with definable function symbols). they are proof producing, but do not target Lean4 proof terms.

I'll add that we used many high-level lemmas without proof, and relied on numerical diagrams for NDGs such as distinctness of points and non-collinearity. The goal was to experiment with different solver architectures efficiently.

It sounds like you have a long list of geometrical definitions and lemmas found to be useful for formal synthetic geometry proofs. It would be good for that list to be available somewhere as a checklist for anyone expanding mathlib's support for Euclidean geometry. (Of course definitions and lemmas going in mathlib should be in suitably general form, possibly more general than you needed for your purposes. For example, results in geometry.euclidean are proved in general dimensions, for simplex not triangle, where applicable (even two-dimensional results take place in a two-dimensional subspace of a possibly larger space), and anything true more generally for affine spaces (or affine spaces over an ordered field, etc., but not depending on anything Euclidean) should go in linear_algebra.affine_space.)

#### Daniel Selsam (Sep 17 2020 at 16:24):

@Joseph Myers Responded here: https://leanprover.zulipchat.com/#narrow/stream/208328-IMO-grand-challenge/topic/Geometry/near/210413765

Last updated: May 09 2021 at 22:13 UTC