Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: Formalization of IMO 2025 P6 (the Matilda problem)

lean-tom (Feb 14 2026 at 14:18):

Hi everyone,

I’m excited to share a complete formalization of IMO 2025 Problem 6. To my knowledge, this is the first sorry-free solution for this problem in Lean 4.

The proof spans about 4,700 lines. My approach focused on:

Lower bound: A set-theoretic topological partitioning of the grid, avoiding complex analytic boundary casework by using unions/intersections of discrete intervals, combined with the Erdős–Szekeres Theorem. I engineered a "topological pivot" region to identify required properties within discrete set intervals, avoiding the grueling analytic coordinate casework often associated with boundary conditions.
Upper bound: An explicit fiber-based construction using integer forms and linear congruences to ensure the tiles cover the grid correctly for the $n=k^2$ case.

This was a high-intensity collaboration between human intuition and AI assistance. I acted as the lead architect, rigorously decomposing the problem into a granular series of lemmas, which were then implemented and cross-checked with the assistance of Gemini 3 Pro. This workflow allowed for a rapid yet verified formalization of this challenging combinatorics problem.

The code is fully compatible with Lean v4.28.0-rc1 and passes all CI checks. I have submitted a PR to compfiles:
https://github.com/dwrensha/compfiles/pull/166

Given the current length (~4,700 lines), I plan to golf and refactor the code to improve readability and eventually create a Lean Blueprint to visualize the proof's structure in the near future.

I would be very grateful for any feedback from this amazing community. Thank you!

Snir Broshi (Feb 14 2026 at 15:39):

How many lines do you estimate an idiomatic solution to the problem will have?

Kevin Buzzard (Feb 14 2026 at 15:45):

Well done for achieving this. But in general you'll find that the community here is not particularly interested in giving feedback on AI-generated code. The generic feedback is "yes we know AI can write lean code and it's typically very unidiomatic and about 5 to 10 times as long as a proof written by a human expert".

Joseph Myers (Feb 14 2026 at 16:11):

The statement is definitely a lot longer than necessary; it's longer than the statement I wrote in IMOLean, which itself is longer than necessary because I wasn't away of docs#NonemptyInterval when I wrote it, and expressing the tiles with NonemptyInterval allows a very concise statement.

Joseph Myers (Feb 14 2026 at 16:19):

There are lots of different ways to solve this problem (you can find several posted on AoPS by someone on the PSC; unfortunately most years don't produce an official IMO solutions document such as I did for 2019 and 2024). I don't know the relative merits of the different solutions as regards complexity of formalization.

Mirek Olšák (Feb 14 2026 at 17:15):

Kevin Buzzard said:

it's typically very unidiomatic and about 5 to 10 times as long as a proof written by a human expert

On the other hand, the same can be said about a Lean proof by a human non-expert. To me, it is a bit hard to distinguish this.

I also don't feel very motivated reading code that even its author didn't understand properly , and understand that manually spending time trying to figure out how to make Lean satisfied without AI tools can make the author understand the code better. However

is struggling with Lean without AI tools an efficiently spend time?
does a beginner actually understand a proof when finishing some of the branches with chaotically trying out "simp", "grind", "simp!" until something gets through

David Renshaw (Feb 14 2026 at 17:18):

@lean-tom thanks for the contribution. I autogolfed it somewhat here: https://github.com/dwrensha/compfiles/commit/0cc1cecc554d86d10a2f081887e583a23ccd245a

David Renshaw (Feb 14 2026 at 17:19):

further simplifications welcome!

lean-tom (Feb 15 2026 at 08:16):

Hi everyone, thank you so much for the warm welcome and the invaluable feedback!

@David Renshaw Thank you for the autogolf and for merging the PR! I’m honored to have this contribution in compfiles. I will continue to work on refactoring the code to make it more readable and idiomatic, as suggested.

@Kevin Buzzard I am taking your point about the noise in AI-assisted code to heart. I’m using this merged version as a "scaffold" to learn the Mathlib way, and I will do my best to understand and address this noise as I refactor the proof. My goal is to manually "golf" it into something concise and readable, and I’m eager to understand how a human expert would have structured the logic. This is very much a learning project for me.

@Joseph Myers Thank you for the pointer to docs#NonemptyInterval. I’ve just refactored the ProblemSetup using it, and it already looks much cleaner and more concise! I can see how this abstraction will help eliminate a significant amount of manual coordinate casework and boundary-related lemmas as I refactor the rest of the proof.

@Snir Broshi It’s hard to estimate exactly as a Lean beginner, but given Kevin’s observation about the 5-10x factor, I suspect an idiomatic expert solution might land somewhere between 500 and 800 lines. My goal is to see how close I can get to that.

@Mirek Olšák Thank you for the support! You’ve hit on a key part of my workflow: using AI to bridge the initial "syntax gap" while I focused on the global logical architecture. Interestingly, the rigor required by Lean—facilitated by AI’s speed—led me to arrive at a set-theoretic topological approach to region partitioning. This was a deliberate strategic choice to bypass the messy analytic casework and instead capture the combinatorial essence of the Erdős–Szekeres application within discrete intervals. While experts might find such an approach redundant, it was a perspective I arrived at because Lean forced me to find a more robust logical structure than my initial pen-and-paper intuition. I’m still learning whether this is the most efficient path, but this formalization process itself has already deeply reshaped my understanding of the problem.

lean-tom (Feb 25 2026 at 17:24):

@David Renshaw Thank you so much for the additional refactoring! I’m embarrassed to admit that while you were golfing my code from 4,700 to 3,700 lines (and significantly collapsing the exhaustive casework I had initially expanded to around 50 cases), I was so buried in studying Mathlib's design patterns and Leonardo's blog that I haven't even caught up with your latest changes yet.
Over the past two weeks, inspired by Joseph Myers’ code and the feedback regarding redundancy or noise, I’ve been obsessed with finding a "sparsifying" backbone for this proof.
I’m currently designing a roadmap to refactor the Sufficiency (Upper Bound) part using a bivariate polynomial approach in Z[X, Y]. My goal is to prove the tiling’s validity through algebraic identities.

The Roadmap:

Map each cell (i,j) to the monomial X^i * Y^j.
Represent the entire grid and each rectangular tile as polynomials in Z[X, Y].
Utilize the property that a tiling corresponds to a direct sum decomposition, which translates to a polynomial sum identity.
Use geometric series to represent the tiles and verify the partition algebraically using the ring tactic.
While this strategy might seem elementary or inefficient to the experts here, I want to ensure I’m building on the right Mathlib's direction.
Questions:

Are there specific files or lemmas in Mathlib regarding MvPolynomial that would be particularly effective for this kind of tiling problem?
Is there a preferred "idiomatic" way to bridge the gap between a Finset of coordinates and its polynomial representation?
I’d appreciate any guidance on whether this "sparsifying" direction is aligned with Mathlib's standards, or if there’s a more direct way I should be looking at. Thank you!

Alex Meiburg (Feb 25 2026 at 17:38):

What do you expect to gain from an MvPolynomial formulation? Why do you want to make this change? In general, it's bringing in a lot of extra "machinery" that feels irrelevant. So, probably, all the facts that you need about MvPolynomial are better expressed in some simpler structure.

lean-tom (Feb 26 2026 at 12:04):

@Alex Meiburg Thank you for the sharp observation. I see now that bringing in the full "machinery" of Mvpolynomial would indeed be overengineering for this problem. After reflecting on your comment, I realized that I don't even need polynomials and geometric series formulas. The core algebraic idea I was aiming for can be expressed much more simply: by representing the grid cells via indicator functions and expressing each rectangle via differences, leading to telescoping sum for the whole tiling.

My goal isn't to use ring theory, but simply to translate the geometric casework into a verifiable algebraic calculation.
Motivation for the change: My initial thought was to find a more "generalizable" approach that could handle complex tilings on Z-grids (beyond just n×n squares) by converting geometric shapes into algebraic calculation. I wanted to avoid the extensive coordinate-based casework.

I’m still learning where the line is between "elegant generalization" and "unnecessary machinery."
Thank you again for helping me find a simpler path!

Last updated: Feb 28 2026 at 14:05 UTC