Zulip Chat Archive

Stream: Lean for teaching

Topic: Game about finishing tactics

Oliver Dressler (Feb 21 2025 at 18:40):

I would like to make a game about finishing tactics, optimally in lean4game.

A simple game loop could be:

Display goal
Player picks finishing tactic
Give or take points

This is a sketch of the game loop: https://oli.show/final-move/.

Is implementing this loop possible in lean4game?

Or maybe a simpler version:
Could lean4game allow only one attempt for each level?

Martin Dvořák (Feb 22 2025 at 10:04):

Very interesting idea! Are the levels automatically generated from existing proofs? My issue is that the states are not as readable as I would like them to be.

Oliver Dressler (Feb 22 2025 at 11:10):

Yes, in the sketch, the levels are extracted from mathlib 4.15 a priori and then randomly selected at runtime.

I agree that the visualization is quite basic, just a syntax-highlighted goal string for now.

Kevin Buzzard (Feb 22 2025 at 13:53):

I find it hard to figure out if I succeeded or failed. I click and then something flashes up quickly and then disappears. I guess I succeeded iff the puzzle changes? But that's not very intuitive (in theory I could get things wrong and be asked another question anyway). Can it be clearer whether I got it right or wrong?

Kevin Buzzard (Feb 22 2025 at 13:54):

But the concept is brilliant!

Kevin Buzzard (Feb 22 2025 at 13:55):

If the goal is displayed as 2 > 0 then I try norm_num and it fails :-( Maybe these are weird 2's and 0's? Can you make it use verso so I can see the types? :-)

Kevin Buzzard (Feb 22 2025 at 13:57):

(I sometimes feel cheated because I can see the goal and if the terms in it were of the type I'm guessing then a tactic will work, and I try the tactic and it fails, so I got it wrong because of information which was withheld from me)

Oliver Dressler (Feb 22 2025 at 14:13):

I agree, the notification of success/fail is too short. Also in this sketch there is a bug, if you click too quickly, it doesn't show the notification at all.

You are correct, that the game as it is now, is not correct. Just to test I extracted the goals and the tactic used to close it. But often multiple tactics would work, not only the one used in mathlib. So you are definitely being cheated.

If it is possible in lean4game or lean in general, then this would be solved by actually closing the goal. If stays as this version, then I would have to check every tactic at every goal properly. This would also open an interesting option for gamification: Different "prices" for use of tactics. Omega is more expensive rfl...

Kevin Buzzard (Feb 22 2025 at 16:59):

Well I'm certainly learning new things! Which tactic solves

⊢ c ⨳ a ♯ 0 ⨳ a ≤ 0 ⨳ b ♯ c ⨳ a

I wonder!

Patrick Massot (Feb 22 2025 at 18:12):

ac_rfl?

François G. Dorais (Feb 22 2025 at 22:56):

I think you need ac_rotfl for inequalities :smile:

Oliver Dressler (Mar 24 2025 at 11:43):

I made some progress on this project, there is a new version of the game available under:

https://oli.show/final-move

If you see an error, try opening the page in incognito mode to do a full refresh.

Key changes:

The game is now fair, I have brute-forced all 16 tactics at all goals.
There is now a link to the source in mathlib.

Next up:

Improved visuals: Game in general, proof state, tooltips, better notifications, ...
Mobile-friendly design.
Balancing and some kind of win condition.

For those interested, here are some details on brute-forcing the tactics:

Mathlib: v4.18.0-rc1

Checked 5754 goals in 1265 files with 16 tactics: 86295 checks

Tactic           | Solved | Uniquely solved by this tactic
aesop            | 2558   | 382
simp             | 1210   | 34
linarith         | 1137   | 169
tauto            | 1104   | 67
norm_num         | 913    | 34
ring             | 893    | 306
omega            | 786    | 294
congr            | 648    | 46
infer_instance   | 626    | 443
positivity       | 551    | 257
abel             | 430    | 186
gcongr           | 360    | 161
decide           | 253    | 80
assumption       | 240    | 10
fun_prop         | 237    | 224
contradiction    | 177    | 23
TOTAL            | 12123  | 2716

This program is based on the LSP (leanclient, I used this project as a real-world test case) and runs in about 8.5h on my machine.

Any feedback is welcome!

Isak Colboubrani (Mar 25 2025 at 08:59):

Those brute force statistics are fascinating—thanks for sharing! The “endgame game” also looks great.

Would you be willing to share the brute forcing program, or the output of it (say where the lemma/ tactic combinations are listed)? I’m working on a project that involves gathering examples demonstrating the “unique” capabilities of various tactics (see thread #new members > Examples highlighting "unique" capabilities of tactics).

I’m currently missing two specific test cases that, based on your brute force approach, you seem to have uncovered. I’d find these especially interesting to examine:

Test cases uniquely solved by congr.
Test cases uniquely solved by abel.

I’m also missing unique test cases for a few other tactics, but I suspect these might not exist because some tactics are strict subsets of others with regard to capabilities. In particular:

I don't think it is possible to create a (zero-argument) simp test case that can’t be solved by simp_all.
I don't think it is possible to create a linarith test case that can’t be solved by nlinarith.
I don't think it is possible to create a assumption or contradiction test case that can’t be solved by trivial.

Luigi Massacci (Mar 25 2025 at 09:27):

Indeed trivial is just a wrapper that tries a bunch of other finishing tactics, including contradiction and assumption

Isak Colboubrani (Mar 25 2025 at 10:00):

Yes, simp, assumption, rfl, contradiction, decide, apply True.intro, and apply And.intro <;> trivial to be precise.

Aaron Liu (Mar 25 2025 at 15:11):

Isak Colboubrani said:

I don't think it is possible to create a (zero-argument) simp test case that can’t be solved by simp_all.

You can "poison" simp_all like this:

import Mathlib

example {X Y : Type} [Monoid X] [Monoid Y] (f : X →* Y) {x y : X} (hxy : x = x * y) :
    f (x * y * x) = f x * f y * f x := by
  success_if_fail_with_msg
"tactic 'simp' failed, nested error:
maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information"
    simp_all
  simp

Aaron Liu (Mar 25 2025 at 15:24):

You can similarly poison trivial

import Mathlib

example {X : Type} {x y : X} (hirrel : Nat.fib 300000 = 0) (hxy : x = y) :
    x = y := by
  /-
    maximum recursion depth has been reached
    use `set_option maxRecDepth <num>` to increase limit
    use `set_option diagnostics true` to get diagnostic information
  -/
  -- trivial
  assumption

Oliver Dressler (Mar 25 2025 at 16:04):

Isak Colboubrani said:

Would you be willing to share the brute forcing program, or the output of it (say where the lemma/ tactic combinations are listed)?

Sure, I have just made the repo public but I won't find time soon to clean it up. Specially the data extraction part is still a bit of a battlefield.
Also this program could most likely be sped up massively by attempting a tactic at all locations in a file at once.

Oliver Dressler (Mar 25 2025 at 16:06):

The output JSON used in the game is somewhat of a summary, if a bit convoluted because of tiers in the game.

Oliver Dressler (Mar 25 2025 at 16:06):

I have pulled out the theorems you are interested in:
unique_solves.txt

Isak Colboubrani (Mar 25 2025 at 21:18):

@Aaron Liu That's a very clever trick! TIL!

Isak Colboubrani (Mar 25 2025 at 21:19):

The simp case that is not solved by simp_all is solved by at least bound, simp_arith, simp_intro and norm_num, so I'm afraid that test case is "non-unique" (unless all those tactics invoke simp under the hood). Same for the assumption case that is not solved by trivial: it is solved by at least aesop making it non-unique.

Isak Colboubrani (Mar 25 2025 at 21:34):

@Oliver Dressler Thanks a lot for sharing both the code and the output!

Isak Colboubrani (Mar 25 2025 at 21:40):

@Oliver Dressler I tried to extract one of the goals that is marked as uniquely solvable by congr, and while it is not solved by gcongr (which is cool) it does appear to be solvable by aesop (which is one of the tactics you're testing with as I understand it). Could it be that I'm failing to reproduce the goal exactly? I used extract_goal.

#mwe for the attempt at reproducing the state at https://github.com/leanprover-community/mathlib4/blob/db4c8abb9243cf1b6c77c8e544c7c851495af16a/Mathlib/Logic/Function/Basic.lean#L645:

import Mathlib
open Classical
noncomputable def extend' {α β γ} (f : α → β) (g : α → γ) (j : β → γ) : β → γ := fun b ↦
  if h : ∃ a, f a = b then g (Classical.choose h) else j b
example {α β γ} (f : α → β) (g : α → γ) (e' : β → γ) (b : β) [Decidable (∃ a, f a = b)] :
    extend' f g e' b = if h : ∃ a, f a = b then g (Classical.choose h) else e' b := by
  unfold extend'
  -- congr -- works (as expected)
  -- gcongr -- does not work (as expected)
  aesop -- works (not expected)

Isak Colboubrani (Mar 25 2025 at 22:02):

I think I managed to extract one of the goals that is marked as uniquely solvable by abel.

Here is the #mwe:

import Mathlib
example {α} [LinearOrderedRing α] [FloorRing α] (a : α) (c : ℕ) :
  Int.fract a * ↑c + Int.fract a - Int.fract (a * ↑c + a) =
  Int.fract a * ↑c - Int.fract (a * ↑c) - (Int.fract (a * ↑c + a) - Int.fract (a * ↑c) - Int.fract a) := by
  abel
  -- noncomm_ring -- works too (but not tested by script)
  -- module -- works too (but not tested by script)
  -- ring -- does not work (tested by script)

Oliver Dressler (Mar 26 2025 at 05:32):

Isak Colboubrani said:

Oliver Dressler I tried to extract one of the goals that is marked as uniquely solvable by congr, and while it is not solved by gcongr (which is cool) it does appear to be solvable by aesop (which is one of the tactics you're testing with as I understand it). Could it be that I'm failing to reproduce the goal exactly? I used extract_goal.

No, this is on me. It is missing imports --> "unknown tactic" --> picked up as invalid tactic by the script.
This means lots of false negatives in the dataset.

Does anyone know of a good way to insert all necessary imports to run these 16 tactics into 1200 files in mathlib without manually fixing circular imports?

Isak Colboubrani (Apr 08 2025 at 18:56):

@Oliver Dressler Have you been able to resolve this issue? If you fix your script, I'm really excited to review the updated results—especially those related to "Uniquely solved by this tactic".

Last updated: May 02 2025 at 03:31 UTC