Zulip Chat Archive
Stream: Lean for teaching
Topic: Game about finishing tactics
Oliver Dressler (Feb 21 2025 at 18:40):
I would like to make a game about finishing tactics, optimally in lean4game.
A simple game loop could be:
- Display goal
- Player picks finishing tactic
- Give or take points
This is a sketch of the game loop: https://oli.show/final-move/.
Is implementing this loop possible in lean4game?
Or maybe a simpler version:
Could lean4game allow only one attempt for each level?
Martin Dvořák (Feb 22 2025 at 10:04):
Very interesting idea! Are the levels automatically generated from existing proofs? My issue is that the states are not as readable as I would like them to be.
Oliver Dressler (Feb 22 2025 at 11:10):
Yes, in the sketch, the levels are extracted from mathlib 4.15 a priori and then randomly selected at runtime.
I agree that the visualization is quite basic, just a syntax-highlighted goal string for now.
Kevin Buzzard (Feb 22 2025 at 13:53):
I find it hard to figure out if I succeeded or failed. I click and then something flashes up quickly and then disappears. I guess I succeeded iff the puzzle changes? But that's not very intuitive (in theory I could get things wrong and be asked another question anyway). Can it be clearer whether I got it right or wrong?
Kevin Buzzard (Feb 22 2025 at 13:54):
But the concept is brilliant!
Kevin Buzzard (Feb 22 2025 at 13:55):
If the goal is displayed as 2 > 0 then I try norm_num
and it fails :-( Maybe these are weird 2's and 0's? Can you make it use verso so I can see the types? :-)
Kevin Buzzard (Feb 22 2025 at 13:57):
(I sometimes feel cheated because I can see the goal and if the terms in it were of the type I'm guessing then a tactic will work, and I try the tactic and it fails, so I got it wrong because of information which was withheld from me)
Oliver Dressler (Feb 22 2025 at 14:13):
I agree, the notification of success/fail is too short. Also in this sketch there is a bug, if you click too quickly, it doesn't show the notification at all.
You are correct, that the game as it is now, is not correct. Just to test I extracted the goals and the tactic used to close it. But often multiple tactics would work, not only the one used in mathlib. So you are definitely being cheated.
If it is possible in lean4game or lean in general, then this would be solved by actually closing the goal. If stays as this version, then I would have to check every tactic at every goal properly. This would also open an interesting option for gamification: Different "prices" for use of tactics. Omega is more expensive rfl...
Kevin Buzzard (Feb 22 2025 at 16:59):
Well I'm certainly learning new things! Which tactic solves
⊢ c ⨳ a ♯ 0 ⨳ a ≤ 0 ⨳ b ♯ c ⨳ a
I wonder!
Patrick Massot (Feb 22 2025 at 18:12):
ac_rfl
?
François G. Dorais (Feb 22 2025 at 22:56):
I think you need ac_rotfl
for inequalities :smile:
Oliver Dressler (Mar 24 2025 at 11:43):
I made some progress on this project, there is a new version of the game available under:
If you see an error, try opening the page in incognito mode to do a full refresh.
Key changes:
- The game is now fair, I have brute-forced all 16 tactics at all goals.
- There is now a link to the source in mathlib.
Next up:
- Improved visuals: Game in general, proof state, tooltips, better notifications, ...
- Mobile-friendly design.
- Balancing and some kind of win condition.
For those interested, here are some details on brute-forcing the tactics:
Mathlib: v4.18.0-rc1
Checked 5754 goals in 1265 files with 16 tactics: 86295 checks
Tactic | Solved | Uniquely solved by this tactic
aesop | 2558 | 382
simp | 1210 | 34
linarith | 1137 | 169
tauto | 1104 | 67
norm_num | 913 | 34
ring | 893 | 306
omega | 786 | 294
congr | 648 | 46
infer_instance | 626 | 443
positivity | 551 | 257
abel | 430 | 186
gcongr | 360 | 161
decide | 253 | 80
assumption | 240 | 10
fun_prop | 237 | 224
contradiction | 177 | 23
TOTAL | 12123 | 2716
This program is based on the LSP (leanclient, I used this project as a real-world test case) and runs in about 8.5h on my machine.
Any feedback is welcome!
Isak Colboubrani (Mar 25 2025 at 08:59):
Those brute force statistics are fascinating—thanks for sharing! The “endgame game” also looks great.
Would you be willing to share the brute forcing program, or the output of it (say where the lemma/ tactic combinations are listed)? I’m working on a project that involves gathering examples demonstrating the “unique” capabilities of various tactics (see thread #new members > Examples highlighting "unique" capabilities of tactics).
I’m currently missing two specific test cases that, based on your brute force approach, you seem to have uncovered. I’d find these especially interesting to examine:
- Test cases uniquely solved by
congr
. - Test cases uniquely solved by
abel
.
I’m also missing unique test cases for a few other tactics, but I suspect these might not exist because some tactics are strict subsets of others with regard to capabilities. In particular:
- I don't think it is possible to create a (zero-argument)
simp
test case that can’t be solved bysimp_all
. - I don't think it is possible to create a
linarith
test case that can’t be solved bynlinarith
. - I don't think it is possible to create a
assumption
orcontradiction
test case that can’t be solved bytrivial
.
Luigi Massacci (Mar 25 2025 at 09:27):
Indeed trivial
is just a wrapper that tries a bunch of other finishing tactics, including contradiction
and assumption
Isak Colboubrani (Mar 25 2025 at 10:00):
Yes, simp
, assumption
, rfl
, contradiction
, decide
, apply True.intro
, and apply And.intro <;> trivial
to be precise.
Aaron Liu (Mar 25 2025 at 15:11):
Isak Colboubrani said:
- I don't think it is possible to create a (zero-argument)
simp
test case that can’t be solved bysimp_all
.
You can "poison" simp_all
like this:
import Mathlib
example {X Y : Type} [Monoid X] [Monoid Y] (f : X →* Y) {x y : X} (hxy : x = x * y) :
f (x * y * x) = f x * f y * f x := by
success_if_fail_with_msg
"tactic 'simp' failed, nested error:
maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information"
simp_all
simp
Aaron Liu (Mar 25 2025 at 15:24):
You can similarly poison trivial
import Mathlib
example {X : Type} {x y : X} (hirrel : Nat.fib 300000 = 0) (hxy : x = y) :
x = y := by
/-
maximum recursion depth has been reached
use `set_option maxRecDepth <num>` to increase limit
use `set_option diagnostics true` to get diagnostic information
-/
-- trivial
assumption
Oliver Dressler (Mar 25 2025 at 16:04):
Isak Colboubrani said:
Would you be willing to share the brute forcing program, or the output of it (say where the lemma/ tactic combinations are listed)?
Sure, I have just made the repo public but I won't find time soon to clean it up. Specially the data extraction part is still a bit of a battlefield.
Also this program could most likely be sped up massively by attempting a tactic at all locations in a file at once.
Oliver Dressler (Mar 25 2025 at 16:06):
The output JSON used in the game is somewhat of a summary, if a bit convoluted because of tiers in the game.
Oliver Dressler (Mar 25 2025 at 16:06):
I have pulled out the theorems you are interested in:
unique_solves.txt
Isak Colboubrani (Mar 25 2025 at 21:18):
@Aaron Liu That's a very clever trick! TIL!
Isak Colboubrani (Mar 25 2025 at 21:19):
The simp
case that is not solved by simp_all
is solved by at least bound
, simp_arith
, simp_intro
and norm_num
, so I'm afraid that test case is "non-unique" (unless all those tactics invoke simp
under the hood). Same for the assumption
case that is not solved by trivial
: it is solved by at least aesop
making it non-unique.
Isak Colboubrani (Mar 25 2025 at 21:34):
@Oliver Dressler Thanks a lot for sharing both the code and the output!
Isak Colboubrani (Mar 25 2025 at 21:40):
@Oliver Dressler I tried to extract one of the goals that is marked as uniquely solvable by congr
, and while it is not solved by gcongr
(which is cool) it does appear to be solvable by aesop
(which is one of the tactics you're testing with as I understand it). Could it be that I'm failing to reproduce the goal exactly? I used extract_goal
.
#mwe for the attempt at reproducing the state at https://github.com/leanprover-community/mathlib4/blob/db4c8abb9243cf1b6c77c8e544c7c851495af16a/Mathlib/Logic/Function/Basic.lean#L645:
import Mathlib
open Classical
noncomputable def extend' {α β γ} (f : α → β) (g : α → γ) (j : β → γ) : β → γ := fun b ↦
if h : ∃ a, f a = b then g (Classical.choose h) else j b
example {α β γ} (f : α → β) (g : α → γ) (e' : β → γ) (b : β) [Decidable (∃ a, f a = b)] :
extend' f g e' b = if h : ∃ a, f a = b then g (Classical.choose h) else e' b := by
unfold extend'
-- congr -- works (as expected)
-- gcongr -- does not work (as expected)
aesop -- works (not expected)
Isak Colboubrani (Mar 25 2025 at 22:02):
I think I managed to extract one of the goals that is marked as uniquely solvable by abel
.
Here is the #mwe:
import Mathlib
example {α} [LinearOrderedRing α] [FloorRing α] (a : α) (c : ℕ) :
Int.fract a * ↑c + Int.fract a - Int.fract (a * ↑c + a) =
Int.fract a * ↑c - Int.fract (a * ↑c) - (Int.fract (a * ↑c + a) - Int.fract (a * ↑c) - Int.fract a) := by
abel
-- noncomm_ring -- works too (but not tested by script)
-- module -- works too (but not tested by script)
-- ring -- does not work (tested by script)
Oliver Dressler (Mar 26 2025 at 05:32):
Isak Colboubrani said:
Oliver Dressler I tried to extract one of the goals that is marked as uniquely solvable by
congr
, and while it is not solved bygcongr
(which is cool) it does appear to be solvable byaesop
(which is one of the tactics you're testing with as I understand it). Could it be that I'm failing to reproduce the goal exactly? I usedextract_goal
.
No, this is on me. It is missing imports --> "unknown tactic" --> picked up as invalid tactic by the script.
This means lots of false negatives in the dataset.
Does anyone know of a good way to insert all necessary imports to run these 16 tactics into 1200 files in mathlib without manually fixing circular imports?
Isak Colboubrani (Apr 08 2025 at 18:56):
@Oliver Dressler Have you been able to resolve this issue? If you fix your script, I'm really excited to review the updated results—especially those related to "Uniquely solved by this tactic".
Last updated: May 02 2025 at 03:31 UTC