Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Lean gym for Lean 4

Jason Rute (Jun 20 2021 at 03:27):

To avoid confusion in the previous thread #Machine Learning for Theorem Proving > Getting started with lean-gym, I'm splitting off the discussion about @Daniel Selsam's Lean 4 lean-gym prototype here.

Jason Rute (Jun 20 2021 at 03:27):

Continuing in that topic, I'm understanding the match, case, and induction syntax a little better. It seems that they are different from the tactic combinators like focus, try, and repeat. I'm not clear if this is a fundamental difference or just superficial. (For one, that ?i1 trick doesn't work for say focus or repeat, at least not without first wrapping the tag ?i1 in a refine.)

Jason Rute (Jun 20 2021 at 03:28):

Also, thinking about repeat and try, I think the following (superficial) example is interesting. Here I'm applying the same tactic block to both branches of a match. It doesn't work to naively label it as shown below. The question would be if the proof recording apparatus was robust enough to handle such cases.

-- This is valid
example : p ∨ q → q ∨ p := by
  intro h
  cases h with
  | _ =>
    try {apply Or.inr; assumption}
    try {apply Or.inl; assumption}

-- This is not valid
example : p ∨ q → q ∨ p := by
  intro h
  cases h with
  | _ => ?i1

  case i1 =>
    try {apply Or.inr; assumption}
    try {apply Or.inl; assumption}

Jason Rute (Jun 20 2021 at 03:29):

Also, speaking of the proof recording apparatus, it is not clear to me (just because I'm still very new to Lean 4), how proof recording would work with all the syntax sugar, macros, and other customizations with the Lean 4 Tactic DSL. For example, I think this "intro" tactic is syntax sugar for intro + match. Is it implemented in lean as just a single tactic or as (parse level syntactic sugar for) two tactics? How would it be recorded? Would your suggested handling of induction, cases, and match work out out of the box on this intro example as well? What if a user made their own similar tactic (or syntactic sugar)?

theorem ex2 : p ∨ q → q ∨ p := by
  intro
  | Or.inl _  => apply Or.inr; assumption
  | Or.inr h2 => apply Or.inl; exact h2

Mario Carneiro (Jun 20 2021 at 04:17):

I don't think lean 4 makes any sharp distinction between parse-level macro expansion and tactic execution. So intro | ... is probably implemented as macro expanding to intro h; cases h with | ..., however that doesn't really mean that you can eagerly split all such tactics into subcomponents, because repeat t is also implemented as macro expanding to t; repeat t, where the macro expansion halts once t succeeds. So in general, you would have to treat the input syntax pre-expansion, which means that user tactics have to be treated as their own things even if they are just macro expansion wrappers around existing tactics.

Jason Rute (Jun 21 2021 at 00:20):

I played around a little with the Lean4 lean-gym prototype. As for getting started, here is what worked for me:

git clone git@github.com:dselsam/lean-gym.git
cd lean-gym
elan override set leanprover/lean4:nightly  # sets up the nightly version of Lean 4 in this directory
LEAN_PATH=$HOME/.elan/toolchains/leanprover--lean4---nightly/lib/lean lean --run Gym.lean Nat.add_comm

Note, I used nightly version of Lean 4. I don't think stable works for this application yet. Also, I could't get leanpkg build bin to work.

Jason Rute (Jun 21 2021 at 00:20):

Instructions can be found as comments in the Gym.lean file. Note, again that I use a more verbose command than lean-gym, but it works.

Jason Rute (Jun 21 2021 at 00:20):

As for the prototype, I quickly run into errors. For example, it doesn't backtrack well.

$ LEAN_PATH=$HOME/.elan/toolchains/leanprover--lean4---nightly/lib/lean lean --run Gym.lean Nat.add_comm
{"state": "⊢ ∀ (n m : Nat), n + m = m + n",
 "proved": false,
 "error": null,
 "branchId": 0}
> 0 intro n
{"state": "n : Nat\n⊢ ∀ (m : Nat), n + m = m + n",
 "proved": false,
 "error": null,
 "branchId": 1}
> 0 intro n'
{"state": "n' : Nat\n⊢ ∀ (m : Nat), n' + m = m + n'",
 "proved": false,
 "error": null,
 "branchId": 2}
> 1 intro m
uncaught exception: unknown metavariable '?_uniq.3'

Jason Rute (Jun 21 2021 at 00:21):

Also, I can't seem to run a match tactic:

$ LEAN_PATH=$HOME/.elan/toolchains/leanprover--lean4---nightly/lib/lean lean --run Gym.lean Nat.add_comm
{"state": "⊢ ∀ (n m : Nat), n + m = m + n",
 "proved": false,
 "error": null,
 "branchId": 0}
> 0 intros n m
{"state": "n m : Nat\n⊢ n + m = m + n",
 "proved": false,
 "error": null,
 "branchId": 1}
> 1 match m with | 0 => simp | k+1 => simp [Nat.add_comm]
Error: index out of bounds
Error: index out of bounds
Error: index out of bounds
Error: index out of bounds
uncaught exception: tag not found

Daniel Selsam (Jun 21 2021 at 16:27):

@Jason Rute Thanks, fixed both.

Daniel Selsam (Jun 21 2021 at 18:33):

@Jason Rute @Stanislas Polu This is roughly what I had in mind for tracing: https://github.com/dselsam/lean4/tree/experiment-trace-tactics There are several design decisions to be made but this version seems like a workable starting point. For Nat.mod_lt, it successfully creates a dataset that can replay Nat.mod_lt in lean-gym:

0 intro x y
1 induction x, y using mod.inductionOn with | base x y h₁ => ?_| ind x y h h₂ => ?_
2 intro h₃
3 refine (noImplicitLambda% (have ⟨_, h₁⟩ := h; ?_))
4 rewrite [Nat.mod_eq_sub_mod h₁]
5 exact h₂ h₃
6 intro h₂
7 refine (noImplicitLambda% (have h₁ : ¬0 < y ∨ ¬y ≤ x := Iff.mp (Decidable.notAndIffOrNot _ _) h₁; ?_))
8 refine (match h₁ with | Or.inl h₁ => ?_ | Or.inr h₁ => ?_)
9 exact absurd h₂ h₁
10 refine (noImplicitLambda% (have hgt : y > x := gtOfNotLe h₁; ?_))
11 refine (noImplicitLambda% (have heq : x % y = x := mod_eq_of_lt hgt; ?_))
12 rewrite [←heq]at hgt
13 exact hgt

Daniel Selsam (Jun 21 2021 at 18:38):

TacticM should probably accumulate the datapoints (as Lean datatypes) in an array (if some flag is set) and allow a user-defined function to access/prune/postprocess/export (rather than building in the heuristics above).

Jason Rute (Jun 22 2021 at 01:39):

This is pretty cool! Especially considering how few lines of code it is to implement. For reference, here is the original proof that is being recorded in the example above:

theorem mod_lt (x : Nat) {y : Nat} : y > 0 → x % y < y := by
  induction x, y using mod.inductionOn with
  | base x y h₁ =>
    intro h₂
    have h₁ : ¬ 0 < y ∨ ¬ y ≤ x := Iff.mp (Decidable.notAndIffOrNot _ _) h₁
    match h₁ with
    | Or.inl h₁ => exact absurd h₂ h₁
    | Or.inr h₁ =>
      have hgt : y > x := gtOfNotLe h₁
      have heq : x % y = x := mod_eq_of_lt hgt
      rw [← heq] at hgt
      exact hgt
  | ind x y h h₂ =>
    intro h₃
    have ⟨_, h₁⟩ := h
    rw [mod_eq_sub_mod h₁]
    exact h₂ h₃

Jason Rute (Jun 22 2021 at 01:40):

Daniel Selsam said:

TacticM should probably accumulate the datapoints (as Lean datatypes) in an array (if some flag is set) and allow a user-defined function to access/prune/postprocess/export (rather than building in the heuristics above).

That would be amazing! I know HOL-Light for example has some code in the kernel that lets you record proofs if turned on.

Daniel Selsam (Jun 22 2021 at 01:48):

Most of that code is implementing the heuristic syntax-cleaning, which shouldn't be built-in anyway. The tracing code itself is tiny.

Jason Rute (Jun 22 2021 at 01:58):

Silly questions about this prototype:

Does one just build Lean from source as normal (but from your branch) and the tactics history is traced to STDOUT?
The first intro x y. Is that somehow recorded, or did you just add that to match the start of the tactic proof?
Is refine (noImplicitLambda% ...) explicitly added during the recording, or did you just know to add them when entering it back in?
How automatic is it to take the output of the recording process and reproduce a proof for your lean-gym? (This would be a great way to test if the recording and the gym both align and scale robustly to much of the library.)
What is being recorded is a pretty print of the parsed tactic command, verse what the human typed in, right? (I assume that since rw [← heq] at hgt was modified to be rewrite [←heq]at hgt.) I have had a lot of trouble in Lean 3 with putting pretty printed expressions back into Lean. They sometimes didn't parse. However, I think this is mostly fixed in Lean 4, right?

Daniel Selsam (Jun 22 2021 at 02:08):

Jason Rute said:

Does one just build Lean from source as normal (but from your branch) and the tactics history is traced to STDOUT?

Yes. In this PoC commit, there is no flag to disable or redirect it.

The first intro x y. Is that somehow recorded, or did you just add that to match the start of the tactic proof?

This was not recorded, and I added it.

Is refine (noImplicitLambda% ...) explicitly added during the recording, or did you just know to add them when entering it back in?

Evaluating many of the tactics involves constructing a new term t with holes, and then evaluating the tactic refine t. I did not add these.

How automatic is it to take the output of the recording process and reproduce a proof for your lean-gym? (This would be a great way to test if the recording and the gym both align and scale robustly to much of the library.)

It is not automatic, but you may automate it. If you do, note that the datapoints are not currently printed in chronological order, so you would either need to (a) fix that or (b) hash the goal strings to find the next tactic.

What is being recorded is a pretty print of the parsed tactic command, verse what the human typed in, right? (I assume that since rw [← heq] at hgt was modified to be rewrite [←heq]at hgt.) I have had a lot of trouble in Lean 3 with putting pretty printed expressions back into Lean. They sometimes didn't parse. However, I think this is mostly fixed in Lean 4, right?

The tactic syntax is formatted into a string automatically. Roundtripping expressions (i.e. Expr → String → Expr) may never work perfectly, but roundtripping syntax (i.e. Syntax → String → Syntax) should already work perfectly (AFAIK).

Daniel Selsam (Jun 22 2021 at 02:14):

For more context, here are the three heuristics I implemented above:

For every goal, keep the last tactic to be evaluated on that goal.
Post-process the syntax to replace parts that are evaluated downstream with ?_.
Store the datapoints in the backtracking part of the tactic state, so f <|> g won't keep the datapoints from f if f fails.

I didn't consider whether this approach makes sense for every combinator. This approach also has some known downsides. For example, intros x y will always trace as intro x and then intro y. One alternative would be to create additional data at different "levels", where you keep the first tactic that is "small enough", or the last otherwise.

Jason Rute (Jun 22 2021 at 02:21):

Thanks. Yes, I think there is still a lot of design work here. And given there isn't much Lean 4 data, I don't think we have to work it out all that soon, but having these as a place to start is really good! This is plenty to play with.

Last updated: May 02 2025 at 03:31 UTC