Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Getting started with lean-gym


Aidan Swope (Jun 02 2021 at 02:15):

(This continues discussion in #Machine Learning for Theorem Proving > HOList or Lean? but with an easier-to-search title)

Thank you @Stanislas Polu for putting together lean-gym! I'm excited to get started using it but I have a couple of questions after poking around a bit.

First, I noticed that the goal loaded in the REPL is sometimes different from the goal of the same name extracted with lean-proof-recording-public. For example, the first state in proving buffer.write_eq_write' is loaded in the REPL as

  {α : Type u} (b : buffer α) (i : ) (h : i < b.size) (v : α), b.write i, h v = b.write' i v

but appears in the data as

α : Type u,  b : buffer α,   i : ,   h : i < b.size,  v : α   b.write i, h v = b.write' i v

presumably because the universe u and variable α are defined globally within buffer.lean. In this case just running intros is sufficient to change one into the other, but I'm worried that minor differences like these might trip up a language model. Do you know what other differences I should expect between the REPL and the training data?

Second (though this might be more of a question for @Jason Rute): it seems that for some "compressed" proofs using tactic combinators, there are also "split-up" proofs. Using the same example of buffer.write_eq_write', the proof in buffer.lean is a one-liner by cases b; unfold write write'; simp [array.write_eq_write']. The training data includes this, but also the 3 tactics individually. Am I interpreting the origin of these "extra proof steps" correctly?

Finally, what are the correct data splits to use when comparing to PACT? The paper references a test theorem count of 3071. Using lean-proof-recording-public I see 3651 test theorems (cleaned_training_data/test.names), and using lean-gym I see 998 test theorems (data/mathlib-test.names).

Jason Rute (Jun 02 2021 at 02:51):

As for lean-proof-recording-public, it doesn't have a particular "first tactic state". It just includes all the places where tactics are run. docs#buffer.write_eq_write' appears in the code as:

lemma write_eq_write' (b : buffer α) (i : nat) (h : i < b.size) (v : α) :
  write b i, h v = write' b i v :=
by cases b; unfold write write'; simp [array.write_eq_write']

So even though lean stores it internally as a universal statement, the proof starts with b, i, h, and v in the local context. In this case, yes, a few intros will get you there. Also, you are correct on the origin of the "extra proof steps".

Jason Rute (Jun 02 2021 at 02:51):

I should also point out some other examples, so you see how the proof recording data works. Consider docs#nat.add_assoc. In that case, the proof is mostly a term proof, but one branch included a short tactic proof (starting with by). Only the tactic branch is recorded.

protected lemma add_assoc :  n m k : , (n + m) + k = n + (m + k)
| n m 0        := rfl
| n m (succ k) := by rw [add_succ, add_succ, add_assoc]

Jason Rute (Jun 02 2021 at 02:51):

I don't have another example off hand, but you can easily see complicated proofs that have parts like: try {refl; rw [by refl : foo]}. Here try {refl; rw [by refl : foo]} is a tactic, and so is refl; rw [by refl : foo], refl, rw [by refl : foo]. Further, inside this tactic block is a whole other tactic block by refl and we record refl too. If you are particularly interested in getting the outermost tactics, for example for testing the REPL, you may want to look deeper into the data being generated by the proof recording process. There is a lot more there specifying the structure of the proof. It isn't well documented but I'm happy to help.

Jason Rute (Jun 02 2021 at 02:55):

As for other differences between the REPL and training data, I would watch out for any differences in how tactic states are printed. (However, in your example, those are two different tactic states, so they should print differently.) The printing code might not be perfectly aligned (which is of course bad, and it should be addressed).

Jason Rute (Jun 02 2021 at 02:59):

As for the differences in theorem numbers, I think the lean-gym numbers are from a newer version of lean_proof_recording where we modified the test ratio to be smaller. (@Stanislas Polu we should make that public soon.) Also, I think different versions of lean or mathlib could also explain the small differences.

Jason Rute (Jun 02 2021 at 03:04):

Also, this is really subtle, but I believe lean-gym carried over from gptf a small rewrite tactic optimization. Our language model would sometimes make long rewrite chains like rw [foo1, foo2, foo3, .., foo7]. Since most of the rewrites were good, but one bad apple ruined the bunch, I think we added some special logic in how we parserw tactics. @Jesse Michael Han knows the details better. I wonder (and I'm opening speculating here) if this is a good feature to keep in lean-gym, or if it just adds unnecessary confusion.

Jesse Michael Han (Jun 02 2021 at 05:28):

yeah this was implemented by @Edward Ayers, we try to run a squeeze_rewrite function while parsing tactics from the strings emitted by the model (link)

Aidan Swope (Jun 02 2021 at 06:09):

Thanks, this definitely helps me understand the proof recording better. The case mentioned above isn't such an issue because the initial REPL state is only an intros away from the initial state faced by a person, but are there other cases where you'd expect the difference to be bigger? (I'd guess not because the only case I can think of involve implicit universes / variables, but I'm not certain)

Stanislas Polu (Jun 02 2021 at 07:20):

Hi Aidan, to compare with PACT precisely you would have to use lean-tpe and proper labels on lean_proof_recording and lean-step. Over the past few weeks I worked on bumping mathlib and including lean-liquid in lean_proof_recording and lean-step. Doing so, as mentioned by @Jason Rute I also reduced the test/valid ratios a notch which explains the discrepancy in number of test/valid theorems.

All in all, if you run an eval using lean-gym against the current valid/test set, I would say that the pass rate numbers should be roughly comparable to PACT (maybe PACT would be a bit under-estimated depending on your use of lean-gym (lean-tpe on top of the OpenAI API is prolly a bit less stable/more error-prone than our current lean-gym setup as an example)). We consider these numbers comparable for what it's worth.

Concerning the tactic state, it is indeed one intros away from what exist in the training set. I'll let @Jesse Michael Han provide some color on that, but will note that this was the case for PACT as well and that we have hardly seen evidence of this hurting our models.

Hope this helps!

Stanislas Polu (Jun 02 2021 at 08:14):

Also worth mentioning that lean-gym publishes splits for mathlib, lean-liquid and miniF2F separately, to compare to PACT you would want to use the mathlib split of course.

Jason Rute (Jun 02 2021 at 10:43):

Jesse Michael Han said:

yeah this was implemented by Edward Ayers, we try to run a squeeze_rewrite function while parsing tactics from the strings emitted by the model (link)

My point is that this should be documented or removed. Otherwise the users will be confused as to why rw tactics behave differently in lean-gym than in Lean. Do you think this optimization makes a big difference in lean gptf?

Jason Rute (Jun 02 2021 at 11:39):

@Aidan Swope Here is how I think of it. Think of an analogy with board games. In this analogy a proof is like the play of a (one-player) board game. Lean-gym is like a gym interface that let's an AI play (Lean) board games. It always starts at the initial state of the game. Now, lean-proof-recording is a bunch of recored player data. Because of fundamental limitations in the recording process, we don't have every move in the board game. For some games we do have recorded opening moves (usually intro or intros), but for other games we only have the moves for positions deeper into the game. Since the opening game is pretty consistent, this probably isn't a big deal. It isn't a misalignment issue as much as a missing data issue. Also, there aren't necessarily easy ways to fill in this missing data. It is part of the messiness of real world data.

Jason Rute (Jun 02 2021 at 11:39):

Ok, now, before I beat this analogy to death, let me just talk about Lean directly now. There are lots of ways in Lean for users to prove things outside tactic mode. None of these things will be recorded in lean-proof-recording because lean-proof-recording is only capable of recording tactics executed in interactive mode:

  • Putting the parameters on the left side of the colon automatically starts the proof with lambdas (the same thing you would get from applying intro tactics).
  • Using the induction/case-matching syntax as in the proof of nat.assoc above.
  • Using calc syntax.
  • Giving a term proof (possibly using human sugar coating syntax like match, assume, etc.).

Jason Rute (Jun 02 2021 at 11:39):

All of these things, in addition to tactic proofs, are turned by Lean into low-level term proofs. Those proofs are stored in Lean, and we do use those term proofs (in a number of ways) as additional training data in the PACT training framework. (See the PACT paper for the specifics.)

Jason Rute (Jun 02 2021 at 11:39):

Now, there is also another area where lean-gym and lean-proof-recording differ in scope which we have already talked about. lean-gym takes the simplistic approach that a proof is linear and of the form:

begin
tactic_1,
tactic_2,
...,
tactic_n
end

whereas actual Lean proofs can be more tree like. (For what it is worth, this is pretty consistent with other such projects like HOList. The only exception is that HOList works on an individual goal at a time, where lean-gym works on the whole goal stack every time even though most tactics only modify the first goal.) For example, one can combine tactics with all sorts of combinators, like tac1;tac2 to chain tactics and make sure tac2 is applied to all subgoals of applying tac1. We record the whole combination (and indeed Lean gptf is pretty good at using the ; combinator), as well as every instance of tac1 and tac2 (note, there are as many instances of tac2 as there are subgoals of applying tac1). Another common combinator is {...} which focuses on the first goal and solves it. Again, our model is trained to use {...} and also trained on the tactics inside. This way it can give a one-line tactic {...} or separately apply all the tactics inside. Usually, {...} is just a convenient book keeping mechanism more than an integral part of the proof. In short, even though lean-gym (and lean-gptf) are making the simplifying assumption that proofs need to be linear, the training data includes information about many levels of tactic execution so hopefully the final model can learn to construct linear proofs, either by (1) synthesizing whole tactic combinators, or (2) providing a (possibly long and repetitive) proof made of lots of simple steps.

Jason Rute (Jun 02 2021 at 11:39):

One future challenge with Lean4 is that the tactic framework seems to be even less linear, with a more programing like syntax full of combinators. I don't know if the mathlib community really wants to follow this pattern, but if they do, then it will be an even bigger challenge to record proofs and build a gym around this proof framework.

Jason Rute (Jun 02 2021 at 11:56):

(Ok, after writing all that, I think I understand your concern a bit better @Aidan Swope. You are worried about there being a distribution shift between the states in recorded proof data and the states in the lean-gym. This is certainly going to be true for the reasons above. You mention universes. I don't think there will be a distribution shift in universe names, but I could be mistaken. Nonetheless, with the diverse data we use for PACT, it doesn't seem to be a significant issue. Also, I'm sure adding reinforcement learning would significantly help distribution shift since it would push the training data distribution of tactic states to align with those tactic states encountered by the agent in the environment.)

Jason Rute (Jun 02 2021 at 13:18):

Another thing I should mention is that I discovered lean-gym allows the sorry tactic as well as any other tactic commands which are based on sorry, like exact sorry, try {sorry}, etc. I think any user of lean-gym would be prudent to put a guard on any AI to make sure "sorry" isn't in the tactic being tried. Also, there are aliases for sorry in some projects, so checking for the "sorry" string may not be a perfect solution. The silver lining is that unless the training data mentions sorry, a language model should not know it exists. I'm pretty sure there are no sorrys in mathlib. (Also, note HOList has the same problem with the CHEAT_TAC. If using a language model on HOList, it would be prudent to check for CHEAT_TAC especially since I think it is used to "prove" axioms in HOL Light and may have made its way into the training data, but I'm not certain.)

Stanislas Polu (Jun 07 2021 at 20:24):

PSA: lean-gym now checks that generated proofs don't rely on sorry and actually match the type of the original goal: https://github.com/openai/lean-gym/pull/5
See this topic for more details.

And MiniF2F now boasts 136 Lean valid statements! (h/t @Kunhao Zheng)

Stanislas Polu (Jun 09 2021 at 09:47):

lean-gym now also protects against the use of undefined (see the same topic for more details). And miniF2F has 133 Lean test statements (h/t @Kunhao Zheng).

Stanislas Polu (Jun 09 2021 at 09:50):

I've also added a Code of Conduct to MiniF2F's README:

MiniF2F is meant to serve as a shared and useful resource for the machine learning community working on formal mathematics. It is still a TODO to determine how we'll exchange respective results on the benchmark in a way that is the most useful to the community.

In the meantime, if you're using miniF2F and are discovering new proofs (manually or automatically) please contribute them back to the benchmark.

On the first point I think we don't want to over-engineer anything. We'll probably introduce a versioning system so that papers car refer to versions of MiniF2F in papers and call it a day so that it's all simple and easy. Concerning the second point, hope to see as many contributions as possible! Feedback very welcome on this obviously.

Aidan Swope (Jun 18 2021 at 20:58):

@Jason Rute I've noticed that lean-gym and lean-step-public use the same Lean version and mathlib revision, but lean-proof-recording-public uses an older version. This results in some theorems being assigned to different splits in the data vs evaluation environments. For example, stream.mem_map is in the test set in lean-gym and lean-step-public, but the validation set for lean-proof-recording-public.

I think it'd be good to pin the versions used by these three repositories since they seem to collectively make up the benchmark, and I'd worry that people down the line wouldn't be careful enough to check the versions and might report overconfident results.

Daniel Selsam (Jun 18 2021 at 21:37):

FYI here is a bare-bones proof-of-concept tactic repl for Lean4: https://github.com/dselsam/lean-gym The main missing feature is that there is no way yet to import until a given declaration. TBD the best way to handle this in Lean4

Aidan Swope (Jun 18 2021 at 23:15):

@Stanislas Polu Do you recall what Lean version / mathlib revision you used to make the mathlib-{train,valid,test} files in lean-gym? They don't seem to match the results from lean-proof-recording-public using either the commit listed there or in lean-gym (in leanpkg.toml).

Daniel Selsam (Jun 19 2021 at 02:05):

Jason Rute said:

One future challenge with Lean4 is that the tactic framework seems to be even less linear, with a more programing like syntax full of combinators. I don't know if the mathlib community really wants to follow this pattern, but if they do, then it will be an even bigger challenge to record proofs and build a gym around this proof framework.

Can you please clarify your concern? What does it mean for the tactic framework to be "even less linear"? What of importance is missing from the hundred line https://github.com/dselsam/lean-gym/blob/master/Gym.lean (besides importing upto a declaration)?

Jason Rute (Jun 19 2021 at 04:30):

@Aidan Swope, let me work with Stan to publish a public version of lean_proof_recording (or a data) which aligns more closely to lean gym. For the mean time, I've invited you (maxwells-daemons which I got from Googling your name) to the private GitHub repo which should be more closely aligned.

Jason Rute (Jun 19 2021 at 05:04):

@Daniel Selsam Just a quick note. I'll look more closely at your prototype (which is great to have!) soon. When I said "less linear", consider this example from the Lean 4 manual:

theorem ex (x : Nat) {y : Nat} (h : y > 0) : x % y < y := by
  induction x, y using Nat.mod.inductionOn with
  | ind x y h₁ ih =>
    rw [Nat.mod_eq_sub_mod h₁.2]
    exact ih h
  | base x y h₁ =>
     have : ¬ 0 < y  ¬ y  x := Iff.mp (Decidable.notAndIffOrNot ..) h₁
     match this with
     | Or.inl h₁ => exact absurd h h₁
     | Or.inr h₁ =>
       have hgt : y > x := Nat.gtOfNotLe h₁
       rw [ Nat.mod_eq_of_lt hgt] at hgt
       assumption

Jason Rute (Jun 19 2021 at 05:04):

If I'm not mistaken (and correct me if I am), this is just one tactic command, right? So if one puts that into your gym, it would have to go in as a single entry, right? If all (or most) induction examples in the dataset are like this, then it would be hard for a model like lean gptf to get started on an induction proof since it would have to generate the whole proof in one go. Of course Lean 3 has lots of nonlinear tactic proofs as well, and as discussed above that is a distribution shift from the linear paradigm that lean gym enforces. But the Lean 4 syntax seems to blend term proofs and tactic proofs even more. Overall, this seems like a useful thing, except as a source of training data to a linear proof generator. :smile: One could argue, of course, that the problem is with the linear proof assumption of Lean gym and Lean GPT-f (and most similar projects). The Lean gptf approach wouldn't work on say Lean3 term proofs without first converting them into some tactic language or just generating each term proof in one-go with no intermediate goal information.

Jason Rute (Jun 19 2021 at 05:04):

My comment about "the mathlib community" is entirely based on #lean4 > missing tactics which in hindsight, I'm reading too much into. Anyway, in that thread, Kevin talks about how he doesn't like the built in Lean4 cases tactic and was looking for a tactic more like the Lean3 split tactic.

Jason Rute (Jun 19 2021 at 05:04):

Since we are on this topic, how easy do you think it would be, @Daniel Selsam, to get a (proof-of-concept) training dataset of Lean4 proofs broken up by goal-tactic pairs similar to what I did for Lean 3? (As for nested tactics, ideally one would enumerate the outer tactic as well as any inner tactics.)

Stanislas Polu (Jun 19 2021 at 05:19):

Thanks @Jason Rute for inviting @Aidan Swope.

@Aidan Swope we indeed need to publish the latest version of lean_proof_recording. @Jason Rute I think your private version is ready for that?

Stanislas Polu (Jun 19 2021 at 05:44):

Thanks @Daniel Selsam for the prototype. This is hugely exciting!

Very interested to have your take ok @Jason Rute’s question. Would love to see in particular built in proof tracing infrastructure 👍

Aidan Swope (Jun 19 2021 at 09:27):

Thanks for the invite! I'll look into using the data from here.

Daniel Selsam (Jun 19 2021 at 16:58):

@Jason Rute @Stanislas Polu I don't see an issue, but I think I understand why you thought there might be. The tactic block above looks monolithic. But internally, evaluating the induction tactic will recursively call evalTactic on the syntax for the subgoals. Similar with the match tactic. And then evaluating the sequences at the leaves will also call evalTactic on the elements in sequence. So, we can collect the data by adding one line to evalTactic to print out the goal and the "cleaned up" syntax (i.e. where all recursive tactics are replaced with placeholders). Note: this would trace even for search branches that failed, but we can easily fix this too by having evalTactic instead store the datapoint in the backtracking part of the tactic state, so we only look at them for successful branches.

Daniel Selsam (Jun 19 2021 at 22:19):

In other words, the proof you posted above is isomorphic to one with explicit decomposition into steps, and we can trace the former as if it were the latter:

example (x : Nat) {y : Nat} (h : y > 0) : x % y < y := by
  induction x, y using Nat.mod.inductionOn with
  | ind x y h₁ ih => ?i1
  | base x y h₁   => ?i2

  case i1 =>
    rw [Nat.mod_eq_sub_mod h₁.2]
    exact ih h

  case i2 =>
    have : ¬ 0 < y  ¬ y  x := Iff.mp (Decidable.notAndIffOrNot ..) h₁
    match this with
    | Or.inl h₁ => ?i21
    | Or.inr h₁ => ?i22

    case i21 =>
      exact absurd h h₁

    case i22 =>
      have hgt : y > x := Nat.gtOfNotLe h₁
      rw [ Nat.mod_eq_of_lt hgt] at hgt
      assumption

Jason Rute (Jun 20 2021 at 00:05):

Yes. This makes sense. This would of course have to be reflected in both the gym interface (by adding support for that ?i1 syntax which I assume are not valid lean code, right?) and in the training data (by explicitly tracing this type of data). I guess this approach would even work for Lean 3, although there aren’t as many tactic combinators in Lean 3.

Mario Carneiro (Jun 20 2021 at 00:39):

The ?i1 syntax is valid lean 4, although I think it needs to be wrapped in a tactic like refine because it is a term, not a tactic (which creates a goal called i1 for the named metavariable)

Jason Rute (Jun 20 2021 at 01:31):

Oh, it does appear to be valid syntax, or at least it works in this example:

theorem ex1 : p  q  q  p := by
  intro h
  cases h with
  | inl h1 =>
    apply Or.inr
    exact h1
  | inr h2 =>
    apply Or.inl
    assumption

can be rewritten as

theorem ex1 : p  q  q  p := by
  intro h
  cases h with
  | inl h1 => ?i1
  | inr h2 => ?i2

  case i1 =>
    apply Or.inr
    exact h1

  case i2 =>
    apply Or.inl
    assumption

Aidan Swope (Jun 21 2021 at 19:07):

Hey @Stanislas Polu, I've had some difficulty loading MiniF2F theorems through lean-gym. I'm seeing not_a_theorem errors for all of them. Could you explain how to set up for loading those please?

Stanislas Polu (Jun 21 2021 at 19:08):

Did you run one of the setup scripts under the scripts/ directory?

Stanislas Polu (Jun 21 2021 at 19:08):

If the theorems are from mathlib, a typical leanpkg setup should do the trick?

Aidan Swope (Jun 21 2021 at 19:14):

Oh, woops, I only ran scripts/setup.sh, not scripts/setup_miniF2F.sh. That does it, thanks!

Ayush Agrawal (Mar 10 2022 at 18:53):

Is there a way so that we could provide custom tactic state or declare new theorem name in the lean-gym. Fro some of the theorems, (for eg. invertible_div) I am getting not_a_theorem error. Any help regarding this or workaround would be helpful!

Daniel Selsam (Mar 11 2022 at 01:47):

Ayush Agrawal said:

Is there a way so that we could provide custom tactic state or declare new theorem name in the lean-gym. Fro some of the theorems, (for eg. invertible_div) I am getting not_a_theorem error. Any help regarding this or workaround would be helpful!

Depending on what you are trying to accomplish, you might just want to skip non theorems. Otherwise, I don't see at a glance why you couldn't just comment out the check @ https://github.com/openai/lean-gym/blob/main/src/repl.lean#L172-L184

Ayush Agrawal (Mar 11 2022 at 15:39):

@Daniel Selsam thanks. Commenting that part is working. However, I get a warning: /home/t-agrawala/Desktop/lean_work/lean-gym/src/ayush_repl.lean:16:0: warning: imported file '/home/t-agrawala/Desktop/lean_work/lean-gym/src/tools/shrink_proof.lean' uses sorry. It would be helpful if you have some workaround to declare a new tactic state and initiating that for lean-gym so that one can also run new examples that are not part of the mathlib/or does not have any name for now in lean-gym

Daniel Selsam (Mar 11 2022 at 22:31):

Ayush Agrawal said:

Daniel Selsam thanks. Commenting that part is working. However, I get a warning: /home/t-agrawala/Desktop/lean_work/lean-gym/src/ayush_repl.lean:16:0: warning: imported file '/home/t-agrawala/Desktop/lean_work/lean-gym/src/tools/shrink_proof.lean' uses sorry. It would be helpful if you have some workaround to declare a new tactic state and initiating that for lean-gym so that one can also run new examples that are not part of the mathlib/or does not have any name for now in lean-gym

See e.g. https://github.com/openai/lean-gym/blob/main/scripts/setup_miniF2F.sh for a (very hacky) way of adding other projects to the environment. Basically you just need to add your favorite package to the leanpkg.toml and define an all.lean (which will be imported by the cryptic import all in repl.lean. You can also just add a file in lean-gym itself with the declarations you want and import that instead. Lean4 gym will be less hacky when the time comes.

Ayush Agrawal (Apr 09 2022 at 19:55):

Hi guys, can anyone please give some causes of the gen_tac_and_capture_res_failed error. From the repl file, it says : if there are remaining subgoals. However, I didn't understand the idea clearly

Ayush Agrawal (Apr 09 2022 at 20:36):

One more doubt:
I am trying this theorem in lean-gym:
lemma sum_two_pow_lt_iff_lt (A B : finset ℕ) : ∑ i in A, 2^i < ∑ i in B, 2^i ↔ A.to_colex < B.to_colex := begin have z : ∀ (A B : finset ℕ), A.to_colex < B.to_colex → ∑ i in A, 2^i < ∑ i in B, 2^i, { intros A B, rw [← sdiff_lt_sdiff_iff_lt, colex.lt_def], rintro ⟨k, z, kA, kB⟩, rw ← sdiff_union_inter A B, conv_rhs { rw ← sdiff_union_inter B A }, rw [sum_union (disjoint_sdiff_inter _ _), sum_union (disjoint_sdiff_inter _ _), inter_comm, add_lt_add_iff_right], apply lt_of_lt_of_le (@nat.sum_two_pow_lt k (A \ B) _), { apply single_le_sum (λ _ _, nat.zero_le _) kB }, intros x hx, apply lt_of_le_of_ne (le_of_not_lt (λ kx, _)), { apply (ne_of_mem_of_not_mem hx kA) }, have := (z kx).1 hx, rw mem_sdiff at this hx, exact hx.2 this.1 }, refine ⟨λ h, (lt_trichotomy A B).resolve_right (λ h₁, h₁.elim _ (not_lt_of_gt h ∘ z _ _)), z A B⟩, rintro rfl, apply irrefl _ h end

However, I got error with have:
["init_search",["colex.sum_two_pow_lt_iff_lt",""]] {"error":null,"proof_steps":[],"search_id":"0","tactic_state":"⊢ ∀ (A B : finset ℕ), A.sum (λ (i : ℕ), 2 ^ i) < B.sum (λ (i : ℕ), 2 ^ i) ↔ A.to_colex < B.to_colex","tactic_state_id":"0"} ["run_tac",["0","0","have z : ∀ (A B : finset ℕ), A.to_colex < B.to_colex → ∑ i in A, 2^i < ∑ i in B, 2^i"]] {"error":"gen_tac_and_capture_res_failed: pos=none msg=parse_itactic failed on have z : ∀ (A B : finset ℕ), A.to_colex < B.to_colex → ∑ i in A, 2^i < ∑ i in B, 2^i","proof_steps":[],"search_id":null,"tactic_state":null,"tactic_state_id":null} ["run_tac",["0","0","intros"]] {"error":null,"proof_steps":[],"search_id":"0","tactic_state":"A B : finset ℕ\n⊢ A.sum (λ (i : ℕ), 2 ^ i) < B.sum (λ (i : ℕ), 2 ^ i) ↔ A.to_colex < B.to_colex","tactic_state_id":"1"} ["run_tac",["0","0","have z : ∀ (A B : finset ℕ), A.to_colex < B.to_colex → ∑ i in A, 2^i < ∑ i in B, 2^i"]] {"error":"gen_tac_and_capture_res_failed: pos=none msg=parse_itactic failed on have z : ∀ (A B : finset ℕ), A.to_colex < B.to_colex → ∑ i in A, 2^i < ∑ i in B, 2^i","proof_steps":[],"search_id":null,"tactic_state":null,"tactic_state_id":null}
Can anyone help? Thanks!

Stanislas Polu (Apr 15 2022 at 09:22):

My first guess would be that you are missing the finset open. Would that work with ["init_search",["colex.sum_two_pow_lt_iff_lt","finset"]] ?

Stanislas Polu (Apr 15 2022 at 09:23):

Generally speaking you need to pass as second argument the open namespace as we don't have a good way to infer it from the declaration name only. Open locales should be fine and properly set.

Ayush Agrawal (Feb 11 2023 at 18:39):

@Stanislas Polu I was trying some experiments with lean-gym and have bumped into a segmentation fault error:
I am getting segmentation fault error after trying the following:

["init_search",["one_div_add_one_div","set"]]
{"error":null,"proof_steps":[],"search_id":"0","tactic_state":"⊢ ∀ {K : Type u} [_inst_1 : field K] {a b : K}, a ≠ 0 → b ≠ 0 → 1 / a + 1 / b = (a + b) / (a * b)","tactic_state_id":"0"}
["run_tac",["0","0","intros"]]
{"error":null,"proof_steps":[],"search_id":"0","tactic_state":"K : Type u,\n_inst_1 : field K,\na b : K,\nha : a ≠ 0,\nhb : b ≠ 0\n⊢ 1 / a + 1 / b = (a + b) / (a * b)","tactic_state_id":"1"}
["run_tac",["0","1","rw one_div"]]
{"error":null,"proof_steps":[],"search_id":"0","tactic_state":"K : Type u,\n_inst_1 : field K,\na b : K,\nha : a ≠ 0,\nhb : b ≠ 0\n⊢ a⁻¹ + 1 / b = (a + b) / (a * b)","tactic_state_id":"2"}
["run_tac",["0","2","simp [div_eq_mul_inv, inv_eq_one_div]"]]
Segmentation fault (core dumped)

However, on another machine, this was the output (timeout!):

["init_search",["one_div_add_one_div","set"]]
{"error":null,"proof_steps":[],"search_id":"0","tactic_state":"⊢ ∀ {K : Type u} [_inst_1 : field K] {a b : K}, a ≠ 0 → b ≠ 0 → 1 / a + 1 / b = (a + b) / (a * b)","tactic_state_id":"0"}
["run_tac",["0","0","intros"]]
{"error":null,"proof_steps":[],"search_id":"0","tactic_state":"K : Type u,\n_inst_1 : field K,\na b : K,\nha : a ≠ 0,\nhb : b ≠ 0\n⊢ 1 / a + 1 / b = (a + b) / (a * b)","tactic_state_id":"1"}
["run_tac",["0","1","rw one_div"]]
{"error":null,"proof_steps":[],"search_id":"0","tactic_state":"K : Type u,\n_inst_1 : field K,\na b : K,\nha : a ≠ 0,\nhb : b ≠ 0\n⊢ a⁻¹ + 1 / b = (a + b) / (a * b)","tactic_state_id":"2"}
["run_tac",["0","2","simp [div_eq_mul_inv, inv_eq_one_div]"]]
{"error":"gen_tac_and_capture_res_failed: pos=none msg=try_for tactic failed, timeout","proof_steps":[],"search_id":null,"tactic_state":null,"tactic_state_id":null}

Lean versions used are the same in both the machines: Lean (version 3.39.1, commit 1781ded0d006, Release)

I have also filed an issue here: https://github.com/openai/lean-gym/issues/31
Any idea of what I might be missing or the possible workarounds for this? Thanks.

Jason Rute (Feb 12 2023 at 14:12):

I’m not sure if this project is still maintained. But also, a segfault is likely an issue with Lean and not the gym itself. (I mean the gym is using Lean in sort of a nonstandard way and that may lead to the segfault but I not sure if the gym maintainers could fix it). Also the researchers on the Hypertree Proof Search paper (which didn’t use Lean-Gym, but their own system) said they encountered a lot of Lean segfaults. I’m curious if you put that proof directly into Lean (either as a new lemma or directly modify one_div_add_one_div in mathlib) if it would work or if you also get your segfault?

Jason Rute (Feb 12 2023 at 14:21):

docs#one_div_add_one_div

Ayush Agrawal (Feb 14 2023 at 06:27):

Thanks @Jason Rute Will check that.

venom Yu (Feb 15 2023 at 05:37):

Hi, when I was trying with lean-gym, I found it seemed not support the "induction" tactic. Here is my code:

    env = LeanEnv('/home/venom/projects/Automated_Theorem_Proving/lean-gym', decl='int.zero_add')
    observation, info = env.reset(return_info=True)
    observation = env.step((observation[0], 'induction a with d hd'))

And I got the error:

obs:(0, '⊢  (a : ), 0 + a = a')
obs:((-1, ''), 0, False, {'error': "gen_tac_and_capture_res_failed: pos=(some ⟨1, 2⟩) msg=unknown identifier 'a'", 'proof_steps': [], 'search_id': None, 'tactic_state': None, 'tactic_state_id': None})

Can anyone help? Thanks a lot!

venom Yu (Feb 15 2023 at 07:14):

Hi, I am trying to hack the lean-gym code, but I meet some troubles when I want to print something to the console. Is there a good way to insert a "print" function in the lean-gym code? Thanks!

Jason Rute (Feb 15 2023 at 12:57):

@venom Yu That is a normal lean error given by using a tactic which don't apply in that situation. LeanGym is acting as expected, just letting you know this tactic doesn't apply in this situation. It has nothing to do with LeanGym and everything to do with Lean. LeanGym is just a way to interact with Lean for RL. Here is my advice. If you are already familiar with Lean 3 as a theorem prover, then like I suggested to Ayush, I would learn how to input what you are seeing in LeanGym directly into Lean. In this case it is simple (also see here):

example :  (a : ), 0 + a = a :=
begin
  induction a with d hd,  -- error: unknown identifier 'a'
end

If you are not familiar with Lean 3, I'd take a little bit of time to play around with it in the usual way before using LeanGym. That will give you some sense of the tactics, error messages, and overall Lean experience. Theorem proving in Lean is pretty good, as is the Natural number game.

Jason Rute (Feb 15 2023 at 13:08):

@venom Yu As for modifying LeanGym, I assume you mean the Lean code and not the Python code. To do anything in the Lean code probably involves some familiarity with lean metaprogramming and monads. Unfortunately, for Lean 3, there isn't great documentation. (It will be better for Lean 4.) There is a lot on the #metaprogramming / tactics stream, and maybe a good place to start is Tactic Writing in Lean and Lean 3 for Hackers.

Jason Rute (Feb 15 2023 at 13:08):

But as for how to print, it depends which monad you are in. If it is the io monad, it is docs#io.put_str. If it is the tactic monad then you can use docs#tactic.trace or for more control you can switch to the io monad with docs#tactic.unsafe_run_io. Now I think LeanGym works by communicating with Python through stdin and stdout, so printing might get in the way of that. You may need to print to stderr instead. In the io monad you can use docs#io.write to write to the handle docs#io.stderr.

venom Yu (Feb 15 2023 at 14:02):

Jason Rute said:

venom Yu That is a normal lean error given by using a tactic which don't apply in that situation. LeanGym is acting as expected, just letting you know this tactic doesn't apply in this situation. It has nothing to do with LeanGym and everything to do with Lean. LeanGym is just a way to interact with Lean for RL. Here is my advice. If you are already familiar with Lean 3 as a theorem prover, then like I suggested to Ayush, I would learn how to input what you are seeing in LeanGym directly into Lean. In this case it is simple (also see here):

example :  (a : ), 0 + a = a :=
begin
  induction a with d hd,  -- error: unknown identifier 'a'
end

If you are not familiar with Lean 3, I'd take a little bit of time to play around with it in the usual way before using LeanGym. That will give you some sense of the tactics, error messages, and overall Lean experience. Theorem proving in Lean is pretty good, as is the Natural number game.

Thanks for your quick reply and kind advice. Emmm in fact, the "zero_add" case is from Natural number game, I didn't see the difference between my code and the groud truth.
image.png
Could you please explain it more detailly? Thanks a lot!

venom Yu (Feb 15 2023 at 14:04):

Jason Rute said:

But as for how to print, it depends which monad you are in. If it is the io monad, it is docs#io.put_str. If it is the tactic monad then you can use docs#tactic.trace or for more control you can switch to the io monad with docs#tactic.unsafe_run_io. Now I think LeanGym works by communicating with Python through stdin and stdout, so printing might get in the way of that. You may need to print to stderr instead. In the io monad you can use docs#io.write to write to the handle docs#io.stderr.

Thanks! I am blocked by the stdin and stdout truly.

Kevin Buzzard (Feb 15 2023 at 14:20):

If you write induction n then there needs to be an orange n : some_type in your context: a real n. Having a "for all n" in your goal is not the same thing. You can move from the "for all n" to the "orange n" with intro n and you can move back with revert n.

venom Yu (Feb 15 2023 at 14:49):

@Kevin Buzzard Oh, that's true. Thanks for your reply!!

Jason Rute (Feb 15 2023 at 14:57):

@venom Yu Ok. I see where you are stuck. In Lean, one writes theorems often by explicitly giving the assumptions, for example the (n : ℕ) in docs#nat.zero_add. For Lean this is stored internally as a forall statement which is what you are seeing in LeanGym, e.g. ∀ (a : ℤ), 0 + a = a. So the ground truth proof in Lean may not work directly. You may need a few intros to get it into the form a Lean user would see when they start the proof. (Also the LeanGym theorem you are looking at is for integers and you are copying a proof for natural numbers.)

venom Yu (Feb 15 2023 at 15:18):

@Jason Rute Thanks a lot !

venom Yu (Feb 18 2023 at 11:01):

Hi guys, I'm learning the document "Programming in Lean"(https://leanprover.github.io/programming_in_lean/#01_Introduction.html), when I try with the cases, nothing is output to the console like this.
image.png
Could anyone help with this?

Jason Rute (Feb 18 2023 at 14:01):

@venom Yu Programming in Lean was never finished and it is quite out of date. (That said the monad chapter is still pretty good, even if the examples don’t run.) Also, I think that this is a bit off topic for this Zulip thread. Maybe start a new thread under #general or #metaprogramming / tactics with what you would like to learn to do, and we can point you to available resources.

venom Yu (Feb 20 2023 at 04:24):

@Jason Rute Thanks. I will transfer to that thread.


Last updated: Dec 20 2023 at 11:08 UTC