Zulip Chat Archive

Stream: Equational

Topic: Using automated theorem provers

Stefan Hetzl (Sep 29 2024 at 18:54):

I wrote some code that takes an implication between two equations, 1. sends it to an automated theorem prover, e.g., prover9, and 2. translates the returned proof into Lean code to prove the original implication.

This work is based on the gapt system. You can find the code in the eqthproject branch at examples/eqthproject/eqthproject.scala.

The import from various automated theorem provers into gapt exists since a long time and is quite stable. The system also provides a translation from resolution proofs to sequent calculus proofs. What I implemented now is a rather naive (line-by-line) translation of simple sequent calculus proofs (such as those of the implicaton of one equation by another) to Lean code.

For example, it automatically generates code like

theorem eqimpl (G: Type*) [Magma G] (h0: ∀ ( x y :G), ( x ∘ x ) = ( x ∘ y )): ( ∀ ( x y z :G), ( x ∘ y ) = ( x ∘ z ) ) := by
  intro a1 a2 a3
  have h1: ( a1 ∘ a1 ) = ( a1 ∘ a1 ) -- cut
  · rfl -- refl
  · have h2 := h0 -- c:l
    have h3: ( a1 ∘ a3 ) = ( a1 ∘ a1 ) -- cut
    · have h4: ( a1 ∘ a1 ) = ( a1 ∘ a3 ) -- cut
      · have h5: ∀ y, ( a1 ∘ a1 ) = ( a1 ∘ y ) := by apply h2 -- ∀:l
        clear h2
        have h6: ( a1 ∘ a1 ) = ( a1 ∘ a3 ) := by apply h5 -- ∀:l
        clear h5
        exact h6 -- ax
      · nth_rw  1 [←h4] -- eq:r
    · have h7: ( a1 ∘ a1 ) = ( a1 ∘ a3 ) := by -- eq:l
        nth_rw  2 [←h3] at h1
        exact h1
      clear h3 -- w:l
      have h8: ( a1 ∘ a2 ) = ( a1 ∘ a1 ) -- cut
      · have h9: ( a1 ∘ a1 ) = ( a1 ∘ a2 ) -- cut
        · have h10: ∀ y, ( a1 ∘ a1 ) = ( a1 ∘ y ) := by apply h0 -- ∀:l
          clear h0
          have h11: ( a1 ∘ a1 ) = ( a1 ∘ a2 ) := by apply h10 -- ∀:l
          clear h10
          exact h11 -- ax
        · nth_rw  1 [←h9] -- eq:r
      · have h12: ( a1 ∘ a2 ) = ( a1 ∘ a3 ) := by -- eq:l
          nth_rw  1 [←h8] at h7
          exact h7
        clear h8 -- w:l
        exact h12 -- ax

It also generates code for "Proposition 1. Equation4 implies Equation7" from Tao's original blog post. This is approx. 1460 lines of Lean code and I ran intro trouble with Lean recommending to increase maxHeartbeats. Obviously, there is a lot of potential to optimise the code. This is just a first attempt. But I believe that it should be possible to generate Lean proofs for most of the true implications this way.

Joachim Breitner (Sep 29 2024 at 19:02):

It might scale better to write an elaborator that parses the output of the other took and generate the proof (as an Expr) directly, to avoid the detour through surface syntax. Not a trivial task though, with rewriting etc.

Stefan Hetzl (Sep 30 2024 at 09:03):

@Joachim Breitner Thanks for the suggestion! However, being a novice Lean user I don't think I could do that in a reasonable time frame.

I also think there is lots to improve on a purely proof-theoretic level. In principle, it should be possible to automatically transform such proofs into, essentially, rewriting sequences and to export them as such. I would expect this to cut down the number of lines in the Lean code to about 20% of what it is now.

Also, I am wondering: what is the best use of this code within this project? Do we have a list of equations we conjecture to be true? I had a look at the list of most wanted implications. Picking about a dozen at random, most of them seem to be false. Or should I run it over all implications in the subgraph?

Joachim Breitner (Sep 30 2024 at 09:09):

Our most-wanted implications are false? Sounds like it’s time we get the counter examples in, and remove that where we already prove a counter-example.

Stefan Hetzl (Sep 30 2024 at 09:19):

For example, Equation5 implies Equation2 is stated in the most wanted list. Mace4, when prompted with

assign(max_seconds, 5).

formulas(sos).

x = y * x.

end_of_list.

formulas(goals).

x = y.

end_of_list.

generates the two-element countermodel that satisfies x = y * x.

Joachim Breitner (Sep 30 2024 at 09:23):

Ok, then I am sure we have that counter-example already in SmallMagmas.lean.

That smells like a nice task: Write a tool that complains about any entry in the most wanted list that is already solved either way, and let it run in CI. (Or if this list was created by a script, adjust the script to take new data into account.)

Vlad Tsyrklevich (Oct 01 2024 at 05:27):

Stefan Hetzl said:

Joachim Breitner Thanks for the suggestion! However, being a novice Lean user I don't think I could do that in a reasonable time frame.

I also think there is lots to improve on a purely proof-theoretic level. In principle, it should be possible to automatically transform such proofs into, essentially, rewriting sequences and to export them as such. I would expect this to cut down the number of lines in the Lean code to about 20% of what it is now.

Also, I am wondering: what is the best use of this code within this project? Do we have a list of equations we conjecture to be true? I had a look at the list of most wanted implications. Picking about a dozen at random, most of them seem to be false. Or should I run it over all implications in the subgraph?

You can generate the list of already proven/disproven conjectures if that's what you're asking. That gets rid of >80% of conjectures at this point. I haven't used the tool to generate 'most wanted' implications though so I'm not sure it's current state.

Stefan Hetzl (Oct 01 2024 at 07:21):

To run an external theorem prover, do all the conversions, and output Lean code I would budget around 1 second per implication. So when I run this on my laptop over night I would estimate that I can deal with around 20.000-30.000 implications. My question is: how can I generate a list of unsolved implications of around this length?

If I understand you correctly what you suggest would generate a list of length approx. 22M * 0,2 = 4,4M.

Vlad Tsyrklevich (Oct 01 2024 at 07:47):

Stefan Hetzl said:

To run an external theorem prover, do all the conversions, and output Lean code I would budget around 1 second per implication. So when I run this on my laptop over night I would estimate that I can deal with around 20.000-30.000 implications. My question is: how can I generate a list of unsolved implications of around this length?

If I understand you correctly what you suggest would generate a list of length approx. 22M * 0,2 = 4,4M.

I could share a non-canonical way, but maybe there's a better way now. @Daniel Weber does #126 offer a way to do this? I see lake exe extract_implications now offers a closure option, but what file am I supposed to give it? lake exe extract_implications --closure --json equational_theories/Generated.lean does nothing

Vlad Tsyrklevich (Oct 01 2024 at 07:51):

Actually, I'm no longer sure how to pull out implications with lake exe extract_implications at all.

Daniel Weber (Oct 01 2024 at 08:07):

You need to use equational_theories.Generated, not equational_theoties/Generated.lean, and it also has to be after you've already built the files

Daniel Weber (Oct 01 2024 at 08:12):

Stefan Hetzl said:

To run an external theorem prover, do all the conversions, and output Lean code I would budget around 1 second per implication. So when I run this on my laptop over night I would estimate that I can deal with around 20.000-30.000 implications. My question is: how can I generate a list of unsolved implications of around this length?

If I understand you correctly what you suggest would generate a list of length approx. 22M * 0,2 = 4,4M.

Something to note is that as you prove/disprove implications other ones will be proved/disproved by transitivity. I'm not sure about strategies to choose what would be most beneficial to prove/disprove, but I think I could get something working to dynamically update the transitive closure

Joachim Breitner (Oct 01 2024 at 08:21):

Daniel Weber said:

You need to use equational_theories.Generated, not equational_theoties/Generated.lean, and it also has to be after you've already built the files

Or just pass nothing and it’ll use the top-level equational_theoreis by default, won’t it?

Vlad Tsyrklevich (Oct 01 2024 at 08:37):

It doesn't seem like it. Also seems like the json format of lake exe extract_implications --json equational_theories is different than previously unless I'm supposed to be using a different invocation

Daniel Weber (Oct 01 2024 at 08:39):

It doesn't use facts, it's back to the previous nonimplications

Vlad Tsyrklevich (Oct 01 2024 at 08:43):

@Stefan Hetzl Seems like the canonical answer is to run lake exe extract_implications --closure --json equational_theories > /tmp/foo.json and pull out the entries from 'nonimplications'/'implications' keys, and then don't search those

Amir Livne Bar-on (Oct 01 2024 at 08:51):

I'm away from my usual setup but managed to create the lists using old versions of all scripts.

These are at
https://github.com/amirlb/equational_theories/tree/unknown-implications

The readme says which files exist in results.zip, you need to download and unzip it. The file only_strongest (with 24283 implications) is probably the most interesting for SMT provers

Amir Livne Bar-on (Oct 01 2024 at 09:06):

Since it was cobbled together in a hackish way, it would be useful if somebody can do a quick consistency test, like whether it's a subset of their local list of unknown implications

Stefan Hetzl (Oct 01 2024 at 09:59):

@Amir Livne Bar-on This looks great! Thanks a lot!

Stefan Hetzl (Oct 01 2024 at 10:07):

I have to make some more adaptions to my code. I hope that I can run it on only_strongest tonight.

Stefan Hetzl (Oct 02 2024 at 07:01):

The run finished. I could generate 2491 Lean proofs. For 9586 implications, prover9 did not find a proof. The remaining cases were inconclusive, most of them likely due to the current state of the proof translation not being up to the task.

Stefan Hetzl (Oct 02 2024 at 07:04):

You can find the proofs in the gapt_generated branch of my fork at equational_theories/Generated/Gapt.

The file with the lean proofs has a size of around 80MB. Some of the proofs are quite short, say, up to 40 lines which should be ok. Many proofs however are quite long and cannot be parsed by Lean in reasonable time.

Stefan Hetzl (Oct 02 2024 at 09:34):

In order to generate better Lean proofs I will start to look into exporting from resolution proofs directly, instead of going via the sequent calculus. This will require a more specialised algorithm, but the results should be significantly better.

Stefan Hetzl (Oct 02 2024 at 09:37):

On the Lean side, I could use some advice on what the best (most efficient) format for equality inference would be. @Joachim Breitner, you have mentioned to generate an an Expr directly. I am not sure I understand what you mean by that (I am a novice Lean user). Could you point me to some documentation and/or examples?

Daniel Weber (Oct 02 2024 at 09:39):

You could take a look at https://leanprover-community.github.io/lean4-metaprogramming-book/

Vlad Tsyrklevich (Oct 02 2024 at 10:21):

@Stefan Hetzl Are you computing their transitive reduction in combination with the base repository? That usually reduces the total number of proofs quite a bit, and may make your job a lot easier if the total number of necessary proofs is fairly small

Stefan Hetzl (Oct 02 2024 at 10:24):

My problem is not which implications to check. I'll happily let others deal with this question :wink:

My problem is that the proofs I generate for an individual implication are too long to be parsed by Lean.

Vlad Tsyrklevich (Oct 02 2024 at 10:26):

Well, if reduction reduces the number of proofs by 95%, and that gets rid of the ones that are too long, then that solves your problem

Vlad Tsyrklevich (Oct 02 2024 at 10:29):

On my branch, this is the theorems of yours that are necessary:

Note that I have some additional uncommitted theorems on my branch so it may be overly reducing them compared to what's on main right now

Vlad Tsyrklevich (Oct 02 2024 at 10:32):

Note that this may be as small as it is because more implications were merged overnight that you may not have had when you started your run, so it may just be that there was overlap

Stefan Hetzl (Oct 02 2024 at 10:33):

Hmmm... I'm not sure I understand: 276,25means that I have proved Equation 276 implies Equation 25?

Vlad Tsyrklevich (Oct 02 2024 at 10:37):

Well, I am foolish. I included one wrong path in the invocation and used data that wasn't yours. Re-calculating, sorry for the confusion.

Vlad Tsyrklevich (Oct 02 2024 at 10:52):

ok, this took longer to figure out than expected. This workflow is slightly different and I had gotten to used to automatically doing my steps. 952 new theorems, so it doesn't make your life that much easier

Stefan Hetzl (Oct 02 2024 at 11:16):

Ok. Thanks!

Stefan Hetzl (Oct 02 2024 at 11:17):

I am also wondering if there is something useful I can / should do with these proofs now. Will it be easier for Lean if I split it in individual files? Is there a way to mark them as "proved by an external program" and comment out the tactic script? Should I throw away the long proofs and make a PR containing only the short proofs?

Stefan Hetzl (Oct 03 2024 at 04:28):

I plan to rewrite the export code to go from gapt resolution proofs to Lean proof terms directly, thus bypassing the sequent calculus in gapt and the tactics language in Lean. This should scale much better.

My questions to the Lean experts are: Is what I do in the following code a good way to create proof terms? Will Lean like large nestings of these snippets? In particular, is there a way to avoid explicitely indicating the motive in ex_paramod_cw and ex_paramod_ccw (something like nth_rewrite on the proof term level)? Can I deal with blocks of quantifiers more efficiently than I do currently in ex_flipwithforall and ex_substwithforall?

theorem ex_substwithforall (G: Type*) [Magma G] (h: ∀ (x y : G), x ∘ y = x) :
  ∀ (x y z : G), (z ∘ z) ∘ y = (z ∘ z) :=
  fun x : G => fun y : G => fun z : G =>
    show _ from h (z ∘ z) y

theorem ex_flipwithforall (G: Type*) [Magma G] (h: ∀ (x y z : G), x ∘ y = z) :
  ∀ (x y z: G), z = x ∘ y :=
  fun x : G => fun y : G => fun z : G =>
    show _ from Eq.symm (h x y z)

theorem ex_paramod_cw (G: Type*) [Magma G] (x y z: G)
  (h1: x = x ∘ y) (h2: x ∘ (z ∘ x) = z) : x ∘ (z ∘ (x ∘ y)) = z :=
  Eq.subst (motive := (fun a : G => x ∘ (z ∘ a) = z )) h1 h2

theorem ex_paramod_ccw (G: Type*) [Magma G] (x y z: G)
  (h1: x ∘ y = x) (h2: x ∘ (z ∘ x) = z) : x ∘ (z ∘ (x ∘ y)) = z :=
  Eq.subst (motive := (fun a : G => x ∘ (z ∘ a) = z )) (Eq.symm h1) h2

Stefan Hetzl (Oct 03 2024 at 04:53):

@Joachim Breitner Is that what you had in mind originally?

Joachim Breitner (Oct 03 2024 at 07:18):

Can you not write the code as a lean metaprogram, i.e. a command or tactic that reads the gapt resolution proof terms?

But a go program that produces lean expressions is also a good start, and having theorems corresponding to the primitives of that format is reasonable.

Stefan Hetzl (Oct 03 2024 at 07:32):

:thinking: How would this be invoked? Something like the following?

theorem Equation2_implies_Equation7 (G: Type*) [Magma G] (h: Equation2 G) : Equation7 G := by
  gapt_import( "...some large string describing a resolution proof produced by gapt..." )

Daniel Weber (Oct 03 2024 at 07:32):

I think it would make more sense as a DSL, not a string

Joachim Breitner (Oct 03 2024 at 15:19):

Stefan Hetzl said:

:thinking: How would this be invoked? Something like the following?

theorem Equation2_implies_Equation7 (G: Type*) [Magma G] (h: Equation2 G) : Equation7 G := by
  gapt_import( "...some large string describing a resolution proof produced by gapt..." )

Exactly. Or it could take a filename if the string is very large.

Joachim Breitner (Oct 03 2024 at 15:20):

Daniel Weber said:

I think it would make more sense as a DSL, not a string

Unless you want users to interactively write or inspect it I'm not sure I agree here. Maybe if implementing the parser is easier using lean's own parser, but otherwise I'd expect a plain string to be easier and more efficient.

Stefan Hetzl (Oct 04 2024 at 17:31):

I have now updated to code to generate Lean proof terms directly from resolution proofs. It avoids the large terms that caused the performance problems with the first translation to a very large extent. It now generates proofs like the following:

theorem Equation53_implies_Equation1038 (G: Type*) [Magma G] (h: Equation53 G): Equation1038 G :=
  have h0 := h
  have h1: _ := fun (v0 v1 : G) => h0 v0 v1 -- subst
  have h2: _ := fun (v0 v1 : G) => Eq.symm (h1 v0 v1) -- flip
  have h3: _ := fun (v100 v1 : G) => h2 v100 v1 -- subst
  have h4 := h
  have h5: _ := fun (v0 v1 : G) => h4 v0 v1 -- subst
  have h6: _ := fun (v0 v1 : G) => Eq.symm (h5 v0 v1) -- flip
  have h7: _ := fun (v100 v1 : G) => h6 v100 ( v1 ◇ ( v100 ◇ v1 ) ) -- subst
  have h8: _ := fun (v100 v1 : G) => by (have h7' := h7 v100 v1; nth_rewrite 1 [h3 v100 v1] at h7'; exact h7') -- paramod
  have h9: _ := fun (x y : G) => h8 x y -- subst
  have h10: _ := fun (x y : G) => Eq.symm (h9 x y) -- flip
  by exact h10

Stefan Hetzl (Oct 04 2024 at 17:31):

This is the same theorem as the one I mentioned at the beginning of this topic.

Stefan Hetzl (Oct 04 2024 at 17:34):

What would be a set of suitable implications to apply it to now?

Vlad Tsyrklevich (Oct 04 2024 at 17:36):

https://github.com/teorth/equational_theories/pull/270 I'd run it on top of main with https://github.com/teorth/equational_theories/pull/281 applied as that greatly reduces the search space

Vlad Tsyrklevich (Oct 04 2024 at 17:37):

With the size of the remaining theorems it may not matter to order them though

Stefan Hetzl (Oct 04 2024 at 19:02):

I didn't find any new theorem in first 600 implications generated by find_powerful_theorems.

Last updated: May 02 2025 at 03:31 UTC