Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: A (semi)-autoformalization challenge: 650=>448

Terence Tao (May 16 2025 at 19:26):

In #Machine Learning for Theorem Proving > Tools for semi-automated line-by-line formalization we discussed different ways to formalize a certain implication 1689=>2 studied in the Equational Theories Project, in which we had multiple proofs using multiple tools. While the implication was initially discovered in the 1990s using computer assistance, this particular implication is now relatively easy, with many different semi-automated or fully automated approaches being either partially or fully successful in establishing it.

I would now like to pose a more difficult challenge. There is a more difficult implication 650=>448 that we were able to establish by first verifying the implication in Vampire, and then developing enough of the theory of superposition calculus within Lean that one could translate that proof into a Lean proof, albeit one which is quite lengthy. Vampire took 3.6 seconds to establish this particular implication, which is unusually long (by comparison, 1689=>2 took 0.003 seconds for Vampire). In this regard, this is the hardest positive implication in the ETP.

Anyway, the challenge is to find an alternate proof, by whatever means you wish - human, semi-automated, or automated, with additional imports or without. In this file I have prepared a version of the Vampire proof with all the intermediate steps kept, but with the proofs replaced with sorries (and a comment describing the original proof in the superposition calculus). All in all there are 66 sorries to fill. It would be interesting to have people test out various tools and report their performance. For instance, directly applying canonical to each sorry without any parameter tuning works initially (it closes off eq9, eq10, and eq12) but after that it times out - apparently there are too many hypotheses in the proof state after that for canonical to work efficiently. How do other automated or semi-automated approaches fare?

David Renshaw (May 16 2025 at 20:14):

I was able to get grind to work on all but one of those sorries. (eq3685 is the one that remains.)

David Renshaw (May 16 2025 at 20:15):

Though it gives me the warning

The grind tactic is experimental and still under development. Avoid using it in production projects.

so I'm not sure how much to trust it.

Patrick Massot (May 16 2025 at 20:59):

You can totally trust proofs found by grind. I think that what you are asked to not trust is that grind will still find those proofs in the next release.

Terence Tao (May 16 2025 at 22:56):

Just to keep the discussion centralized: @Jeremy Avigad was able to clear all sorries (and even shorten the proof somewhat) purely through duper, see #Machine Learning for Theorem Proving > Tools for semi-automated line-by-line formalization @ 💬

Michael Kinyon (May 18 2025 at 21:42):

Using Prover9, I was able to get a 27 step proof of 650=>448 (actually xy=x). At Terry's request, I am posting here the Prover9 proof itself (as a text file) and my "humanization" of it (as a PDF).

I don't speak Leanish, but if anyone is interested in Leanifying this proof, feel free.

I should say I did not look at the Vampire proof. I started from scratch and followed the methodology described in a paper on proof simplification I wrote half a dozen years ago: https://arxiv.org/abs/1808.04251

650implies448.txt

650implies448.pdf
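
The parenthetical "(actually xy=x)" means the Prover9 run derives the left-projection law, from which Equation 448 is immediate. A minimal Lean sketch of that last step only (it assumes the law as a hypothesis `h4`, a name chosen here for illustration, and does not reproduce the 27-step derivation):

```lean
class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

-- If x ◇ y = x holds for all x and y, then Equation 448 follows by
-- instantiating the second argument with y ◇ (z ◇ (x ◇ z)).
example (G : Type _) [Magma G] (h4 : ∀ x y : G, x ◇ y = x) :
    ∀ x y z : G, x = x ◇ (y ◇ (z ◇ (x ◇ z))) := by
  intro x y z
  exact (h4 x _).symm
```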

David Michael Roberts (May 18 2025 at 22:01):

I'd like to claim this mini project (I mentioned it to @Terence Tao on Mathstodon under my nom de plume)

Terence Tao (May 18 2025 at 22:24):

Good luck! I look forward to seeing your "field report". On my Youtube channel https://www.youtube.com/@TerenceTao27 I describe three different types of workflows to use AI tools to autoformalize these sorts of things, but of course you are welcome to come up with a fourth type instead, which may in fact be a more valuable data point.

Bruno Le Floch (May 19 2025 at 00:07):

I already have a Lean proof, which I am in the process of converting to human-readable LaTeX (probably tomorrow, it's 2am here). Here is the Lean proof. It is loosely based on several prover9 runs with intermediate results thrown in as assumptions or as goals.

Lean proof of 650=>448

class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

abbrev Equation650 (G: Type _) [Magma G] := ∀ x y z : G, x = x ◇ (y ◇ ((z ◇ x) ◇ y))

abbrev Equation448 (G: Type _) [Magma G] := ∀ x y z : G, x = x ◇ (y ◇ (z ◇ (x ◇ z)))

abbrev f {G: Type _} [Magma G] (x y : G) := x ◇ (y ◇ x)
abbrev g {G: Type _} [Magma G] (x y : G) := x ◇ ((y ◇ x) ◇ x)
abbrev square {G: Type _} [Magma G] (x : G) := (x ◇ x)
abbrev cube {G: Type _} [Magma G] (x : G) := ((x ◇ x) ◇ x)
abbrev lcube {G: Type _} [Magma G] (x : G) := (x ◇ (x ◇ x))

theorem Equation650_implies_Equation448 (G: Type _) [Magma G] (h: Equation650 G) : Equation448 G := by
  have lemma_A (x y z : G) : x ◇ f y (z ◇ x) = x := (h x y z).symm
  have lemma_B (x y : G) : x ◇ lcube (y ◇ x) = x := (h x (y ◇ x) y).symm
  have lemma_C_ff (x y z w : G) : f x (y ◇ w) ◇ f z w = f x (y ◇ w) :=
    (congrArg (_ ◇ f z ·) (lemma_A w x y).symm).trans (lemma_A (f x (y ◇ w)) z w)
  have lemma_D_ff (x y z w : G) : y ◇ w = y → f x y ◇ f z w = f x y :=
    fun hywy => by rw [← hywy] ; exact lemma_C_ff x y z w
  have lemma_E_gf (x y z : G) : g x y ◇ f z x = g x y :=
    lemma_C_ff x y z x
  have lemma_F_fop (x z s : G) : f (g x z ◇ s) s = g x z ◇ s := by
    let a0 := x -- irrelevant choice
    let v := g x z
    let w := (z ◇ x) ◇ x
    have h_fsv : square (f s v) = f s v := by
      calc
        f s v ◇ f s v = f s v ◇ (f s v ◇ f a0 (f w x)) := by rw [lemma_D_ff s v a0 (f w x) (lemma_E_gf x z w)]
        _ = f s v ◇ (f s v ◇ (f a0 (f w x) ◇ f s v)) := by rw [lemma_C_ff a0 w s v]
        _ = f s v := lemma_D_ff s v (f s v) _ (lemma_A v a0 w)
    calc
      (v ◇ s) ◇ f s v = (v ◇ s) ◇ square (f s v) := congrArg (fun t => (v ◇ s) ◇ t) h_fsv.symm
      _ = (v ◇ s) ◇ lcube (f s v) := congrArg (fun t => (v ◇ s) ◇ ((f s v) ◇ t)) h_fsv.symm
      _ = v ◇ s := lemma_B (v ◇ s) s
  have lemma_G_projg (x y z : G) : (x ◇ y) ◇ g x z = x ◇ y := by
    let s := f y x
    let v := g x z
    calc
      (x ◇ y) ◇ v = (x ◇ y) ◇ (v ◇ s) := by rw [lemma_E_gf x z y]
      _ = (x ◇ y) ◇ f (v ◇ s) s := by rw [lemma_F_fop x z s]
      _ = x ◇ y := lemma_A (x ◇ y) _ y
  have lemma_H_eq58_g (x y z : G) : x ◇ (g y z ◇ (y ◇ x)) = x :=
    Eq.trans (congrArg (fun t => (_ ◇ (_ ◇ t))) (lemma_G_projg y x z).symm) (lemma_A x (g y z) y)
  have lemma_I_zx2 (x y z : G) : x ◇ ((y ◇ (z ◇ x)) ◇ (z ◇ x)) = x := by
    let u := (y ◇ (z ◇ x)) ◇ (z ◇ x)
    calc
      x ◇ u = x ◇ (u ◇ square ((z ◇ x) ◇ u)) := by rw [lemma_H_eq58_g u (z ◇ x) y]
      _ = x ◇ (u ◇ ((z ◇ x) ◇ u)) := by rw [square, lemma_G_projg (z ◇ x) u y]
      _ = x := lemma_A _ _ _
  have lemma_J_move_proj (x y z : G) : y ◇ (z ◇ x) = y → x ◇ y = x := by
    intro hyzxy
    calc
      x ◇ y = x ◇ ((y ◇ (z ◇ x)) ◇ (z ◇ x)) := by rw[hyzxy,hyzxy]
      _ = x := lemma_I_zx2 x y z
  have lemma_K_idem (y x : G) : (y ◇ x) ◇ x = y ◇ x := lemma_J_move_proj (y ◇ x) x _ (lemma_H_eq58_g x y x)
  have lemma_L_cube (x : G) : cube x = square x := lemma_K_idem x x
  have eq58 (x y z : G) : x ◇ (y ◇ (z ◇ x)) = x :=
    Eq.trans (congrArg (x ◇ ·) (lemma_K_idem y (z ◇ x)).symm) (lemma_I_zx2 x y z)
  have eq4 (x y : G) : x ◇ y = x :=
    calc
      x ◇ y = x ◇ (y ◇ ((x ◇ x) ◇ (x ◇ y))) := by rw [eq58 y (x ◇ x) x]
      _ = x ◇ (y ◇ (cube x ◇ (x ◇ y))) := by rw [lemma_L_cube x]
      _ = x ◇ (y ◇ cube x) := by rw [lemma_J_move_proj (cube x) (x ◇ y) x (lemma_G_projg x y x)]
      _ = x ◇ (y ◇ (x ◇ x)) := by rw [lemma_L_cube x]
      _ = x := eq58 x y x
  intro x y z
  rw [eq4]

Terence Tao (May 19 2025 at 00:35):

Thanks! I think it will be of interest to gather different proofs obtained by different methodologies, as this may be a useful point of comparison that may inform future autoformalization work.

Jason Rute (May 19 2025 at 02:41):

The term “autoformalization” is usually used to mean translating natural language to formal language (often that formal language is a theorem proving language like Lean, and the natural language is often a theorem statement, definition, or proof). Is this what we mean here, or are we talking about “automated theorem proving” (ATP), i.e. filling in (formal) proofs automatically? (ATP has, at least historically, meant formal proof since the field is quite old, and it’s only been in the last two years or so that it has even seemed conceivable for computers to construct fully correct natural language proofs.)

Jason Rute (May 19 2025 at 02:41):

I don’t mean to be pedantic. I just want to make sure I understand the goal here. I think this Zulip thread is more about ATP while the YouTube videos are about autoformalization. Of course, autoformalization and ATP are closely related, and they are becoming more and more blended. To autoformalize a proof you usually need ATP to fill in the gaps, and many LLM approaches to ATP blend autoformalization and ATP by generating a natural language argument first and then translating it to formal language. Also code assistants like the ones you use in the video naturally do both (to various degrees of success).

Jason Rute (May 19 2025 at 02:41):

I am enjoying this discussion and I hope we keep it up, both with these problems as well as others! However, I also want to remark that the problems in this thread, as I understand them, are the sort of ATP problems that symbolic methods have been traditionally good at, first order reasoning without a lot of relevant background theory or big ideas. I’m looking forward to seeing other problems that arise naturally which symbolic solvers struggle with, and seeing what methods (if any) ultimately succeed.

Terence Tao (May 19 2025 at 02:47):

Well, as you say, this particular problem is already well within the range of existing ATPs, and we have multiple ATP-generated proofs at this point (from Vampire and Prover9). So it's more of a formal -> formal problem (going from an ATP proof to Lean), though it can be factored as formal -> informal -> formal (e.g., going through the "humanized prover9 proof" that @Michael Kinyon just supplied). Given the length of the proofs, any of these arrows would probably require at least some semi-automation, but I don't know which of these tasks would be classified as "autoformalization" in the accepted terminology.

One particular feature of these positive equational implications, which is very much not present for most math formalization tasks, is that they can be stated and proved without appeal to external libraries (other than minimal Mathlib tactic libraries). So the whole "lemma search" problem is eliminated for this particular type of task. This will of course change the playing field to favor some approaches and disadvantage others, but still I think it is a useful toy problem to start with, which can then partially inform the challenges involved for more realistic formalization problems.

[Some of the negative equational implications are significantly harder and required actual Mathlib input, e.g., the theory of the free group on countably many generators, or linear algebra over finite fields, but that's another story...]

Jason Rute (May 19 2025 at 03:38):

I neglected to think of the formal-to-formal side of autoformalization. Yes, I can see this problem fitting firmly in that niche.

Bruno Le Floch (May 19 2025 at 10:06):

Here is a human-understandable proof, as a 3-page PDF file 650d.pdf and LaTeX file. In the process of writing it up I've revised some steps in the Lean proof.

How I reached that proof

To get that proof I started from prover9 output, noticed some combinations x ◇ (y ◇ x) and x ◇ ((y ◇ x) ◇ x) showing up in many steps, and noticed some steps in the proof that were slightly more readable than others to serve as lemmas. Then I ran prover9 to prove each lemma one by one, checking that the proof was much shorter. Once I had this outline, I formalized the lemmas in Lean with many sorry, and filled them up by converting by hand the paramodulations suggested by vampire to Lean calc (once I learned about it—before that I was fighting with explicit congrArg and Eq.trans). Then I stared at the dependency graph of the lemmas to separate the proof into a few steps only. At lower level, I noticed some lemmas were not used as generally as they were stated, so for each one I considered whether the proof would be clearer with less generality. And some lemmas were using very explicit combinations of variables where really only one small property of it was needed, so I factored this out as a hypothesis.

Obviously, automated methods are much much faster than doing this by hand like I did (roughly 6h getting to a working Lean proof, 5½h cleaning it up, 2h making it human-readable).
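
The contrast between explicit congrArg / Eq.trans terms and calc blocks mentioned above can be sketched on a toy two-step chain. The hypotheses `hab` and `hbc` are placeholders invented for this illustration; they are not steps of the actual 650=>448 proof.

```lean
class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

-- Term style: each rewrite is an explicit congruence, glued with Eq.trans.
example (G : Type _) [Magma G] (a b c d : G) (hab : a = b) (hbc : b = c) :
    d ◇ a = d ◇ c :=
  Eq.trans (congrArg (fun t => d ◇ t) hab) (congrArg (fun t => d ◇ t) hbc)

-- calc style: the intermediate expression is written out, and rw finds
-- the position of the rewritten subterm by itself.
example (G : Type _) [Magma G] (a b c d : G) (hab : a = b) (hbc : b = c) :
    d ◇ a = d ◇ c :=
  calc
    d ◇ a = d ◇ b := by rw [hab]
    _ = d ◇ c := by rw [hbc]
```

The calc version is longer in characters but shows every intermediate term, which is closer to how a paramodulation chain is written on paper.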

Bas Spitters (May 19 2025 at 10:15):

@Terence Tao about formal-to-formal: have you considered proof-producing ATPs? For example, Vampire can emit proofs in Dedukti, which can likely be adapted to Lean:
https://arxiv.org/abs/2503.15541

Michael Kinyon (May 19 2025 at 18:18):

@Bruno Le Floch Very nice work with the humanization! You did two things I usually do in my own work, but didn't do this time because of time constraints: spot useful lemmas and identify frequently occurring term functions. I didn't use lemmas in my runs, just simplification via Prover9's hints mechanism. I did notice repeating terms like x(yx) but decided to skip it since the proof was already short. So I just focused on converting paramodulations to "substitute then rewrite" and left it at that. (My final run was with back demodulation turned off so the proof was with paramodulations only.) Also, I completely agree that when Prover9 includes a clause in more generality than it ends up using, it's aesthetically better to humanize to a less general form (fewer variables). Anyway, great work!
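
The "substitute then rewrite" reading of a paramodulation step can be sketched in Lean as follows. The side equation `e` is a made-up hypothesis for illustration only; in a real proof it would be an earlier derived clause.

```lean
class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

example (G : Type _) [Magma G]
    (h : ∀ x y z : G, x = x ◇ (y ◇ ((z ◇ x) ◇ y)))
    (x y z : G) (e : z ◇ x = x) : x = x ◇ (y ◇ (x ◇ y)) := by
  -- Substitute: instantiate Equation 650 at the chosen variables.
  have step := h x y z
  -- Rewrite: replace the subterm z ◇ x using the side equation.
  rw [e] at step
  exact step
```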

Adam Kurkiewicz (May 24 2025 at 19:41):

I took @Bruno Le Floch's PDF file, and I was able to auto-formalise Lemma 1 using Claude 4 Opus without any meaningful human input.

Here is the approach I followed.

I used the following system prompt:

You are a professor of mathematics working at the Imperial College of London specialising in formalising mathematics in Lean 4. Help the user with their formalisation task.

I used the following initial prompt (I provided Bruno's blueprint, 650d_proof.pdf, as an attachment):

Can you finish the formalisation of the following proof in Lean 4?

650d_proof.pdf


class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

abbrev Equation650 (G: Type _) [Magma G] := ∀ x y z : G, x = x ◇ (y ◇ ((z ◇ x) ◇ y))

abbrev Equation448 (G: Type _) [Magma G] := ∀ x y z : G, x = x ◇ (y ◇ (z ◇ (x ◇ z)))

abbrev f {G: Type _} [Magma G] (x y : G) := x ◇ (y ◇ x)

abbrev g {G: Type _} [Magma G] (x y : G) := x ◇ ((y ◇ x) ◇ x)

abbrev square {G: Type _} [Magma G] (x : G) := (x ◇ x)

abbrev cube {G: Type _} [Magma G] (x : G) := ((x ◇ x) ◇ x)

abbrev lcube {G: Type _} [Magma G] (x : G) := (x ◇ (x ◇ x))

theorem Equation650_implies_Equation448 (G: Type _) [Magma G] (h: Equation650 G) : Equation448 G := by

Claude Opus would then produce a formalisation, which I would run through lake build, and I would paste the errors (which could easily have been done by a script, I just did it manually to see the responses in real time). After several rounds of this, Claude 4 Opus produced the following Lean 4 formalisation, which correctly formalises Lemma 1:

theorem Equation650_implies_Equation448 (G: Type _) [Magma G] (h: Equation650 G) : Equation448 G := by
  -- First, let's establish Equation 4
  suffices eq4 : ∀ x y : G, x ◇ y = x by
    intro x y z
    simp only [eq4]

  -- The proof establishes several lemmas leading to Equation 4

  -- Lemma 1 (Easy identities)
  have h3 : ∀ x y z : G, x ◇ f y (z ◇ x) = x := fun x y z => (h x y z).symm

  have h4 : ∀ x z : G, x ◇ lcube (z ◇ x) = x := by
    intro x z
    have := h3 x (z ◇ x) z
    simp only [f, lcube] at this ⊢
    exact this

  have h5 : ∀ x y z w : G, f x (y ◇ w) ◇ f z w = f x (y ◇ w) := by
    intro x y z w
    have := h (f x (y ◇ w)) z w
    rw [h3 w x y] at this
    exact this.symm

  have h6 : ∀ x y z w : G, y ◇ w = y → f x y ◇ f z w = f x y := by
    intro x y z w hyw
    rw [← hyw]
    exact h5 x y z w

  have h7 : ∀ x y z : G, g x y ◇ f z x = g x y := by
    intro x y z
    -- Note: g x y = f x (y ◇ x) by definition
    simp only [g, f]
    exact h5 x y z x

  -- I'll complete the rest of the proof using sorry for now
  -- The key remaining steps are:
  -- 1. Prove h8 (idempotence of f under certain conditions)
  -- 2. Prove h9 (projection property for g)
  -- 3. Prove the various intermediate results leading to Equation 58
  -- 4. Finally derive Equation 4

  sorry

I unfortunately ran out of credit, so couldn't keep going with this experiment to see whether throwing more money at the problem would have produced a correct autoformalisation of the entire blueprint provided by Bruno, but at least we know that the tool is capable of producing a formalisation of Lemma 1.

Lydia Kondylidou (Jan 13 2026 at 14:23):

Marijn Heule, Jasmin Blanchette, and I have decided to take on the challenge to find a shorter proof of 650=>448 automatically. We implemented a tool that minimizes equational proofs. Using a combination of brute force and heuristics, and exploiting both Vampire and the Twee equational prover, it reduced the 66-step proof to 21 rewrite steps. Additionally, we were able to reduce the proofs of 1431 problems from the Equational Theories Project repository by about 45%.

Below is a Lean reconstruction of our proof:

import Mathlib.Tactic.NthRewrite

class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

abbrev Equation650 (G : Type _) [Magma G] :=
  ∀ x y z : G, x = x ◇ (y ◇ ((z ◇ x) ◇ y))

abbrev Equation448 (G : Type _) [Magma G] :=
  ∀ x y z : G, x = x ◇ (y ◇ (z ◇ (x ◇ z)))

theorem Equation650_implies_Equation448 (G : Type _) [Magma G]
    (op_law : Equation650 G) : Equation448 G :=
  have lemma1 (x y z w : G) :
      z ◇ ((x ◇ y) ◇ z) = (z ◇ ((x ◇ y) ◇ z)) ◇ (w ◇ (y ◇ w)) := by
    nth_rw 3 [op_law y z x]
    exact op_law (z ◇ ((x ◇ y) ◇ z)) w y

  have lemma2 (x y z w v u : G) :
      v ◇ ((x ◇ ((y ◇ z) ◇ x)) ◇ v) =
      (v ◇ ((x ◇ ((y ◇ z) ◇ x)) ◇ v)) ◇ (u ◇ ((w ◇ (z ◇ w)) ◇ u)) := by
    nth_rw 1 2 [lemma1 y z x w]
    exact lemma1 (x ◇ ((y ◇ z) ◇ x)) (w ◇ (z ◇ w)) v u

  have lemma3 (x y z w v : G) :
      w ◇ (y ◇ w) = (w ◇ (y ◇ w)) ◇ (v ◇ ((z ◇ ((x ◇ y) ◇ z)) ◇ v)) := by
    nth_rw 1 [lemma1 x y z w]
    exact op_law (w ◇ (y ◇ w)) v (z ◇ ((x ◇ y) ◇ z))

  have lemma4 (x y z w v : G) :
      v ◇ (x ◇ v) = (v ◇ (x ◇ v)) ◇ ((w ◇ (x ◇ w)) ◇ (y ◇ ((z ◇ x) ◇ y))) := by
    nth_rw 1 [lemma1 z x y w]
    exact lemma3 z x y v (w ◇ (x ◇ w))

  have lemma5 (x y z w : G) :
      w ◇ ((y ◇ ((z ◇ y) ◇ y)) ◇ w) =
      (w ◇ ((y ◇ ((z ◇ y) ◇ y)) ◇ w)) ◇ (x ◇ ((y ◇ ((z ◇ y) ◇ y)) ◇ x)) := by
    nth_rw 1 [lemma2 y z y ((z ◇ y) ◇ y) x x]
    exact lemma4 (y ◇ ((z ◇ y) ◇ y)) x ((z ◇ y) ◇ y) x w

  have lemma6 (x y z w : G) :
      (y ◇ ((z ◇ y) ◇ y)) ◇ w = ((y ◇ ((z ◇ y) ◇ y)) ◇ w) ◇ (x ◇ ((y ◇ ((z ◇ y) ◇ y)) ◇ x)) := by
    let T := y ◇ ((z ◇ y) ◇ y)
    calc
      T ◇ w = (T ◇ w) ◇ ((x ◇ (T ◇ x)) ◇ ((w ◇ (T ◇ w)) ◇ (x ◇ (T ◇ x)))) := by
        nth_rw 1 [←op_law]
      _ = (T ◇ w) ◇ ((x ◇ (T ◇ x)) ◇ (w ◇ (T ◇ w))) := by
        nth_rw 1 [←lemma5]
      _ = (T ◇ w) ◇ (x ◇ (T ◇ x)) := by
        nth_rw 1 [←lemma5]

  have lemma7 (x y z w: G) :
    (((x ◇ ((y ◇ x) ◇ x)) ◇ z) ◇ ((x ◇ ((y ◇ x) ◇ x)) ◇ w)) = ((x ◇ ((y ◇ x) ◇ x)) ◇ z) := by
    let T := x ◇ ((y ◇ x) ◇ x)
    calc
      (T ◇ z) ◇ (T ◇ w) =
      ((T ◇ z) ◇ ((T ◇ w) ◇ ((T ◇ (T ◇ w)) ◇ ((w ◇ (T ◇ w)) ◇ (T ◇ (T ◇ w)))))) := by
        nth_rw 1 [←op_law]
      _ = ((T ◇ z) ◇ ((T ◇ w) ◇ ((T ◇ (T ◇ w)) ◇ ((w ◇ (T ◇ w)) ◇
            (T ◇ ((T ◇ w) ◇ (w ◇ (T ◇ w)))))))) := by
        nth_rw 1 [←lemma6]
      _ = ((T ◇ z) ◇ ((T ◇ w) ◇ ((T ◇ (T ◇ w)) ◇ ((w ◇ (T ◇ w)) ◇ ((T ◇ ((T ◇ w) ◇ (w ◇ (T ◇ w)))) ◇
            (((T ◇ w) ◇ (w ◇ (T ◇ w))) ◇ (T ◇ ((T ◇ w) ◇ (w ◇ (T ◇ w)))))))))) := by
        nth_rw 2 [←lemma6]
      _ = ((T ◇ z) ◇ ((T ◇ w) ◇ ((T ◇ (T ◇ w)) ◇ (w ◇ (T ◇ w))))) := by
        nth_rw 1 [←op_law]
      _ = ((T ◇ z) ◇ ((T ◇ w) ◇ (T ◇ (T ◇ w)))) := by
        nth_rw 1 [←lemma6]
      _ = ((x ◇ ((y ◇ x) ◇ x)) ◇ z) := by
        nth_rw 1 [←lemma6]

  have lemma8 (x y z: G) :
      (x ◇ ((y ◇ ((z ◇ y) ◇ y)) ◇ x)) = x := by
    calc
      (x ◇ ((y ◇ ((z ◇ y) ◇ y)) ◇ x)) =
      (x ◇ (((y ◇ ((z ◇ y) ◇ y)) ◇ x) ◇ ((y ◇ ((z ◇ y) ◇ y)) ◇ x))) := by
          nth_rw 1 [lemma7]
      _ = (x ◇ (((y ◇ ((z ◇ y) ◇ y)) ◇ x) ◇ (((y ◇ ((z ◇ y) ◇ y)) ◇ x) ◇
            ((y ◇ ((z ◇ y) ◇ y)) ◇ x)))) := by
          nth_rw 2 [lemma7]
      _ = x := by
          nth_rw 1 [←op_law]

  show _ by
    intros x y z
    calc
      x = (x ◇ ((x ◇ ((y ◇ x) ◇ x)) ◇ x)) := by
          nth_rw 1 [lemma8]
      _ = ((x ◇ ((x ◇ ((y ◇ x) ◇ x)) ◇ x)) ◇ ((y ◇ (z ◇ (x ◇ z))) ◇
            ((x ◇ ((y ◇ x) ◇ x)) ◇ (y ◇ (z ◇ (x ◇ z)))))) := by
          nth_rw 1 [←lemma5]
      _ = (x ◇ ((y ◇ (z ◇ (x ◇ z))) ◇ ((x ◇ ((y ◇ x) ◇ x)) ◇ (y ◇ (z ◇ (x ◇ z)))))) := by
          nth_rw 1 [lemma8]
      _ = (x ◇ (y ◇ (z ◇ (x ◇ z)))) := by
          nth_rw 1 [lemma8]

Bruno Le Floch (Jan 14 2026 at 20:30):

@Lydia Kondylidou You mention reconstructing the proof in Lean. Is that step done by hand at present? Many of the lemmas take the form A=A◇B for some expressions A and B. Maybe some meta-patterns like this one could be detected and make the generated proof nicer and shorter? Currently my human proof upthread, file 650d.lean, is slightly shorter.

Lydia Kondylidou (Jan 14 2026 at 21:12):

The tool gives 2 outputs. The first one is a text file with this exact proof, which can easily be translated to Lean by hand. The second one is a Lean reconstruction of the proof using Lean's automation (grind, duper).
I did look at your proof, but I am not sure if it is shorter: if you replace the rw by nth_rw, then in some terms you need more than one rewrite (which is then more than one single step, if I see it correctly). But I will have a look again :)
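
A toy illustration of the step-counting point, assuming Mathlib's nth_rw is available: rw rewrites every occurrence of the instantiated left-hand side in one call, while nth_rw rewrites one occurrence per call. The hypothesis `hab` is a placeholder, not a lemma from either proof.

```lean
import Mathlib.Tactic.NthRewrite

class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

-- rw replaces both occurrences of a at once: one step.
example (G : Type _) [Magma G] (a b : G) (hab : a = b) : a ◇ a = b ◇ b := by
  rw [hab]

-- nth_rw replaces a single occurrence per call, so the same goal takes two steps.
example (G : Type _) [Magma G] (a b : G) (hab : a = b) : a ◇ a = b ◇ b := by
  nth_rw 1 [hab]  -- one occurrence of a is rewritten, the other remains
  nth_rw 1 [hab]  -- rewrites the remaining a; the goal then closes by rfl
```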

Lydia Kondylidou (Jan 14 2026 at 21:12):

As for the meta-patterns, yes this sounds interesting. I will investigate!

Bruno Le Floch (Jan 14 2026 at 22:59):

From memory I wasn't particularly efficient in my use of rw so most of them only rewrite a single subterm. If you look at that Lean proof, depending on your taste it might be useful to also look at the pdf version.

Another thing you could try is to reduce the number of variables when a lemma is only used with some variables set to be equal. This makes the lemma less general but might make it more humanly readable. For instance your lemma2 is only used once, with first and third arguments equal and last two arguments equal. But this is unlikely to shorten the proof much.

Regardless, the proof lengths are more or less the same, which is clearly a win for automation given how much faster it is!
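
As a small sketch of what stating a lemma with fewer variables looks like, here is Equation 650 specialised at y := z ◇ x, which leaves only two free variables (this mirrors lemma_B in the Lean proof upthread, up to the direction of the equality). It illustrates the general idea only and is not a rewrite of lemma2.

```lean
class Magma (α : Type _) where
  op : α → α → α

infix:65 " ◇ " => Magma.op

-- Three-variable law in, two-variable statement out: the specialisation
-- is stated once and can then be used without repeating the instantiation.
example (G : Type _) [Magma G]
    (h : ∀ x y z : G, x = x ◇ (y ◇ ((z ◇ x) ◇ y))) :
    ∀ x z : G, x = x ◇ ((z ◇ x) ◇ ((z ◇ x) ◇ (z ◇ x))) :=
  fun x z => h x (z ◇ x) z
```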

Lydia Kondylidou (Jan 15 2026 at 08:05):

@Bruno Le Floch thank you! This is a good idea, I will try this out. The first 5 lemmas are superposition steps directly from the initial Vampire proof. So mainly what we did is combine a superposition-based proof with Twee's equational proof. We are thinking about ways to rewrite the first 5 lemmas so that the whole proof becomes more humanly readable (or even smaller).

Jasmin Blanchette (Jan 15 2026 at 08:51):

Thank you, @Bruno Le Floch, for sharing your proof! It's very interesting. I am not worried too much about rw possibly rewriting more than one term, but I do notice that your proof carries out some of its argumentation using proof terms (e.g., the proof of lemma_C_ff). So it's hard to compare length (although overall your proof is probably shorter).

Bruno Le Floch (Jan 15 2026 at 21:22):

Staring at these proofs some more, I think the main readability improvement comes from having only relatively short expressions, in part thanks to introducing notation (f, g, cube), and in part because I had probably put a rather low bound on the maximum expression lengths when looking for a proof with Prover9. It might be useful to extend your tool to take expression length into account.
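
One way to make "expression length" concrete is to count nodes in a syntax tree. The sketch below is purely hypothetical: the type `MagmaExpr` and the function `size` are invented for this illustration and are not part of the minimization tool, Prover9, or Vampire.

```lean
-- Hypothetical syntax tree for magma expressions; variables are numbered.
inductive MagmaExpr where
  | var : Nat → MagmaExpr
  | op : MagmaExpr → MagmaExpr → MagmaExpr

-- Size = number of variable occurrences plus number of operations.
def MagmaExpr.size : MagmaExpr → Nat
  | .var _ => 1
  | .op s t => s.size + t.size + 1

-- Right-hand side of Equation 650, x ◇ (y ◇ ((z ◇ x) ◇ y)),
-- with x, y, z encoded as variables 0, 1, 2.
def rhs650 : MagmaExpr :=
  .op (.var 0) (.op (.var 1) (.op (.op (.var 2) (.var 0)) (.var 1)))

#eval rhs650.size  -- 5 variable occurrences + 4 operations = 9
```

A bound on such a measure could then be applied to every intermediate expression of a candidate proof; whether that is the right measure for readability is left open here.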

Last updated: Feb 28 2026 at 14:05 UTC