Zulip Chat Archive

Stream: Equational

Topic: Alternative proofs of E1689⊢E2


Bruno Le Floch (May 09 2025 at 21:19):

At the end of section 1.1 of the paper we state that all known proofs of E1689⊢E2 are computer-assisted. That seems to be a tad to strong, given the following human-readable proof. For some lemmas I got the initial idea from the prover9 ATP, so it could still be considered computer-assisted, I am not sure.

Human-readable proof that x=(yx)((xz)z)x = (yx)((xz)z) implies the singleton law.

Terence Tao (May 09 2025 at 21:50):

Fair enough. Perhaps one can update the blueprint to contain this proof, and then update the paper text to say that while this implication was initially computer generated, we obtained a human readable proof during the course of the project?

Bruno Le Floch (May 10 2025 at 10:02):

I'll make a pull request later some time this weekend. Does the proof have to be formalized in Lean too?

Terence Tao (May 10 2025 at 13:00):

I think for now it suffices to have it in informal form in the blueprint. It might be a cute test of modern tooling (e.g., the recently released Canonical) as to whether they can autoformalize many of the steps in the proof, though given the sketch above; if there is interest I could set that as a formal project task.

Bruno Le Floch (May 10 2025 at 15:31):

Hm... The existing proof of E1689⊢E2 in the blueprint has a \leanok macro, which informs automatic tooling (which one?) that the proof has been formalized. The proof I'm adding has not been formalized, which suggests removing \leanok, but then the statement would show up as not formally proven I suppose. This seems bad, I suppose the only natural way out is to formalize that specific proof?

Terence Tao (May 10 2025 at 15:37):

One can add a second lemma ("Alternate proof of E1689⊢E2") with the same statement but a different proof and an independent \leanok tag. (A little clunky, but blueprint doesn't currently support multiple proofs attached to the same theorem.) There is the separate question of whether both proofs should have the @[equational_theories] tag. I assume that this tag can handle redundancies, but then again for the purpose of generating the implication graph it is only important that one of the proofs have this tag, and we were already aiming to have a minimal set of such generating implications, so I guess it would make sense to only have one proof get the tag.

Pietro Monticone (May 10 2025 at 17:14):

The automatic tooling in question is, rougly speaking, a combination of plastexdepgraph and leanblueprint. In its current implementation, this system records only a single proof per statement. This is overwritten by the last parsed proof that includes a \proves{...} macro and then leanblueprintdetermines the proof formalisation status based solely on that single associated proof.

Given this current behavior, a reasonable workaround is to include two proof environments referring to the same lemma as follows:

\begin{lemma}
\label{lem}
\leanok

% Here we state the lemma, which has been formalized in Lean.

\end{lemma}

\begin{proof}
\proves{lem}

% Here we write the alternative proof, which has not yet been formalized in Lean.

\end{proof}

\begin{proof}
\proves{lem}
\leanok

% Here we write the proof that has already been formalized in Lean.

\end{proof}

This approach preserves the correct formalization status, as long as the last proof referencing \proves{lem} includes the \leanok tag.

We can also just clarify in the text of the first proof that it hasn’t been formalised yet. The tool will still pick up the last proof with \leanok, so in the end the node will show up as dark green in the dependency graph correctly indicating that both the statement and at least one proof have been formalised in Lean.

Bruno Le Floch (May 10 2025 at 22:05):

It's good that there are (two) ways to deal with having one formalized proof and one unformalized proof of the same statement, thanks for the details.

However, the new proof is really basic, so I should be able to formalize the new proof in the next few days, and delete the old one altogether (it was basically a translated version of an ATP-generated proof, with zero insight). This avoids all problems with duplicate proofs and tooling.

Terence Tao (May 10 2025 at 22:09):

As an experiment, I decided to formalize the proof, working line by line from Bruno's sketch, and relying heavily on both Github Copilot and canonical. I managed to complete a proof in 48 minutes, attached below [and also uploaded to a separate repository here]. I recorded myself doing this, and will also post the video later. Bottom line - the automated tools work - sort of. They often did complete individual steps (with rather opaque proofs), which did speed things up, but at the cost of not understanding so much what was going on. I was basically blindly focusing on extremely atomic tasks of justifying each individual line in the proof without any conception of the big picture. But the end result is hardly elegant. (In particular, I don't think this is the proof that should go into the repository; it was definitely intended to be an experiment only. There are obvious places to golf this proof, but I think it is more illustrative to show the raw, semi-AI-generated, proof.)

UPDATE: unfortunately, the video is not really usable: it shows my audio, but not the screen share of the code being generated, which largely defeats its purpose :(

import Mathlib.Tactic
import Canonical

class Magma (α : Type _) where
  op : α  α  α

infix:65 " ◇ " => Magma.op

abbrev Equation1689 (M: Type _) [Magma M] :=  x y z : M, x = (y  x)  ((x  z)  z)

abbrev Equation2 (M: Type _) [Magma M] :=  x y : M, x = y

variable {M : Type _} [Magma M]

/-
Human-readable proof that $x = (yx)((xz)z)$ (equation 1689) implies the singleton law (equation 2). -/

/- We denote $$S_z(x) = (xz)z$$ ...-/

abbrev S (z x : M) := (x  z)  z

/- ... and $$f(x,y) = xS_y(x) = x((xy)y)$$.-/

abbrev f (x y : M) := x  (S y x)

lemma f_eq (x y : M) : f x y = x  ((x  y)  y) := rfl

/-  The main equation is $$x=(yx)S_z(x)$$. -/

lemma main_eq (h: Equation1689 M) (x y z : M) : x = (y  x)  S z x := by
  exact h x y z

/- For $$x=S_b(a)$$ and $$y\in M a$$ we have $$yx=a$$. -/
lemma claim (h: Equation1689 M) (a b : M) (y : M) (hy :  z, y = z  a) : y  S b a = a := by
  obtain z, rfl := hy
  exact Eq.rec (motive := fun a_1 t  a_1 = a) (Eq.refl a) (h a z b)


/- **Lemma 1:** For any $$a,b,c$$, one has $$S_b(a) = a f(b,c)$$. -/
lemma Lemma1 (h:Equation1689 M) (a b c : M) : S b a = a  (f b c) := by
-- *Proof:* For $$x=S_b(a)$$ and $$y\in M a$$ we have $$yx=a$$.
  set x := S b a
  have claim (y : M) (hy:  z, y = z  a) : y  x = a := by
    exact claim h a b y hy

/- Then apply the main equation to these values of $$x,y$$ [Note: take y to be an arbitrary element of Ma, e.g. a ◇ a], to get [for any z] S_b(a) = aS_z(S_b(a)) -/
  have h1 (z:M) : S b a = a  S z (S b a) := by
    set y := a  a
    have hy :  z, y = z  a := by
      use a
    change x = a  S z x
    have h2 :  w, a = w  x := by
      exact Exists.intro (b  a) (h a b b)
    obtain w, hw := h2
    convert main_eq h _ w _

/- Then set $$z=S_c(b)$$ and note that $$S_b(a)z = ((ab)b)((bc)c) = b$$...
-/
  set z := S c b
  have h3 : S b a  z = b := by
    exact Eq.rec (motive := fun a t  a = b) (Eq.refl b) (h b (a  b) c)
  calc
    S b a = a  (((S b a)  z)  z) := by
      exact h1 z
    _ = a  (b  z) := by
      exact congrArg (Magma.op a) (congrFun (congrArg Magma.op h3) z)
    _ = a  (f b c) := by
      exact rfl

/-  ... to simplify the right-hand side above and get, as announced,

S_b(a) = a((S_b(a)z)z) = a(bz) = a f(b,c). -/

/- **Lemma 2** For all $$a$$ there exists $$b,c,d$$ such that $$f(b,c)=S_d(a)$$.
-/
lemma Lemma2 (h:Equation1689 M) (a : M) :  b c d, f b c = S d a := by
/- *Proof:* By definition of $$f$$ one has $$f(b,c)=bS_c(b)$$.-/
  have h1 (b c : M) : f b c = b  S c b := by exact rfl

  set x := a
  set b := S x a
  use b
/-  Taking $$b=S_x(a)$$ for some $$x$$, -/

  have h2 (c : M) : b = a  S c b := by
    unfold b
    have h3 :  w, a = w  S x a := by exact Exists.intro (a  a) (h a a x)
    obtain w, hw := h3
    convert main_eq h _ w _

/- and rewriting $$b=aS_c(b)$$ using the first equation in the proof of Lemma 1 -/

/- we find f(b,c) = (aS_c(b))S_c(b)...
-/
  have h4 (c :M) : f b c = (a  S c b)  S c b := by
    rw [ h2]

/-
... which has the desired form for $$d=S_c(b)$$.  (Thus, the statement actually holds for all $$a,c$$.)
-/
  exact Exists.intro a (Exists.intro ((((a  a)  a)  a)  a) (h4 a))


/- **Lemma 3** For all $$a$$ there exists $$e$$ such that $$S_e(a) = a$$.
-/
lemma Lemma3 (h:Equation1689 M) (a : M) :  e, S e a = a := by
  set a_2 := a  a
  set a_3 := a_2  a

/- *Proof:* Left-multiply the equation in Lemma 1 by $$a^3=(a^2)a$$ to get (the first equality below comes from the main equation)

a = ((a^2)a)S_b(a) = a^3 (af(b,c)) .
-/
  have h1 ( b c : M ) : (a_2  a)  S b a = a_3  (a  f b c) := by
    congr
    exact Lemma1 h a b c

  have h1' (b c : M) : a = a_3  (a  f b c) := by
    exact Eq.rec (motive := fun a_1 t  a = a_1) (h a (a  a) b) (h1 b c)

/-
Take $$b,c,d$$ as in Lemma 2... -/
  obtain b, c, d, h2 := Lemma2 h a

/-  to rewrite $$af(b,c)=aS_d(a)=f(a,d)$$.-/
  have h3 : a  f b c = a  S d a := by
    exact congrArg (Magma.op a) h2

/- On the other hand, Lemma 1 with $$a=b$$ and $$c$$ replaced by $$d$$ implies $$a^3 = a f(a,d)$$ -/

  have h4: a_3 = a  f a d := by
    exact Lemma1 h a a d

/-  so overall we get $$a=(af(a,d))f(a,d)$$... -/
  have h5 : a = (a  f a d)  f a d := by
    rw [ h4]
    exact Eq.rec (motive := fun a_1 t  a = ((a  a)  a)  a_1) (h1' b c) h3

/-  ...which is as desired for $$e=f(a,d)$$. -/
  exact Exists.intro (a  ((a  d)  d)) (Eq.rec (motive := fun a_1 t  a_1 = a) (Eq.refl a) h5)


 /-
*End of the proof:* For any $$a,y$$, using the notation $$e$$ from Lemma 3, the main equation gives $$a=(ya)S_e(a)=(ya)a=S_a(y)$$.
-/
theorem singleton_law (h:Equation1689 M) : Equation2 M := by
  have h1 (a y :M ): a = S a y := by
     obtain e, he := Lemma3 h a
     calc
       _ = (y  a)  S e a := by exact h a y e
       _ = (y  a)  a := by
         exact congrArg (Magma.op (y  a)) he
       _ = S a y := by
        exact rfl
/-Inserting this back into the main equation gives $$(zy)a=y$$ for any $$a,y,z$$.-/
  have h2 (a y z : M) : (z  y)  a  = y := by
    symm
    have h3 :  w, a = S w y := by exact Exists.intro a (h1 a y)
    obtain w, hw := h3
    convert main_eq h _ _ w

/- Thus $$ab=((da)c)b=c$$ for any $$a,b,c,d$$...-/
  have h3 (a b c d : M) : a  b = c := by
    exact Eq.rec (motive := fun a_1 t  a_1  b = c) (h2 b c (a  a)) (h2 c a a)

  have h4 (a b c d : M) : a = d := by
    exact Eq.rec (motive := fun a_1 t  a_1 = d) (h3 (a  a) a d a) (h2 a a a)

  exact fun x y  h4 x y y y

Notification Bot (May 10 2025 at 22:47):

9 messages were moved here from #Equational > Writing the paper by Bruno Le Floch.

Bruno Le Floch (May 10 2025 at 23:04):

The video/screen capture would have been instructive. I suppose you split the proof into steps yourself and used an LLM to translate each step to Lean and write proofs?

Interestingly, for that particular theorem the egg tactic needs no guidance to prove it.

My strategy was instead to simplify the human proof before formalizing it. After simplification (and renaming functions S,fS,f to f,gf,g) several of the steps disappeared or were specialized to particular variables so the formalized proof I plan to write will be quite different from what you wrote.

proof I plan to add in the blueprint (in LaTeX syntax)

Terence Tao (May 11 2025 at 00:57):

Video is now up: https://www.youtube.com/watch?v=cyyR7j2ChCI

I basically broke up your proof into really atomic pieces of text, and for each text tried to find the closest Lean equivalent sentence, which I then tried to fill in with the various tools available. Both of these latter tasks could be accomplished by automated tools about 50% of the time, but the other half I actually had to stare at the proof (and sometimes fix some typos introduced previously) in order to make a reduction to an even simpler step, at which point usually the automated tools work.

Pietro Monticone (May 11 2025 at 01:04):

The video at https://www.youtube.com/watch?v=cyyR7j2ChCI is private.

Terence Tao (May 11 2025 at 01:06):

Hmm. I think I changed the flag now, does it work?

Pietro Monticone (May 11 2025 at 01:07):

Yes, it's fine now. Thanks.

Bruno Le Floch (May 12 2025 at 11:14):

I'm done with the proof (both human and formalized), see pull request #1206. Checks pass. Our processes on Github are sufficiently automated that my pull request cannot be marked as awaiting-review, because there is no corresponding issue. It does await review whenever someone has time.

(Given my lack of Lean experience, it took me a whopping 220min, four and a half times what @Terence Tao took. But my formalized proof is somewhat shorter.)

Pietro Monticone (May 12 2025 at 13:55):

Thank you for your PR! I’ll review it later today or tomorrow.

Pietro Monticone (May 12 2025 at 21:10):

Done.

Notification Bot (May 13 2025 at 11:36):

Bruno Le Floch has marked this topic as resolved.

Terence Tao (May 13 2025 at 15:53):

For sake of comparison, I was able to use Claude (after patching various local issues, either by hand or via canonical) to formalize the new informal proof in about 20 minutes. It helped that I could give Claude the previous formal proof (annotated with the informal proof) as a guide. Interestingly, o4-mini fared worse; it got started on the task but gave up after formalizing (either correctly or near-correctly) a couple of the steps. https://www.youtube.com/watch?v=zZr54G7ec7A The final proofs can be found here

[Again, I am not recommending that these LLM-assisted proofs become part of the formal ETP; I view them as purely experimental in nature.]

Notification Bot (May 14 2025 at 03:46):

Pauli Yang has marked this topic as unresolved.

Bruno Le Floch (May 14 2025 at 09:50):

I'm impressed by how close Claude got to a proof, and how helpful Github Copilot seems to have been when fixing the proof, by helping with basic "autocomplete" tasks. tl;dw: Main difficulties for Claude:

  • Terry had to symmetrize five equations or so in the last step of the proofs. This seems automatable.

  • The definition of power was missing the impossible y^0 case, which led Terry to define y^n with exponent shifted by one. The notation also failed to work due to conflict with preexisting use of ^. In the end, only cube and fifth powers are used in the proof so that Terry ended up wrapping them in some new function; I did the same in the formalized proof now in our code-base.

Interestingly, at some point of the video Terry blames Claude for stating a lemma t_cubed_eq_t_t5 incorrectly as (square t) = t ◇ (fourth t), but what happened is that it had written t^3 = t ◇ t^5, then Terry shifted powers by one and changed that to t^2 = t ◇ t^4 manually, and later on changed that to square and fourth.

Terence Tao (May 14 2025 at 20:30):

An alternate formalization of Bruno's first proof, in the proof assistant Acorn rather than Lean (which is focused on automating the derivation of each line from the next, rather than relying on tactics). https://www.youtube.com/watch?v=3OJjsH_jKYs

Shreyas Srinivas (May 14 2025 at 21:17):

I am not super excited about acorn. Their first introductory post on r/math left many underwhelmed about their claims. Maybe the situation has changed since then (3-4 months)

Shreyas Srinivas (May 14 2025 at 21:17):

It seems to be more like Dafny

Anthony Peterson (May 15 2025 at 14:56):

After watching Terry's video I tried using Copilot for lean and for basic exploratory stuff it made Lean way easier to work with. It did suggest some obsoleted functions like List.group (instead of splitBy) but it wasn't too hard to switch to the correct function.

Terence Tao (May 16 2025 at 19:42):

I decided to pose our hardest positive implication - 650 => 488 - as a challenge for autoformalizers over at #Machine Learning for Theorem Proving > A (semi)-autoformalization challenge: 650=>448.

Bruno Le Floch (May 16 2025 at 21:09):

The real challenge is to produce a human-readable proof… :wink:

Kevin Buzzard (May 16 2025 at 21:16):

Surely the real challenge is to produce a human-understandable proof!

Terence Tao (May 16 2025 at 21:25):

There is now a proof that is basically 65 lines of by grind (plus one remaining sorry). Would that count as human-readable?

Bruno Le Floch (May 16 2025 at 21:27):

Note that actually all these equations are equivalent to equation 4 (left projection). The proof of 650→4 and of 650→448 by prover9 are essentially identical. I didn't check the vampire proofs. Is there a reason we aimed for a proof of 448 instead of simply equation 4?

Terence Tao (May 16 2025 at 21:29):

@Vlad Tsyrklevich reported 650=>448 in #Equational > What are the hardest positive implications for an ATP? @ 💬 as the Vampire-hardest implication, but this was in a reduced set, so perhaps 650->4 was already stripped out due to transitivity.

Bruno Le Floch (May 17 2025 at 05:59):

I have an outline of a human-readable proof (found with computer help, as for E1689⊢E2), which I hope to formalize during the weekend. Should I post the result (LaTeX + Lean) in this thread, or on the "semi-autoformalization challenge" thread?

Terence Tao (May 17 2025 at 14:09):

It might make sense to use the challenge thread as the primary thread to centralize the discussion.


Last updated: Dec 20 2025 at 21:32 UTC