Zulip Chat Archive
Stream: lean4
Topic: Styles of reflective tactics
Timothy Mou (Mar 01 2024 at 05:16):
In order to learn how to write reflective tactics, I've been reading the source code of the abel and ring tactics. As I understand it, they both work by writing a function roughly like eval : (e : Expr) -> M (NF x Expr) that takes an Expr and returns a normal form of the Expr, as well as an Expr of a proof that e is equal to the expression that the normal form represents. If we have an equality e1 = e2, we can prove it by calling eval on both e1 and e2, and if their normal forms are equal, we can chain the two equality proofs with transitivity to prove e1 = e2, as shown here.
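Schematically, I picture the top level as something like the following (just a sketch: the NF structure is made up and eval is left as a parameter, so this is not the actual Mathlib code):

import Lean
open Lean Meta

-- A hypothetical normal form; all the top level needs is the Expr it denotes.
structure NF where
  expr : Expr

-- Given eval such that `eval e = (nf, pf)` with `pf : e = nf.expr`, close a goal
-- `e₁ = e₂` by evaluating both sides and chaining the two proofs with transitivity.
def proveEqByNormalization (eval : Expr → MetaM (NF × Expr))
    (goal : MVarId) (e₁ e₂ : Expr) : MetaM Unit := do
  let (nf₁, p₁) ← eval e₁          -- p₁ : e₁ = nf₁.expr
  let (nf₂, p₂) ← eval e₂          -- p₂ : e₂ = nf₂.expr
  unless (← isDefEq nf₁.expr nf₂.expr) do
    throwError "normal forms do not match"
  -- e₁ = nf₁.expr = nf₂.expr = e₂: chain p₁ with the symmetry of p₂
  -- (a real tactic would also check that `goal` actually states `e₁ = e₂`)
  goal.assign (← mkEqTrans p₁ (← mkEqSymm p₂))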
This can be contrasted with the style of reflection described in the book CPDT. The idea described here is that to prove an equality a1 = (a2 : A), we first define a type B that reifies expressions of type A, as well as a denotation function denote : B -> A and a normalization function norm : B -> B with the property denote b = denote (norm b). This allows us to write a "reflection theorem" forall (b1 b2 : B), denote (norm b1) = denote (norm b2) -> denote b1 = denote b2. We then write a tactic that will automatically perform the reification of an a : A into a b : B, with the property that denote b = a. If we can rewrite a1 = a2 into denote b1 = denote b2, then we can apply the reflection theorem and prove the goal by reflexivity.
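Concretely, the whole pipeline can be sketched in plain Lean, using Nat addition as the theory (all names here are invented for illustration, and the normal form is a List Nat rather than another B, which keeps the sketch short):

inductive B where
  | atom : Nat → B          -- an opaque subterm, reified by its value
  | zero : B
  | add  : B → B → B

def denote : B → Nat
  | .atom n  => n
  | .zero    => 0
  | .add x y => denote x + denote y

-- Normal form: a flat list of atoms.
def flatten : B → List Nat
  | .atom n  => [n]
  | .zero    => []
  | .add x y => flatten x ++ flatten y

def denoteList : List Nat → Nat
  | []      => 0
  | n :: ns => n + denoteList ns

theorem denoteList_append (l₁ l₂ : List Nat) :
    denoteList (l₁ ++ l₂) = denoteList l₁ + denoteList l₂ := by
  induction l₁ with
  | nil => simp [denoteList]
  | cons n ns ih => simp [denoteList, ih, Nat.add_assoc]

-- Normalization preserves denotation.
theorem denote_flatten (b : B) : denoteList (flatten b) = denote b := by
  induction b with
  | atom n => simp [flatten, denote, denoteList]
  | zero => simp [flatten, denote, denoteList]
  | add x y ihx ihy => simp [flatten, denote, denoteList_append, ihx, ihy]

-- The reflection theorem: equal normal forms give equal denotations.
theorem reflect (b₁ b₂ : B) (h : denoteList (flatten b₁) = denoteList (flatten b₂)) :
    denote b₁ = denote b₂ := by
  rw [← denote_flatten b₁, ← denote_flatten b₂]; exact h

The only metaprogramming left to write is the reifier that turns a goal a1 = a2 into denote b1 = denote b2.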
I think there is a qualitative difference between these two approaches. In the CPDT style, we are able to prove a regular theorem that states that the normalization process is correct. Tactics are only used as a front-end for automating the reification process, which could in theory be done by hand. In the style of abel or ring, I don't see an analogue of a reflection theorem, because the normalization process is interspersed with the tactics and metaprogramming (for example here). I see the CPDT style as pushing the metaprogramming bits to the edges of the tactic and minimizing the amount of meta-code that needs to be written, which gives me more confidence in the tactic's correctness. This style of tactic can easily be written in Lean; for instance, I wrote this monoid solver that follows the same approach as described in CPDT, and I think it could easily be extended with more features like reifying variables. So why do tactics such as ring and abel take this approach, and what are the other advantages/disadvantages that are relevant in deciding how to write a reflection tactic?
Kim Morrison (Mar 01 2024 at 05:29):
@Timothy Mou, I hope others will give you longer answers, but you might like to look at the leansat repository, which uses the classical reflection approach, but with the twist that we do the computation in native code (i.e. trusting the compiler).
Timothy Mou (Mar 01 2024 at 05:38):
I'll take a look at that. I suspected performance might be an important factor but don't know enough about what would affect it.
Henrik Böving (Mar 01 2024 at 09:13):
Timothy Mou said:
In order to learn how to write reflective tactics, I've been reading the source code of the abel and ring tactics. [...] So why do tactics such as ring and abel take this approach, and what are the other advantages/disadvantages that are relevant in deciding how to write a reflection tactic?
As Scott outlined, the main reasons abel and ring are implemented as directly proof-producing are performance and TCB (trusted code base).
Regarding performance: the abel and ring approach runs in meta code and produces proof terms that then get fed to the kernel, which means:
- All of the proof computation is done either in the byte code VM or fully natively, depending on how the tactic is run -> getting a proof is fast.
- For the TCB you only need to trust the kernel, like for all other normal Lean proofs. Your concerns regarding correctness of the tactic don't matter much for the correctness of a proof, as the kernel would spot any mistake. That said, they do of course matter for the initial UX: your function might have scenarios that were left out, or might produce improper proofs that then get rejected by the kernel. However, after enough usage one would expect all of these bugs to be found.
Naively copying the CPDT approach will give you the following results:
- TCB is the same as before
- Performance will degrade noticeably. This is because your proof machinery is now run using the kernel evaluator, which is right now not very well optimized for fast evaluation of programs (in particular programs using well-founded instead of structural recursion); see the toy example below.
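For a toy illustration of that last point (my own example, nothing to do with LeanSAT):

def sumTo : Nat → Nat
  | 0     => 0
  | n + 1 => (n + 1) + sumTo n

-- `decide` makes the kernel evaluator unfold `sumTo 100` step by step. That is fine
-- here, but this style of in-kernel evaluation is exactly what gets slow for larger
-- inputs, and worse for functions defined by well-founded recursion.
example : sumTo 100 = 5050 := by decide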
This leads us to the approach that LeanSAT takes, which is: CPDT, but run the proof-yielding computation itself with native code in the kernel.
This gives us:
- Good performance again.
- A TCB increase by the ofReduceBool axiom, which essentially says that you trust the correctness of our compiler. Now, is this a good idea? If you are very (like, extremely) concerned about correctness, then no: we know that the compiler has bugs, so it is certainly not a correct program. That said, if you have a proof that your decision procedure is sound, you run it on a problem that people (and, in the case of LeanSAT, a SAT solver) believe is correct, and your function also validates it as correct using native code, then it is highly unlikely that there is a bug present after all. (See the tiny example below.)
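To make that concrete, here is a minimal made-up example of the mechanism, far simpler than what LeanSAT does:

-- `native_decide` compiles the decision procedure and trusts its native-code run,
-- so the proof depends on the `Lean.ofReduceBool` axiom, as the axiom printout
-- below should show.
theorem tiny : 2 ^ 20 % 1000003 = 48573 := by native_decide

#print axioms tiny  -- expected to list `Lean.ofReduceBool`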
Timothy Mou (Mar 01 2024 at 23:28):
I didn't mean correctness in terms of trusting the outputted proof, as of course everything is checked by the kernel. I meant it more in terms of writing and debugging the tactic. Maybe it's just my inexperience with Lean metaprogramming, but I would prefer to limit the scope of metaprogramming and write as much of the tactic in normal Lean as possible.
I'm also wondering if there is any difference in the size of the resulting proof terms, as this is something that is emphasized in CPDT. One of the benefits of writing a reflection theorem is that instead of the proof term containing all of the rewriting steps explicitly, it only contains one call to the theorem, which only needs to be checked once.
Could you explain a bit more how ofReduceBool would be used?
Mario Carneiro (Mar 02 2024 at 05:28):
@Timothy Mou The proof terms generated are definitely larger when you actually prove the theorem using lemmas instead of reflection, but IMO this is the wrong metric, because reflective proofs are "heavier" even though they don't have much text in them. In extreme cases you can create proofs whose asymptotic complexity grows as fast as pretty much any computable function you can think of, so in DTT just because the proof is short doesn't mean it's actually a "short proof". For example I could say that a proof of P != NP is "enumerate all proofs of size less than 2^2^1000 and take the first one that proves P != NP". This is an extremely short proof of P != NP if it actually is one, but determining whether it is correct is very expensive. It sort of challenges the premise of what a proof even is if verification of the proof is significantly more work than reading it.
Mario Carneiro (Mar 02 2024 at 05:38):
I believe that on the whole, proofs which are constructed from lemmas with no or only a light computational aspect have the edge as far as overall asymptotic complexity is concerned, because they have the opportunity to encode decisions in the course of the proof, which cannot easily be done in a proof by reflection because the kernel has to be deterministic (or at least, the one in DTT is). But you can still do nondeterministic proofs in the reflective style as long as you pass in the decisions from the outside as a "witness" argument, effectively turning the decision procedure into a certificate checker. The more relevant deciding factor is how well it performs inside and outside the kernel, and evaluating recursive functions is a real weakness of the current implementation, while instantiating lemmas is comparatively okay.
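Schematically, the certificate-checking shape looks like this (hypothetical names, not LeanSAT's actual interface):

-- All nondeterministic choices are passed in from outside as a certificate `c`;
-- only `check`, not the search for `c`, has to be proved sound and run in the kernel.
structure CertChecker (Q Cert : Type) (holds : Q → Prop) where
  check : Q → Cert → Bool
  sound : ∀ q c, check q c = true → holds q

-- The untrusted search for `c` can live in meta code or in an external tool;
-- the kernel only ever re-runs `check`.
theorem CertChecker.byCert {Q Cert : Type} {holds : Q → Prop}
    (cc : CertChecker Q Cert holds) (q : Q) (c : Cert)
    (h : cc.check q c = true) : holds q :=
  cc.sound q c h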
I also believe that the direct style proofs tend to be more portable, since they don't rely as much on the defeq decision procedure or whnf algorithm. This is not usually a major concern, but it is something to think about. Whenever you have a proof that uses such decision procedures and you have to port it to a kernel that doesn't have them, or for which they do not reliably evaluate in the same way, you have to fall back on the direct style anyway, while the converse is not an issue: since the proof says exactly how to apply the lemmas, there is no question about what it is going to do or how long it will take to do it.
Timothy Mou (Mar 02 2024 at 06:34):
@Mario Carneiro Thank you for the detailed response. It does sound like performance is a significant issue when relying on the kernel to do evaluation. And it makes sense why proof size is not the most important metric.
I believe that on the whole, proofs which are constructed from lemmas with no or only a light computational aspect have the edge as far as overall asymptotic complexity is concerned [...] The more relevant deciding factor is how well it performs inside and outside the kernel, and evaluating recursive functions is a real weakness of the current implementation, while instantiating lemmas is comparatively okay.
I hope to clarify my understanding of this. As a concrete example, let's take this statement that is proven by abel:
lemma ex7' [AddCommMonoid α] (a b c d: α) : a + b + c + d + 0 = d + (c + b) + (0 + 0 + a) := by abel
#print ex7'
The produced proof term uses lemmas to explain how to rewrite a + b + c + d + 0 into a normal form and then rewrite it back into d + (c + b) + (0 + 0 + a). But this proof term and each step of the rewriting process also has to be checked by the kernel, so how is this more efficient than calling a normalization function on both expressions, as would be done in a reflective proof? It seems in both cases, the kernel is used to check that the expressions are both equal to the same normal form.
Mario Carneiro (Mar 02 2024 at 06:36):
Side note: the proof is misleadingly large in the printout because of subterm sharing. It's actually approximately linear in the size of the statement.
Mario Carneiro (Mar 02 2024 at 06:41):
In a case like this there is probably not that much gain because the algorithm is doing the naive thing of just normalizing both sides. But it could for example do a deeper search to split both sides into optimal chunks with the same monomials and recurse, generating an optimal proof in terms of number of associativity and commutativity applications. This would be much more computational work outside the kernel in exchange for less work in the kernel, which can be a benefit if (1) the kernel is much slower than your hardware (which is generally the case), or (2) the outside-the-kernel part is run less often than the in-the-kernel part because you cache kernel terms or version them etc. (or you make the user put the certificate in the file themselves, as we do with self-replacing tactics like exact? or polyrith). With the reflective approach there is no benefit at all in this tradeoff, because both sides are happening in the kernel.
Timothy Mou (Mar 03 2024 at 02:01):
So it sounds like in this case, if the tactic is just doing "naive normalization", then there isn't a significant performance benefit compared to a reflective tactic, and no difference in the TCB. In addition, IMO a reflective tactic is easier to write and debug because it involves writing more pure Lean code and there is better separation of concerns. From this it seems to me that a reflective tactic is better suited for writing a tactic like abel.
Mario Carneiro (Mar 03 2024 at 18:20):
You have to prove the tactic correct to write it as a reflective tactic; this is generally harder than writing a proof-producing tactic. (It's also pure Lean code either way, but there is less need to prove facts about Expr functions.) But as was mentioned before, the primary motivation for abel being written the way it is is that it is faster for the Lean kernel as it exists today. Coq has a much faster kernel optimized for this kind of task, and as a result reflective tactics are much more common there. This is very much a pragmatic engineering decision.
Mario Carneiro (Mar 03 2024 at 18:26):
It's been some time and things may have changed in the meantime, but I did actually write ring once as both a reflective tactic and a proof-producing tactic and, in addition to having a variety of limitations due to not being able to handle some of the optional extensions, the reflective version was slower by 20-50% in tests. So for the most part I left it at that. The Lean kernel has changed somewhat since then, but not in any really fundamental way that would change these numbers.
But one major change which does affect this is that the Lean 4 kernel uses native bignum arithmetic for operations on Nat, and this is reflected in the norm_num tactic (which is used by ring and abel) in that it will not do proof-producing Nat addition anymore; it uses reflective Nat.add embedded in a proof-producing tactic for other stuff like addition on reals or Jacobi symbols or anything else (since these other functions don't have kernel acceleration in the same way).
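For example (a throwaway illustration, not taken from any test suite):

import Mathlib.Tactic.NormNum

-- The literal Nat multiplication here ends up being checked by the kernel's built-in
-- bignum arithmetic rather than by a long chain of digit-by-digit lemmas.
example : (123456789 * 987654321 : Nat) = 121932631112635269 := by norm_num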
Timothy Mou (Mar 03 2024 at 22:44):
Ok, I'll take your word for it. I didn't know about the differences between the Coq and Lean kernels, and it seems that is the source of some of the differences.
𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Mar 04 2024 at 05:20):
@Timothy Mou FWIW, a long, long time ago I ran some preliminary benchmarks comparing ring2 (reflective) and ring (not reflective, like the current one) in Lean 3. I found that in some cases ring2 was actually faster, which suggested that maybe it's not a lost cause. But that was in Lean 3, and the benchmarking setup was a bit sketchy, so I wouldn't trust those old results now; however I would be pretty curious to see what would come of it were someone to run a direct comparison using Lean 4 and the current kernel (even if it just ends up quantitatively confirming Mario's intuition).
Joachim Breitner (Aug 25 2024 at 09:16):
Do we still have working lean4 versions of the two variants of ring to easily make that comparison these days?
Yuma Mizuno (Sep 11 2024 at 16:35):
I recently came across a similar comparison for monoidal category tactics:
#mathlib4 > Proof producing coherence tactic for monoidal categories
I would like to know if there's a good way to benchmark tactics for comparison.
Last updated: May 02 2025 at 03:31 UTC