Zulip Chat Archive

The prominent use of automated theorem provers (ATPs) in this project has left me uncertain about the precise nature of how ATPs will be used in Lean in the future. With this topic I would like to collect ideas on how people think we should be using ATPs from Lean. I think there are two questions which need to be considered:

How do we invoke the ATPs?
How do we persists the results of ATPs?

(If some of these points have already been discussed in other topics, I'd be happy to collect links to them here.)

Question 1 may sound like a technicality, but I think it's quite important when considering what reasoning in Lean should look like in the future. Right now, I can think of three ways to invoke an ATP for use with Lean:

by using a Lean tactic which calls an ATP implementation written in Lean (like simp or ring).
by using a Lean tactic which calls an existing (external) ATP and reconstructs a proof term from the ATP's result (like bv_decide or egg).
by using an external script which calls an existing ATP and generates a .lean file which captures the ATP's result.

In the context of this project, Approach 3 seems to have been the most prominent. There have been various efforts, like using Vampire, Z3, and Egg, which constructed a large number of proofs by generating .lean files. While this approach may require the least amount of effort for implementation, it's arguably also the least user-friendly. I'm guessing your everyday (curious) mathematician playing with Lean would much rather call a proof tactic than some external script to run proof automation. Accordingly, there has also been an attempt at using Approach 2 by means of the egg tactic, which calls the Egg engine (implemented in Rust) in the background. Unfortunately, this approach necessitates the user to have a working installation of whatever software is required to call the external ATP. In the context of this project, this meant that some people were not able to build the project as long as it had egg as a dependency - the dependency was thus removed. Approach 1 was not used (as far as I am aware) as implementing an ATP in Lean has an extremely high initial implementation cost, as well as a high maintenance cost, as external ATPs continue to improve making it very burdensome to keep up with the state of the art. This circumstance is also reflected in the fact that Lean's very own bv_decide tactic depends on the external SAT solver CaDiCaL - that is, uses Approach 2. Does that mean that the bv_decide tactic faces the same issue as the egg tactic, as it requires an external dependency? No, but only because CaDiCaL now ships with Lean. As it is obviously not possible to ship every possible ATP with Lean, Approach 2 is currently more or less infeasible. This is unfortunate as (IMHO) Approach 2 hits the sweet spot between user-friendliness and implementation effort.

One possible way out of this issue may be by addressing Question 2. I can think of three answers to this question:

A. Don't persist results - that is, rerun the tactic on every recompilation (as typical with simp or ring).
B. Persist results by converting them to existing Lean tactic calls like rw, simp, and explicit proof terms with exact.
C. Persist results by storing them in some other format, e.g. in an external file, which a given tactic can turn into a Lean proof.

In the context of this project, Approach B seems to have been the most common. The upside of Approach B is that it makes ATP usage ephemeral. For example, Approach 2 can thus be realized by introducing "Try this" suggestions which replace the ATP tactic call with calls to existing Lean tactics. A first downside of Approach B is that it exposes the reasoning steps of the ATP (which we might not care about) and thus introduces noise into the proof script. For example, it's probably much clearer to have a proof script contain simp [...], instead of explicitly stating all of the steps which simp took to prove a goal. A second downside of Approach B is that ATPs can produce very long proofs. For example, the egg tactic sometimes produces proofs containing hundreds of rewrites which is hardly desirable for a Lean proof script. Interestingly, ATPs may still produce these long proofs very quickly. Thus I believe Approach A could be feasible (that is, not slow down compilation too much) in many cases if only it weren't for the problems of having external dependencies as outlined in the previous section. Due to these downsides of Approaches A and B, I think Approach C may currently be the most promising. There are of course still downsides to this approach. Namely, if we use Approach C with a tactic which depends on an external ATP, then we need to split that tactic into two distinct packages: one for proof generation (package G) and one for proof reconstruction (package R). Package R only handles proof reconstruction from results produced by tactics from G and does not require the external dependency. Therefore it is unproblematic as a dependency of other packages and can be kept in the proof script. Package G on the other hand should only be installed locally by the user trying to generate new proofs, as this will depend on an external ATP. The proof tactic of package G should then always be ephemeral by providing a "Try this" which delegates to the proof reconstruction of package R with the ATP's result being stored somewhere else (e.g. in a separate file). This two part approach may also be simplified by not introducing a proof reconstruction package (R) for each new tactic, but instead trying to find an intermediate proof format which works for many kinds of proofs and having a more general proof reconstruction package which can be used for mutliple tactics. (The obvious format would be terms of Lean.Expr, but those terms might get very large.)

In conclusion, my take away is that Approach 2 + Approach C are the most promising, and it might be worth investigating proof persistence techniques. But I'm interested to hear if there are any other ideas floating around.

Shreyas Srinivas (Oct 07 2024 at 13:33):

How about deeply embedding the ATPs internal representation in lean instead of translating Lean Exprs. We discussed this on day 1 or 2, but I think we lose a lot of the ability to exploit meta theorems when we don't have the correct syntactic representation of a domain within lean.

Shreyas Srinivas (Oct 07 2024 at 13:34):

We also lose the ability to use this structure for tooling within lean.

Henrik Böving (Oct 07 2024 at 14:26):

This circumstance is also reflected in the fact that Lean's very own bv_decide tactic depends on the external SAT solver CaDiCaL - that is, uses Approach 2.

Note that bv_decide implements a lot of things besides calling the SAT solver. It is in essence an attempt to reimplement the currently leading SMT solver for bit vector theory, Bitwuzla, in Lean. There is the bitblasting, the rewriting engine etc. involved, all of which are not part of the external solver. So doing this in Lean is feasible even if hard.

No, but only because CaDiCaL now ships with Lean. As it is obviously not possible to ship every possible ATP with Lean

Oh but it totally is :wink: Isabelle ships at least half a dozen ATPs in its installation for example.

B. Persist results by converting them to existing Lean tactic calls like rw, simp, and explicit proof terms with exact.

This is actually impossible if your proof relies on something like ofReduceBool which is the case for bv_decide.

C. Persist results by storing them in some other format, e.g. in an external file, which a given tactic can turn into a Lean proof.

This is the current approach of bv_decide. In general systems that rely on verifying external proof certificates with native code probably need to do this to cache results.

Daniel Weber (Oct 07 2024 at 14:31):

Regarding Approach B, have you seen the format used for the Vampire-produced proofs (https://github.com/teorth/equational_theories/blob/main/equational_theories/Generated/VampireProven/Proofs1.lean, for example)? I wrote a term elaborator (superpose) which applies the reasoning steps of Vampire, and then most proofs could be represented reasonably concisely.

Regarding Approach 3, I think the main reason that it was so prominent in this project is that it's very fast-paced, which doesn't necessarily lead to user-friendly development.

By the way, do you know of polyrith? It's a tactic that invokes SAGE via a web API and then replaces itself by a linear_combination application, this might be a cheaper way to do Approach 2

Andrés Goens (Oct 07 2024 at 14:31):

As additional datapoints, think of aesop and simp?:

Most uses of aesop are ephemeral, and it works by persisting with converting to existing lean tactics (approach B), even though it falls under category 1 (it is written in Lean). So I think the two questions are a bit orthogonal.
When you call simp? it doesn't give you a proof term or a set of rws, it gives you a simp only set, which also hits a sweet-spot between being simple to compute (minimal) but still a small (readable) proof script and somewhat resilient to changes.

I could see a world where we combine approaches 1. and 2., and for example calls to egg? are encoded with something like calc + cc (and some pruning), or even a lean-only implementation of egg which might be less performant but good enough to rerun the egg only call.

Joachim Breitner (Oct 07 2024 at 14:48):

Also regarding approach B and egg: The calcify tool can be used to turn the egg proofs into (relatively) pretty equational proofs.

Andrés Goens (Oct 07 2024 at 15:06):

Joachim Breitner said:

Also regarding approach B and egg: The calcify tool can be used to turn the egg proofs into (relatively) pretty equational proofs.

@Marcus Rossel also built this interaction into an ergonomical egg? tactic :octopus:

Terence Tao (Oct 07 2024 at 15:20):

Just wanted to say that this is an excellent discussion and we should certainly talk about this issue in the paper; I added a brief mention of it to the paper plan (in the "automated theorem proving" section).

One can imagine a table in the paper giving some real-world examples in Lean (or in other proof asssistant languages) of approaches 1, 2, 3, drawing on many of the examples listed above.

Joachim Breitner (Oct 07 2024 at 15:48):

Andrés Goens said:

Joachim Breitner said:

Also regarding approach B and egg: The calcify tool can be used to turn the egg proofs into (relatively) pretty equational proofs.

Marcus Rossel also built this interaction into an ergonomical egg? tactic :octopus:

I didn't know! Now I have to make sure I don't break the code too easily.

Michael Bucko (Oct 14 2024 at 22:32):

I like the combination of Approach 2 (external ATPs, lower maintenance cost; more people could use it and create libraries), and C (own format, like we have in ML; we’d be able to decouple proof generation from proof checking as well as avoid cluttering the proof scripts).

Approach 2 reminds me of JS/TS/Python flexibility. C is commonly used in ML (involves serialization, artifact storage, versioning, and deployment).

I would also consider the following (perhaps some of them are not possible at the moment):

Lineage (also common in ML; being able to follow the full history),
ATPs as APIs (can we get to real-time insights?),
Agents (Perhaps a Cursor plugin? Or an additional functionality in vscode-lean4?)
New UX (Via shortcuts - „quick complete“, „full complete“, etc.),
Generative (Like CTRL+K in Cursor when the file is empty),
Proof as JSON (That would enable us to potentially integrate certain logic in software), re-using the open formats (or creating a new format); using plugins / modules,
Background bots for curriculum learning (for instance, higher-order tactic dev),
One storage for all math (global, open etc.; like github+HuggingFace, but for math),
Using Lean’s FFI to write a couple of community building programs that profit from Lean (one idea that comes to mind is (eventually) 3k+1-related flow control logic for hardware and software).

Last updated: May 02 2025 at 03:31 UTC

leanprover-community / mathlib