Zulip Chat Archive

Stream: Lean Together 2025

Topic: Abdalrhman Mohamed: Lean-SMT

Jason Rute (Jan 16 2025 at 19:32):

Discussion thread.

Jason Rute (Jan 16 2025 at 19:33):

@Abdalrhman M Mohamed I missed a lot of the talk. I'll watch the videos or see the slides later, but when you say there aren't any Lean benchmarks, what do you need in a Lean benchmark that the AI benchmarks don't have? Do you need the premises to be explicitly given?

Jason Rute (Jan 16 2025 at 19:34):

By AI benchmarks, I mean things like MiniF2F, ProofNet, PutnamBench, and others.

Abdalrhman M Mohamed (Jan 16 2025 at 19:49):

Yes, exactly. We need the necessary axioms for any theory not natively supported by SMT solvers. Additionally, we want to compare against ATP tactics, which are currently limited in Lean (I can only think of duper). Therefore, we want to compare with tactics from other theorem provers, such as Isabelle's Sledgehammer and SMTCoq. For this, the problems should either be in a standard format (e.g., SMT-LIB v2) or translated into the logic of other provers.

Jason Rute (Jan 16 2025 at 19:49):

Also, there are already premise selectors like Lean Copilot. I'm not saying they are the best, but you could try to hook them up so as to have a list of premises for each theorem and test that. I once asked about this for duper (to implement MagnusHammer in Lean) but I was told it wouldn't work as duper is too sensitive to getting too many premises. I think I should have actually asked about auto. (I still get confused between lean-smt, auto, duper, canonical and the like.) Anyway, I think one just needs to make a benchmark using Lean Copilot lemma suggestions and these symbolic tools to get a sense if the magnushammer approach has merit in Lean.

Abdalrhman M Mohamed (Jan 16 2025 at 19:53):

Here are the slides: lean-SMT.pdf

Jason Rute (Jan 16 2025 at 20:03):

What do you think lean-SMT is useful for? It seems like it doesn't currently work as a black box proof automation tool, since the user has to supply a lot of theorems? Do you just imagine some very specific cases, maybe industrial ones? Or do you imagine something more general? What is an application this would really shine at in Lean?

Jason Rute (Jan 16 2025 at 20:07):

Also, I know there was a question at the end about what you need to get it to be more like the Isabelle Sledge Hammer (from "walking" to "running"), but I didn't catch the details. Could you say that again here?

Abdalrhman M Mohamed (Jan 16 2025 at 20:09):

(deleted)

Vlad Tsyrklevich (Jan 16 2025 at 20:25):

I mean, duper and egg are kind of in the same box regarding requiring premise-selection. I know that egg has some idea of having 'egg baskets', but it seems like a band-aid on top of the general need to have a good automated premise selection algo

Abdalrhman M Mohamed (Jan 16 2025 at 21:18):

It seems there might be some confusion about the role of each Lean tactic. Let me clarify and expand on this (according to my understanding):

First question: What do we need for a hammer tactic in Lean?
We need three components:

A component that scans the current Lean environment and selects the relevant set of available lemmas.
A component that translates the current Lean goal into an ATP's logic (HOL/FOL).
A component that uses the relevant lemmas to construct a proof for the Lean goal. This can be further divided into two sub-components:
- An oracle that generates additional hints to help the next step (e.g., quantifier instantiations, arithmetic hints, etc.).
- A component that tries to reconstruct the oracle's reasoning steps as a Lean proof using the relevent lemmas and hints.

Second question: What do we currently have in Lean?
Currently, we have the following:

For the first component, we have AI-based premise selectors based on what you mentioned earlier.
For the second component, we have lean-auto. Note: while lean-auto has an additional functionality to invoke an oracle (TPTP/SMT solver) on a goal, it only relays the results to the user and doesn't generate the proof needed to prove the goal.
For the third component, we have an abundance of TPTP/SMT solvers that can serve as oracles to provide hints. We also have tactics like aesop and duper to do the proof search and construct proofs for the oracle hints and the goal.

For the third component, we now also have the smt tactic. This tactic is tailored towards SMT solvers that generate detailed proofs of their results (like cvc5 and veriT), and has specialized tools to efficiently construct a Lean proof from a cvc5 proof (which is guaranteed to succeed). It is also more cautious about translating Lean goals into SMT-LIB to ensure that the reconstructed proof can immediately close the goal (without needing to do more search). Note, however, that this also means that the reconstruction step of this tactic is too specialized to be useful for a magnushammer-like tactic.

Third question: What is needed to have a Lean version of Sledgehammer?
Mainly, we need a tool that combines all three components. We also need more tools for the first component (including non-ML-based ones) and the third component. @Josh Clune and @Yicheng Qian have more knowledge about efforts in this direction.

Jason Rute (Jan 16 2025 at 21:23):

Ok, so this is only the reconstruction part? But it reconstructs the proof instead of just sending the found premises to some powerful Lean tactic? Is that right?

Abdalrhman M Mohamed (Jan 16 2025 at 21:38):

Yes. It can either be used to directly solve problems already within the SMT solver's logic or as a backend for a sledgehammer tactic. Right now, it can be used as a backend for auto.

Yicheng Qian (Jan 16 2025 at 22:03):

Speaking of premises selection for Lean 4, currently several students at CMU are working on it. I think Jeremy Avigad is leading the project, but I could be wrong.

Pietro Monticone (Jan 17 2025 at 00:25):

:video_camera: Video recording: https://youtu.be/T9fhMJmJRAw

Last updated: May 02 2025 at 03:31 UTC