Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Owlgebra - automated proof discovery
josojo (Apr 05 2025 at 11:30):
Dear Lean Community!
I want to present a new tool for automated proof discovery!
image.png
It brings together several powerful techniques inspired by other projects:
- It generates natural-language proofs for theorems and uses them to derive helper lemmas. It then tries to prove these lemmas (similar to AlphaProof, though currently in a simpler form) before searching for the final proof.
- It uses Moogle results to enrich LLM queries. (Update: it now uses LeanSearch.)
- It simulates LLM answers with a REPL and provides errors back to the LLM, allowing for iterative refinement.
- It employs few-shot examples in the prompts for better performance.
- It works with any LLM; currently, Gemini 2.5 works best.
- Solution finding techniques are configurable via the interface.
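To make the refinement idea concrete, here is a minimal sketch (in Python, with hypothetical function names, not the actual backend code) of the REPL error-feedback loop:

```python
def refine_proof(llm_ask, check_with_repl, theorem, max_attempts=3):
    """Iteratively ask an LLM for a proof, feeding REPL errors back.

    `llm_ask` and `check_with_repl` are placeholders for the real
    LLM client and Lean REPL wrapper; this is only an illustration.
    """
    prompt = f"Prove the following Lean theorem:\n{theorem}"
    for _ in range(max_attempts):
        candidate = llm_ask(prompt)
        errors = check_with_repl(candidate)  # empty list means the proof checks
        if not errors:
            return candidate
        # Append the errors so the next attempt can fix them.
        prompt += "\nYour attempt failed with:\n" + "\n".join(errors) + "\nPlease fix the proof."
    return None  # no valid proof found within the attempt budget
```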
If a proof has been found by this framework, it is displayed on the webpage:
image.png
The webpage for getting a first impression is here, and yes, you can try it out right away!
For the full experience (without long waiting times), fork the repositories, add your own LLM API keys, and unleash the beast!
backend
frontend
I hope this project offers a glimpse into what the future of proof automation could look like, and I’m excited to see what proofs it can find for you. If you do get great results, post screenshots in the thread! And if things don’t work smoothly, well, we will keep improving it over time.
Running all this infrastructure does come with costs, so if you’d like to support the efforts, please send me a direct message.
For sure, I am also happy to receive any PRs to the project. There is a lot to improve!
Happy proving!
PS: (If you talk about it on X, please tag me https://x.com/o_herminator)
Bolton Bailey (Apr 05 2025 at 12:48):
Thanks for providing a web interface!
If I want to run this a bunch locally and I don't want to pay for an API key, is it possible to use a local LLM? (via Ollama for example?)
Bolton Bailey (Apr 05 2025 at 12:51):
Also: I asked it a question and now it says "finished" but not "proven". Does this mean it didn't find a proof?
josojo (Apr 05 2025 at 12:58):
Here are some of the supported API interfaces:
https://github.com/josojo/smarthammer/tree/main/hammer/api
If you want to talk to a local LLM and you host it behind a local API like localhost/api, it should be very easy to add another adapter. You only need to inherit from https://github.com/josojo/smarthammer/blob/main/hammer/api/base_client.py.
It should be very easy; however, I do not know the exact constraints of your local deployment. If you give me more information about what you are running, I can better help with implementing it.
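For example, a minimal adapter for a local Ollama server could look roughly like this (the class layout and the `send` method name are assumptions; check base_client.py for the actual interface to inherit from):

```python
import json
import urllib.request


class OllamaClient:
    """Hypothetical adapter for a local Ollama server.

    Sketch only: the real project expects a subclass of
    hammer/api/base_client.py, whose exact interface may differ.
    """

    def __init__(self, model="llama3", base_url="http://localhost:11434"):
        self.model = model
        self.base_url = base_url

    def _build_payload(self, prompt):
        # Ollama's /api/generate endpoint takes a JSON body like this.
        return {"model": self.model, "prompt": prompt, "stream": False}

    def send(self, prompt):
        data = json.dumps(self._build_payload(prompt)).encode()
        req = urllib.request.Request(
            f"{self.base_url}/api/generate",
            data=data,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]
```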
josojo (Apr 05 2025 at 13:01):
Bolton Bailey said:
Also: I asked it a question and now it says "finished", but not "proven" does this mean it didn't find a proof?
Yes! You can click on the logs on the webpage and investigate what the issue was.
image.png
If this was your request, one sees that the initial parsing was not successful. Feel free to DM me the theorem that you wanted to prove, and I can debug why the initial parsing of the Lean code failed.
Jason Rute (Apr 05 2025 at 13:37):
That is quite a tech stack you have there! I really appreciate the web interface for anyone to try it, but it is hard to know if the massive tech stack is worth it. I'd love for you to provide a nice, diverse set of examples, if not a whole benchmark (which might be impractically expensive).
Jason Rute (Apr 05 2025 at 13:38):
I tried this standard example of mine, but I think it is having trouble parsing the arguments:
import Mathlib
theorem list.min_le_max {α : Type _} [LinearOrder α] [DecidableLT α] (l : List α) (nonempty : l ≠ []) : l.minimum ≤ l.maximum := by
Jason Rute (Apr 05 2025 at 13:43):
(I edited the example to make it a LinearOrder.)
Jason Rute (Apr 05 2025 at 13:46):
Another example of mine (which I called foo) is running. You can find more community examples of tricky problems here: Machine Learning for Theorem Proving > Simple but hard puzzler for LLMs
Jason Rute (Apr 05 2025 at 13:50):
For my foo example (https://owlgebra.vercel.app/TasksPage?taskId=foo-1743859759-a1ba6972-5), I think it failed to find a proof, right?
josojo (Apr 05 2025 at 14:13):
The huge advantage of the web UI is that one can easily parse and navigate through the long dialogues with the LLM (click on the links below the keyword "Logs"). This helps a lot with debugging. So while the web UI is nice for everyone, it is also very helpful for us developers.
Yes, I am really looking forward to running the benchmark. There are still some things that I want to fix, but there is already a benchmark script in the repo.
josojo (Apr 05 2025 at 14:15):
Jason Rute said:
For my foo example (https://owlgebra.vercel.app/TasksPage?taskId=foo-1743859759-a1ba6972-5), I think it failed to find a proof, right?
Thanks so much for trying all of them. For this particular one, it already found a helpful additional lemma and proved it successfully. But in the final theorem-proving step, there were unexpected imports, and hence it failed. I already have an idea for how to improve the machinery here...
Jason Rute (Apr 05 2025 at 14:25):
Also Moogle.ai is a dangerous tool to rely on. It is really out-of-date.
Kim Morrison (Apr 07 2025 at 00:33):
Integrating into VSCode will give much better adoption relative to a webpage, I would think!
Kim Morrison (Apr 07 2025 at 00:34):
Jason Rute said:
Also Moogle.ai is a dangerous tool to rely on. It is really out-of-date.
You should be using leansearch.net instead.
Jason Rute (Apr 07 2025 at 00:35):
The issue with leansearch.net is that it is down a lot, but that isn’t as large of an issue.
josojo (Apr 07 2025 at 19:35):
Alright, I updated the code to use LeanSearch. Right now it seems stable enough. Let's see.
I also deployed some other fixes, and now it finds solutions for some theorems where it was not successful before. E.g.
image.png
But yeah, currently it mostly succeeds only for simpler theorems. The foo example is still too hard :(
I am so looking forward to really running it more extensively with more retries. Just a few more fixes :)
josojo (Apr 07 2025 at 19:37):
Kim Morrison said:
Integrating into VSCode will give much better adoption relative to a webpage, I would think!
Yeah, maybe you are right. However, I envision many retries of the agent, and this will take some time... and personally, I don't like it if my IDE is blocked for extended periods.
But the nice part is that only the backend logic is "hard" to code. Hence, one should be able to swap it for a nice IDE integration "easily".
Kim Morrison (Apr 08 2025 at 05:00):
Yes, but the basic point is that no one is willing to change their workflow, to leave VSCode even for an instant. :-)
Alex Meiburg (Apr 08 2025 at 13:54):
I use moogle.ai and https://loogle.lean-lang.org all the time! Alt-tabbing back and forth is a fine part of my workflow. :)
Alex Meiburg (Apr 08 2025 at 13:54):
(Yes, I know there's a VS code extension for loogle, but I much prefer it in the browser)
Last updated: May 02 2025 at 03:31 UTC