Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Aleph Prover by Logical Intelligence


Andy Jiang (Dec 02 2025 at 04:06):

I saw some news of "Aleph Prover" getting 500/660 on PutnamBench. Are any of the authors here to answer questions? Congrats on the result.

Ayush Agrawal (Dec 02 2025 at 05:35):

They have mentioned that they use non-autoregressive energy-based models. That's it.
https://x.com/logic_int/status/1968044980791841193?s=20

Eric Wieser (Dec 02 2025 at 07:15):

Perhaps @Vladislav Isenbaev / @Vladislav Isenbaev, judging by https://www.upstartsmedia.com/p/math-ai-startups-push-new-models?

Andy Jiang (Dec 02 2025 at 07:29):

Ayush Agrawal said:

They have mentioned that they use non-autoregressive energy-based models. That's it.
https://x.com/logic_int/status/1968044980791841193?s=20

energy-based?? Like deep Boltzmann machines, or what??? People don't call diffusion models energy-based, right? This description honestly raises many more questions for me than it answers. If it's really a completely new architecture style, wouldn't that be MUCH bigger news?
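
A quick gloss on "energy-based", since the term is doing a lot of work here: an energy-based model defines a distribution p(x) ∝ exp(−E(x)) over outputs and generates by iteratively moving samples downhill in energy, e.g. with Langevin dynamics. Here is a toy numpy sketch of that sampling loop on a hand-written quadratic energy — purely illustrative; nothing is publicly known about what Aleph actually does:

```python
import numpy as np

# Toy energy-based model: E(x) = (x - 3)^2 / 2, so p(x) ∝ exp(-E(x)) is a
# unit Gaussian centered at 3. Overdamped Langevin dynamics samples it:
#   x <- x - step * dE/dx + sqrt(2 * step) * noise
def energy_grad(x):
    return x - 3.0  # dE/dx for E(x) = (x - 3)^2 / 2

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)   # 10k parallel chains, started far from the mode
step = 0.1
for _ in range(1_000):
    x = x - step * energy_grad(x) + np.sqrt(2 * step) * rng.normal(size=x.size)

# After burn-in the chains are distributed roughly as N(3, 1):
print(float(x.mean()))   # close to 3
print(float(x.std()))    # close to 1
```

A real EBM would replace `energy_grad` with the gradient of a learned network; the generate-by-relaxation loop is the structural difference from token-by-token autoregressive decoding.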

Gerben Koopman (Dec 02 2025 at 07:51):

Why the non-autoregressive claim? It seems strange, but I'm happy to be educated about my wrong assumptions. Language is autoregressive by its very nature (unless you consider the aliens from Arrival :smile:), which to my mind implies that there will always be an autoregressive future for AI. This doesn't mean that other models aren't possible for other things (think BERT!), but it's definitely a strange claim.

Nikolas Kuhn (Dec 02 2025 at 09:03):

Unless there's some more information I'd assume this is some version of diffusion @Andy Jiang (there's certainly such a thing as an "energy-based view" on diffusion models which you're probably aware of, e.g. as explained here).

I'm not an expert, but the idea that something more like diffusion could be better for proofs has been around for a while and I think makes intuitive sense: In a proof, you have a well-defined start and an end, and you want to fill out the middle, ideally refining/updating the pieces that don't fit well. So you could hope that diffusion based architectures capture this better than autoregressive models, where it has to be imposed somewhat artificially.
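
The "clamp the endpoints, refine the middle in parallel" intuition can be shown with a deliberately tiny sketch — not a real diffusion model, just iterative parallel smoothing with the known start and end held fixed (all values here are made up for illustration):

```python
import numpy as np

# Start from pure noise, then repeatedly update every interior position
# at once while the known start and end stay fixed. The sequence relaxes
# onto the straight line between the endpoints -- a loose analogue of
# denoising a whole proof at once under fixed boundary conditions.
rng = np.random.default_rng(0)
x = rng.normal(size=11)          # 11 "slots" for the proof
x[0], x[-1] = 0.0, 10.0          # the known start (statement) and end (goal)

for _ in range(500):
    x[1:-1] = 0.5 * (x[:-2] + x[2:])  # every middle piece updates together
    # the clamped endpoints are untouched by the slice, so they stay fixed

print(np.allclose(x, np.arange(11.0), atol=1e-6))  # → True: linear interpolation
```

An autoregressive model would commit to slot 1, then slot 2, and so on; here every slot keeps adjusting until the whole thing is mutually consistent, which is the property people hope transfers to proofs.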

Kelly Davis (Dec 02 2025 at 09:33):

Nikolas Kuhn said:

...the idea that something more like diffusion could be better for proofs has been around for a while and I think makes intuitive sense: In a proof, you have a well-defined start and an end, and you want to fill out the middle, ideally refining/updating the pieces that don't fit well. So you could hope that diffusion based architectures capture this better than autoregressive models, where it has to be imposed somewhat artificially.

Using diffusion based models in this sense is an interesting idea.

However, one can contrast this with the thinking mode of modern LLMs with "large" context windows allowing them to attend to text they produced previously, giving them essentially the ability to "fill out the middle". (The Kimina Prover work emphasized this point but mostly in the context of MCTS.) So, it's not clear which approach is "better".

As to which "wins" (diffusion, thinking modes of modern LLMs, or something else) I've no "horse in that race".

Andy Jiang (Dec 02 2025 at 09:50):

Nikolas Kuhn said:

Unless there's some more information I'd assume this is some version of diffusion Andy Jiang (there's certainly such a thing as an "energy-based view" on diffusion models which you're probably aware of, e.g. as explained here).

I'm not an expert, but the idea that something more like diffusion could be better for proofs has been around for a while and I think makes intuitive sense: In a proof, you have a well-defined start and an end, and you want to fill out the middle, ideally refining/updating the pieces that don't fit well. So you could hope that diffusion based architectures capture this better than autoregressive models, where it has to be imposed somewhat artificially.

Ah ok, that makes a lot of sense. I didn't know about the energy interpretations of diffusion (though as a science layman I'm not extremely surprised that most physics-y processes admit energy-based interpretations). Yeah, diffusion models for proof generation are certainly an interesting idea, but at least it's one that also appears in the literature from other people, so it's less crazy.

Anh Nguyễn (Dec 02 2025 at 11:02):

I think diffusion models sound like a very good fit for theorem proving. There are reasons why almost all LLMs are autoregressive rather than diffusion models. However, we can see that in the future there will be more diffusion models for language

Notification Bot (Dec 02 2025 at 13:21):

9 messages were moved here from #Machine Learning for Theorem Proving > PutnamBench by Jason Rute.

Gerben Koopman (Dec 02 2025 at 13:30):

Anh Nguyễn said:

However, we can see that in the future there will be more diffusion models for language

Can you substantiate this claim? I'd be interested to read a paper or blog on diffusion for NLP, even if it is outside the Lean scope.

Edit: So I've found some research and overview papers on general use, but it seems worse at reasoning? Sounds like autoregressive transformers are still very much the way to go for theorem proving, no?

Jason Rute (Dec 02 2025 at 23:17):

https://x.com/logic_int/status/1995949127252017499?s=20

Jason Rute (Dec 02 2025 at 23:19):

^ More claims by this company of automatically solving open conjectures in Lean

Deleted User 968128 (Dec 03 2025 at 01:06):

Hmm, not sure if this Google Scholar profile is just poorly matched with the CEO. https://scholar.google.com/citations?hl=en&user=LyA_iI0AAAAJ&view_op=list_works&sortby=pubdate

https://quantumfoundry.ucsb.edu/people/associates/eve-bodnia

I couldn't find any research. Lots of red flags here, but that's always a problem with companies like this: improbable, but you never know. There is also a very strong incentive right now not to openly publish breakthroughs in AI.

A quantum background would definitely point at EBMs, and LeCun is a big fan.

Jason Rute (Dec 03 2025 at 04:12):

I think whoever runs the Lean X account should encourage them to engage here too.

Jason Rute (Dec 03 2025 at 04:13):

Anyway, here is one of the proofs they just announced. https://github.com/logical-intelligence/proofs

Andy Jiang (Dec 03 2025 at 05:10):

Sorry, can someone more knowledgeable about diffusion models chime in on this? How easy is it to fix some parts of the output of the diffusion? For example, I fix the theorem statement and some intermediate lemmas and use the diffusion model to generate the rest of the proof. And are there advantages in parallelization? (I personally don't see any advantages in parallelization, because you can always just start another subgoal in the middle, break your final goal into subgoal + (subgoal ⇒ goal), and then parallelize the generation of the two parts even with autoregressive LLMs, I assume.) Hope someone who knows more about diffusion models for writing formal proofs can speak to this.

Andy Jiang (Dec 03 2025 at 05:11):

"Fix" means "keep constant" in the above paragraph, not "correct", btw.
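
For the "keep some parts constant" question: in image diffusion this is usually done inpainting-style (RePaint is the standard reference there) — every denoising step is allowed to update all positions, and then the positions you want fixed are overwritten with their known values. A toy sketch of just that clamping mechanic, with a trivial smoother standing in for a learned denoiser (the mask and values are invented for illustration):

```python
import numpy as np

# Inpainting-style conditioning: let a parallel "denoising" update touch
# every position, then overwrite the positions we want held constant --
# here the two ends plus one intermediate "lemma" slot -- with their
# known values on every step. The update rule is just a toy smoother.
target = np.linspace(0.0, 7.0, 8)   # the "true" completed proof (toy values)
known = np.array([0, 3, 7])         # indices held constant: statement + a lemma

rng = np.random.default_rng(1)
x = rng.normal(size=8)              # start from noise
for _ in range(300):
    x[1:-1] = 0.5 * (x[:-2] + x[2:])  # parallel update over all interior positions
    x[known] = target[known]          # clamp: the fixed pieces never drift

print(np.allclose(x, target, atol=1e-6))  # free positions settle consistently
```

The point is that the fixed set can be arbitrary — statement, scattered intermediate lemmas, anything — which is exactly the kind of conditioning that is awkward for a left-to-right autoregressive decoder: it conditions on a prefix for free, but not on scattered "future" tokens without extra tricks.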

Deleted User 968128 (Dec 03 2025 at 06:18):

The timeline here is interesting: https://x.com/evelovesolive

https://x.com/evelovesolive/status/1971230688818102668 "our audit agent https://noa.logicalintelligence.com It is free"
https://x.com/evelovesolive/status/1974259155851698323 "blockchains for emergency response to recent npm attacks."
https://x.com/evelovesolive/status/1975754450112553408 "Can’t wait to do formal verification and zkp on quantum chips"
https://x.com/evelovesolive/status/1996037932864733501 "Aleph prover was initially our internal tool to compare our model with other models, never planned it to be a product"

Feels like a crypto-ish company that probably started doing smart-contract verification with Lean and stumbled into proof verification.

Whether they stumbled into a breakthrough is the question, I suppose. Even autoregressive transformer-based models can be viewed as energy-based from the right perspective. Certainly, they don't seem averse to using other approaches: https://x.com/evelovesolive/status/1995357309736812903

It's a bit of a struggle to get labs to invest in open ML math research. Quant firms do research, but they don't publish. It's important to leverage every opportunity to get folks to help out. We're lucky to have companies like DeepSeek publish so openly.

Eric Wieser (Dec 03 2025 at 17:42):

Nikolas Kuhn said:

Unless there's some more information I'd assume this is some version of diffusion

From LinkedIn, this certainly sounds like diffusion:

Most AI (like LLMs) solve problems one piece at a time—like doing a jigsaw blindfolded. If you place a piece wrong early, the whole picture suffers. LI-1.0 thinks differently: all the pieces move together, like magnets snapping into place until the puzzle is complete.

Deleted User 968128 (Dec 03 2025 at 18:29):

PR - https://www.01net.it/logical-intelligence-achieves-76-percent-on-putnam-benchmark-highlighting-shift-beyond-large-language-models-to-language-free-mathematically-grounded-models/

"While Aleph is an internal tool built on top of an LLM, its performance places it ahead of all publicly evaluated LLMs and the hybrid EBM systems that still depend on LLM scaffolding. The results are a strong signal that native EBM architectures offer a clear path to trustworthy AI."

This is what I expected and increases my trust level. It sounds more incremental, but their benchmarks are strong and they are likely doing solid work. They sound confident it's the native EBM doing the heavy lifting, but the devil is in the details.

Deleted User 968128 (Dec 03 2025 at 18:48):

A bit disappointed they aren't crediting Lean in the PR, because Lean is very likely what provides all of the certainty they talk about; the EBM just provides a more effective (in theory) way to get there. So there is that gap.


Last updated: Dec 20 2025 at 21:32 UTC