Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Aristotle writing rust?

TongKe Xue (Dec 04 2025 at 01:26):

Congrats!
How long until we can point Aristotle at https://github.com/leanprover/lean4/tree/master/src/runtime and have it rewrite it in Rust ? 2 years? 3 years?

Julian Berman (Dec 04 2025 at 02:08):

Leaving aside anything to do with whether it's a good idea entirely, that doesn't seem to have anything to do with Aristotle's capabilities, does it?

TongKe Xue (Dec 04 2025 at 02:45):

The Aristotle blog post is "Aristotle Learns to Code"; the benchmark they are testing on is about code synthesis (though I think Aristotle was only tested on the "ProofGen" task, which takes input code + spec, and generates proofs).

To me, much like some of the cool demos in image/video is "take this existing image/video, but stylze it / make one modification", I think one of the big use cases / demos of Math ASL / Formally Verified Code is going to be taking libraries from language A and porting it to language B, all done by the LLM/Agent itself.

IMHO one of the big use cases of Math ASL is being able to replace "vibe coding" with something that not only generates code, but also generates lean proof (of satisfying correctness, runtime, memory usage) along with it. Imho, this is the #1 problem with vibe coding -- almost correct code -- which synthesizing code w/ proofs eliminates.

Claude / OpenAI might say "here is a rust repo of the lean4 runtime", but with Math ASL, we can get something of the form "here is a rust repo of the lean4 runtime, and here is a lean4 proof of equiv of the Rust/C++ code, under these Rust/C++ semantics."

TongKe Xue (Dec 04 2025 at 02:56):

I think the MathBenchmarks went
SAT/ASHME -> AIME -> IMO/Putnam -> Research/Erdos problems

I'm not sure if the proofs have increased in line count significantly (most seem to be < 2 pages), but I think it's fairly well accepted the "thinking time" has increased from seconds to minutes to hours/days.

Similarly, the benchmark that Aristotle just solved is https://arxiv.org/pdf/2505.23135 -- which looks like problems on the ~100 LOC level. If we take similar extrapolations, and using lean4 to ensure the output is correct / not BS, it does not seem unrealistic to me that we 10x in 1 year to 1000 LOC code output (with lean4 proofs), and 100x in 2 years to 10k LOC (also with lean4 proofs) -- and at that level, assuming a formalized C++/Rust semantics, it doesn't seem out of the question to ask Aristotle to rewrite 20-30k loc of C++ into Rust. Especially because on a first draft, it could probably do 1 C++ func <-> 1 unsafe Rust func, and prove its equivalence there.

Niels Voss (Dec 04 2025 at 03:47):

"assuming a formalized C++/Rust semantics" is a big assumption, given that the informal C++ standard is somewhere around 2000 pages

Yury G. Kudryashov (Dec 04 2025 at 04:39):

... and no compiler follows the standard to the letter.

Yury G. Kudryashov (Dec 04 2025 at 04:41):

(based on my struggles with different C++ compilers while preparing https://github.com/Newclid/Newclid/tree/main/yuclid for a release)

TongKe Xue (Dec 04 2025 at 06:55):

There's three counter arguments I see to this:

We probably only need a small subset.

I think the counter argument is that most decent Rust/C++ programmers probably

can't write down Rust or C++'s formal grammar
never read C++ / Rust (if one exists)'s formal spec
never wrote a parser for either
definitely never wrote a Rust type checker
never wrote a compiler for either

and despite this, they can write decent C++/Rust; so I suspect there exists some small subset of the C++/Rust spec where if you get this right, it covers a large percent of day to day C++/Rust.

IOI Models

There exists models that can win IOI gold. And those models are doing "English -> C++". Intuitively, I feel for programs of same out length, "English -> C++" should be harder than "C++ -> Rust".

$$$ spent on human labeling

I do not know how big the "small core" in point 1 is -- but I suspect it's not large compared to the revenue of ScaleAI / Mercor / Turing; it feels like a one time (possibly partially automated) cost/investment well within the funding poured into AI.

I just finished all the InsertionSort proofs at https://softwarefoundations.cis.upenn.edu/vfa-current/Sort.html at https://github.com/tkxue/func_algo/blob/main/FuncAlgo/C01_Sort.lean

The proofs were a bit tedious, and probably do-able in half the length. However, it's not clear to me at all why this would not scale up. In fact, (without offending programmers), I feel like scaling up code translation is "easier" than solving Erdos problems in that: (1) not all programmers can solve Erdos problems, (2) most who have solved Erdos problems can probably pickup programming, and (3) solving Erdos problems feels like Aristotle need to be able to "find spark of genius / insight"; whereas translating code from one language to another "merely" requires Aristotle to do lots of rote engineering task (and much simpler proofs).

Niels Voss (Dec 04 2025 at 06:59):

Can this discussion be moved to another thread? I don't see how this has anything to do with VERINA

Notification Bot (Dec 04 2025 at 08:35):

9 messages were moved here from #Machine Learning for Theorem Proving > Aristotle achieves SOTA 96.8% proof generation on VERINA by Johan Commelin.

Bas Spitters (Dec 04 2025 at 08:54):

Darpa has a project to translate C to Rust:
https://www.darpa.mil/research/programs/translating-all-c-to-rust

Deleted User 968128 (Dec 04 2025 at 18:13):

Julian Berman said:

Leaving aside anything to do with whether it's a good idea entirely, that doesn't seem to have anything to do with Aristotle's capabilities, does it?

Solana smart contracts are written in rust, which has about a $4B daily trading volume. Smart contracts are an ideal use case for verification.

It's an interesting question though, the potential synergies in verification and correct code generation.

Deleted User 968128 (Dec 04 2025 at 18:40):

https://arxiv.org/html/2509.19153v1 (Solidity is not rust, but the rest is relevant. Imho, this is a great tool calling use case. There is this massive capability and value chasm that could be bridged quickly, but LLMs keep pushing AGI and threaten to bridge it singlehandedly.)

Niels Voss (Dec 04 2025 at 20:30):

I don't think anyone needs to be convinced that a system which can automatically formally verify rust code would have immense value. The question was about whether formally verified transpilation from C++ to Rust would be useful, and whether it is feasible.

Deleted User 968128 (Dec 04 2025 at 22:09):

Well, at least we know C++ is more valuable than Rust. :)

Deleted User 968128 (Dec 04 2025 at 23:27):

I suppose I take this improperly on assumption that everyone is aware, when discussing this topic, but hard truth is that a minority of senior engineers are writing code these days, and soon it will be fewer still.

Rather they will drive architecture, but the traditional perspective of the code writing SWE is rapidly coming to an end. It will be more computer scientists and 'engineers' in the traditional sense. I can see them becoming fully credentialed, insured and required to legally sign off on what is delivered.

It's a challenging transitional period for everyone and I truly empathize with the difficult emotions going around.

Tools like Aristotle won't stop at verification, because it isn't what anyone really wants. They want generated correct code.

TongKe Xue (Dec 05 2025 at 02:02):

Tim Shephard said:

Tools like Aristotle won't stop at verification, because it isn't what anyone really wants. They want generated correct code.

I can not imagine any of the "Math SuperIntellitgence" startups (atleast 3 of which claimed Erdos 124/481 in past week) to decide "Hey, let's stop at Math, and not go after the code-gen / Cursor / Claude-4.5 / Codex market."

Andy Jiang (Dec 05 2025 at 04:11):

probably influenced by sources of funding I guess? Though would honestly be nice if they were more focused.

Deleted User 968128 (Dec 05 2025 at 06:16):

With a strong verification system, they will be able to do RLVR in a way no other lab can. There are other important synergies. https://arxiv.org/html/2310.17807v4 https://arxiv.org/abs/2412.06176 https://arxiv.org/abs/2507.15822

However, I don't believe they will put the cart before the horse. I've seen a number of examples now of teams who realized only after trying to solve the second problem that they had to go back and solve the first.

Probably the real question in all of this is whether they will get acquired (or duplicated) by a frontier lab or if this will remain niche. The former would be pretty exciting, though would prefer to see an acquisition.

Bas Spitters (Dec 05 2025 at 12:54):

Tim Shephard said:

Solana smart contracts are written in rust, which has about a $4B daily trading volume.   Smart  contracts are an ideal use case for verification.

We've been working on Rust smart contract verification (in Rocq), and found a number of bugs in the process. This includes a verified compiler to wasm.
https://github.com/AU-COBRA/ConCert/?tab=readme-ov-file#papers

Part of this tooling has already been ported to Lean (Hax, https://peregrine-project.github.io/).

Last updated: Feb 28 2026 at 14:05 UTC