Zulip Chat Archive

Stream: general

Topic: Trusting native_decide

Niels Voss (May 11 2025 at 19:54):

@GasStationManager do you have any suggestions as to how native_decide could get to the point where we could rigorously trust it? It seems to me that this would require formally verifying the Lean compiler, a task which might require tens of millions of dollars of labor. It seems like such a task would be harder than the compcert project in Coq/Rocq. But maybe there's a cheaper way that I haven't thought of?

GasStationManager (May 11 2025 at 21:04):

@Niels Voss I think that is an important question. I believe it is a very worthwhile goal to aim for. I do appreciate that it would be very technically challenging, but perhaps Lean is closer to it than most other general purpose programming languages. As for whether there is a cheaper way, here's my hot take as an non-expert: I think in the near future AIs will become good enough in Lean to take on a lot of the labor; the key question is whether we will be able to verify their work in a trusted way.

GasStationManager (May 11 2025 at 21:17):

Kind-of relevant: my work-in-progress to get LLMs to produce (correct) Lean code and proof sketch of correctness: https://gasstationmanager.github.io/ai/2025/03/28/alphabeta.html

Jason Rute (May 12 2025 at 00:21):

I think there are better approaches to making a trustworthy native_decide than to formally verify Lean's whole compiler (whatever that means). First, a huge trust factor could be increased by making a version without known exploits, where unsoundness would mean a real compiler bug. Second, one could use native_decide (and other less trustworthy tools) as first approximations of truth, and later replace them with proofs using more trustworthy tactics. Third, one could create a trustworthy tactic (like a super_norm_num) which covers a lot of common cases normally handled by native_decide. It wouldn't have to cover everything at first, and could grow to cover more and more common cases. Fourth, instead of formally verifying a compiler, one could formally verify in Lean4Lean speed improvements to the existing kernel (stealing ideas from other sources). It is scary and dangerous to modify the kernel, but if one does it under the safety of formal verification, then one could make safe and practical improvements. (I don't know if any of these are feasible, of course, but I've seen @Mario Carneiro mention some of them in places.)

Mario Carneiro (May 12 2025 at 00:24):

Currently, lean4lean does not support native_decide, and I have idly considered what it would take to make it work; it amounts to formalizing the lean IR and enough of the compiler pipeline to prove that the IR that is run bears some resemblance to the lean kernel term. This is especially problematic because the kernel term and the IR are both separately lowered from the pre-definition, which is not stored at all, so there is actually some missing information if one wanted to verify .olean files directly (as I do)

Niels Voss (May 12 2025 at 00:26):

By "speed improvements to the kernel" do you mean extensions to the kernel which let it typecheck a certain subset of proofs using custom algorithms? (Kind of like how we can typecheck stuff like 3+4=7 using GMP instead of reducing to Peano axioms)

Mario Carneiro (May 12 2025 at 00:26):

there are lots of ways that the kernel algorithm is just not good. It's not really optimized at all

Mario Carneiro (May 12 2025 at 00:29):

A limited scope super_norm_num that focuses on just the pure computation parts and not implemented_by / extern shenanigans would be very useful for people who want to use native_decide as a decision procedure for computationally heavy proofs

Jason Rute (May 12 2025 at 00:30):

Niels Voss said:

By "speed improvements to the kernel" do you mean extensions to the kernel which let it typecheck a certain subset of proofs using custom algorithms? (Kind of like how we can typecheck stuff like 3+4=7 using GMP instead of reducing to Peano axioms)

Do we really use GMP directly for working with natural numbers in the kernel?

Mario Carneiro (May 12 2025 at 00:31):

yes

Mario Carneiro (May 12 2025 at 00:32):

not just that, the list of primitives has been steadily expanding and not one but two soundness bugs have been introduced because of this

Jason Rute (May 12 2025 at 00:33):

So if I duplicated the Nat with a custom MyNat (same definition), it would be noticeably slower (in the kernel)?

Niels Voss (May 12 2025 at 00:34):

Jason Rute said:

I think there are better approaches to making a trustworthy native_decide than to formally verify Lean's whole compiler (whatever that means). First, a huge trust factor could be increased by making a version without known exploits, where unsoundness would mean a real compiler bug. Second, one could use native_decide (and other less trustworthy tools) as first approximations of truth, and later replace them with proofs using more trustworthy tactics. Third, one could create a trustworthy tactic (like a super_norm_num) which covers a lot of common cases normally handled by native_decide. It wouldn't have to cover everything at first, and could grow to cover more and more common cases. Fourth, instead of formally verifying a compiler, one could formally verify in Lean4Lean speed improvements to the existing kernel (stealing ideas from other sources). It is scary and dangerous to modify the kernel, but if one does it under the safety of formal verification, then one could make safe and practical improvements. (I don't know if any of these are feasible, of course, but I've seen Mario Carneiro mention some of them in places.)

It sounds to me like your second through fourth points here are more about reducing the need for native_decide than actually making native decide more trustworthy. Which is still worth it, but it's not really the same thing. Unless you're were thinking that your super_norm_num is some intermediate trusted state

Mario Carneiro (May 12 2025 at 00:34):

here is the current list of kernel primitives

Mario Carneiro (May 12 2025 at 00:35):

Jason Rute said:

So if I duplicated the Nat with a custom MyNat (same definition), it would be noticeably slower (in the kernel)?

Yes, it would be extremely slow because it would work in unary

Niels Voss (May 12 2025 at 00:35):

Plus, GasStationManager said he cared about verifying programs, not just proofs with native_decide

Mario Carneiro (May 12 2025 at 00:35):

you don't need native_decide to write programs

Aaron Liu (May 12 2025 at 00:35):

Mario Carneiro said:

here is the current list of kernel primitives

That's so many more than I thought there would be

Jason Rute (May 12 2025 at 00:36):

@Niels Voss It is the end goal that is important, not the road to getting there, right? I think native_decide is more of a crutch than a desired tool. Native decide isn't for verifying code any more than other tactics, no?

Niels Voss (May 12 2025 at 00:37):

Yes, I definitely agree with you. If we can remove the need for native_decide that will be just as good as getting to the point where we can trust it. I'm just saying that this wouldn't be sufficient to do what GasStationManager wants, which is to trust executable programs.

Niels Voss (May 12 2025 at 00:38):

GasStationManager said:

What if (like me) someone wants to have provable guarantees on the behavior of executable code, and therefore need to be able to trust the compiler in addition to the kernel?

This was the quote I was thinking of

Mario Carneiro (May 12 2025 at 00:38):

you need something like cakeml, compcert or MM0 for foundational trust of executables. That's not in lean's future

Mario Carneiro (May 12 2025 at 00:39):

but you can still verify things modulo the compiler. The problem is that the current compiler is unsound by design and we need some reasonable bounds on that before we can make progress on provable anything

Jason Rute (May 12 2025 at 00:40):

Lean 5? :slight_smile:

Mario Carneiro (May 12 2025 at 00:41):

it could be as simple as just banning or whitelisting implemented_by and extern and tracking the axioms used in @[csimp] proofs

Mario Carneiro (May 12 2025 at 00:41):

this largely can't be done post-hoc, because the .olean file just has the compiled result and does not describe the path it took to get there

Niels Voss (May 12 2025 at 00:42):

How confident are you that the current set of implemented_by and extern in Lean core is consistent?

Mario Carneiro (May 12 2025 at 00:42):

it's definitely not

Mario Carneiro (May 12 2025 at 00:43):

people fool with IO functions all the time

Jason Rute (May 12 2025 at 00:43):

#general > Using `native_decide` to prove False? See the last examples.

Mario Carneiro (May 12 2025 at 00:44):

but that's again by design. The list of axioms in lean is also inconsistent by design because sorryAx is in there

Niels Voss (May 12 2025 at 00:44):

Well if we make IO.RealWorld opaque, that would at least prevent some of the obvious exploits, right?

Mario Carneiro (May 12 2025 at 00:44):

so the answer is just to have a way to track what subset of axiomatically strong things you are touching

Mario Carneiro (May 12 2025 at 00:45):

i.e. #print axioms for native_decide

Niels Voss (May 12 2025 at 00:45):

That sounds like it might be worth having

Mario Carneiro (May 12 2025 at 00:46):

Niels Voss said:

Well if we make IO.RealWorld opaque, that would at least prevent some of the obvious exploits, right?

No, IO has to not be a function type. Even if IO.RealWorld is an opaque type you can still use the function type to "rewind time" in ways that don't accord with the real implementation and observe invalid behavior that way

Niels Voss (May 12 2025 at 00:49):

Yeah, I remember that being discussed here https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Semantics.20of.20IO.20don't.20match.20provable.20properties.3F

Niels Voss (May 12 2025 at 00:51):

But I'm not sure how that lets you prove False (I could see how it might let you prove BaseIO False)

Mario Carneiro (May 12 2025 at 00:51):

that's for a slightly different thing, about ST not having a good axioms. Here the issue is that you can cache old IO.RealWorld values and reuse them, and the lean model will think that this means you rewound the state of the world but the implementation ignores the token so it will not be rewound

Niels Voss (May 12 2025 at 00:53):

I think I understand now

Jason Rute (May 12 2025 at 00:53):

Maybe we can cut this conversation at @Niels Voss first message and below and move it to its own thread. I think others (with no interest in ML) would find this interesting?

Jason Rute (May 12 2025 at 00:54):

(I don't have such permissions. Do you @Mario Carneiro?)

Notification Bot (May 12 2025 at 00:55):

42 messages were moved here from #Machine Learning for Theorem Proving > A more robust RL pipeline? by Mario Carneiro.

GasStationManager (May 12 2025 at 17:23):

@Mario Carneiro In the spirit of this thread, I was curious: while the current compiler is unsound, could a subset of the language be verified? E.g. pure functions without IO?

Mario Carneiro (May 12 2025 at 17:24):

Depends on what you mean by verified, but in the most basic sense, yes, or more precisely there are no known obvious counterexamples to the claim which is quite different from having a strategy to actually accomplish the task

Mario Carneiro (May 12 2025 at 17:25):

well, no known counterexamples other than the ones that have already been mentioned :upside_down:

Last updated: Dec 20 2025 at 21:32 UTC