Zulip Chat Archive

Stream: lean4

Topic: soundness bug: native_decide leakage


Mario Carneiro (Oct 10 2023 at 22:12):

You don't need the ofReduceBool axiom for weirdness from the real world to affect the logical system:

def foo : Bool :=
  match IO.getRandomBytes 1 () with
  | .ok bs _ => bs[0]! >= 128
  | _ => false
theorem T1 : false = Lean.reduceBool foo := rfl
theorem T2 : Lean.reduceBool foo = true := rfl
theorem contradiction : False := nomatch T1.trans T2
#print axioms contradiction
-- 'contradiction' does not depend on any axioms

(this proof only works with 1/4 probability so rerun it a couple times until both rfl proofs work)

Mario Carneiro (Oct 10 2023 at 22:14):

an easy fix is to make reduceBool an axiom instead of an opaque so that it shows up in the #print axioms list

Mario Carneiro (Oct 10 2023 at 22:17):

because it's not really an opaque, it has (in this case literally) random reduction rules

James Gallicchio (Oct 10 2023 at 22:28):

(and also, IO.RealWorld is not actually opaque... :joy: Lean.reduceBool is allowing multiple real worlds here because the types say you can have as many real worlds as you need!)

Scott Morrison (Oct 11 2023 at 00:20):

lean4#2654 adds an axiom that Lean.reduceBool and Lean.reduceNat depend on.

Scott Morrison (Oct 11 2023 at 00:20):

(We can't quite make reduceBool an axiom itself, because the whole point is that it is computable.)

Mario Carneiro (Oct 11 2023 at 00:25):

It should be possible to have a computable axiom, if these axioms are special cased in the compiler

Junyan Xu (Oct 11 2023 at 00:32):

resolves https://manifold.markets/tfae/is-the-lean-kernel-unsound (?)

Eric Wieser (Oct 11 2023 at 00:42):

Does this reproduce with lean4checker?

Scott Morrison (Oct 11 2023 at 01:12):

No.

Scott Morrison (Oct 11 2023 at 01:14):

I've just pushed a new commit to https://github.com/leanprover/lean4checker that includes this example in the test suite.

Scott Morrison (Oct 11 2023 at 01:17):

(This was slightly awkward, as Mario's code is non-deterministic, so I have to build the example in a loop until it compiles, and then check that. It's all fast enough that this is fine.)

Tomas Skrivan (Oct 11 2023 at 01:18):

Scott Morrison said:

lean4#2654 adds an axiom that Lean.reduceBool and Lean.reduceNat depend on.

Outright saying that if you trust the compiler then you can prove false is really odd. In my head, a proof of false is an indication of a bug somewhere that should be fixed. Using axiom trustCompiler : True instead of axiom trustCompiler : False would work the same way, no?

In the case above the problem is that the IO monad is effectively StateM Unit, so you can freely copy or fabricate IO.RealWorld values. There is even a TODO at the definition of EIO which, if addressed, would prevent you from writing IO.getRandomBytes 1 ().
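
For reference, here is a paraphrase of the pre-lean4#2656 core definitions that make the example above type-check (not verbatim source; wrapped in a scratch namespace so it does not clash with core):

namespace Sketch  -- scratch namespace, for illustration only
def RealWorld : Type := Unit            -- was: def IO.RealWorld : Type := Unit
def EIO (ε : Type) : Type → Type := EStateM ε RealWorld
def IO : Type → Type := EIO _root_.IO.Error
end Sketch

Since IO.RealWorld unfolds to Unit, IO.getRandomBytes 1 : IO ByteArray is just a function Unit → EStateM.Result IO.Error Unit ByteArray, which is why passing () and matching on .ok bs _ works.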

Tomas Skrivan (Oct 11 2023 at 01:32):

Now I'm completely confused, why does the rfl proof work? If you change the proof to by unfold foo; rfl then it does not work.

Tomas Skrivan (Oct 11 2023 at 01:40):

I think the answer is this line, the reduction happens only if the expression looks like .app (.const `Lean.reduceBool _) (.const _ _)

Scott Morrison (Oct 11 2023 at 02:11):

lean4#2656 makes IO.RealWorld an opaque type.

In src/Init/System in unsafeBaseIO we use an unsafeCast () to "construct" a RealWorld, and it seems like everyone else already behaves. :-)

This independently prevents the example above.

I haven't yet made EIO itself opaque, as @Leonardo de Moura has requested in a TODO in that file, because it is a bit more work to fully specify its API (so we can hide that it is really EStateM).

Scott Morrison (Oct 11 2023 at 02:41):

@Mac Malone I'm wondering if you can help me with something here.

Including this example in the test suite for lean4checker has the consequence that lake build only succeeds 1/4 of the time!

Options:

  • remove the test file from the scope of lake build: I don't like this, because I would prefer that every time anyone builds the checker they are compelled to run the tests too!
  • is there some custom Lake hackery that can tell lake build that it is meant to keep compiling a file until it works? :-) Obviously this is far from spec, but it's an unusual situation here!

Scott Morrison (Oct 11 2023 at 02:42):

(beatings will continue until morale improves)

Mario Carneiro (Oct 11 2023 at 03:03):

There are a few other ways to write the IO so that you get a more reliable result. For example, if file foo.txt does not exist then you create such a file and return true, otherwise you delete it and return false

Mario Carneiro (Oct 11 2023 at 03:04):

that will ensure that you always alternate between true and false (and as an added bonus it cleans up after itself)
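
For concreteness, a sketch (not the code that ended up in the lean4checker test suite) of this file-toggle idea, written in the same style as the original example; it only type-checks on toolchains where IO.RealWorld is still definitionally Unit, and "flag.txt" is an arbitrary path chosen for illustration:

-- "flag.txt" is a hypothetical path used only for this sketch
def fooToggle : Bool :=
  match System.FilePath.pathExists "flag.txt" () with
  | .ok true _ =>
    -- the flag exists: delete it and return false
    match IO.FS.removeFile "flag.txt" () with
    | _ => false
  | _ =>
    -- the flag is missing: create it and return true
    match IO.FS.writeFile "flag.txt" "" () with
    | _ => true

Starting from a clean directory, successive evaluations of Lean.reduceBool fooToggle alternate between true and false, so the two rfl proofs can be made to pass deterministically (with the statements or an initial touch of the file arranged to match the evaluation order), and the file is cleaned up again afterwards.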

Mario Carneiro (Oct 11 2023 at 03:05):

If you did want to keep the random number generator implementation, I would suggest doing the "keep compiling until it works" in lean as a command elaborator
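
A rough sketch of that idea (the command name retry_until_ok and the attempt limit are made up, and real code would want to be more careful about which errors it discards and how it restores elaboration state):

import Lean
open Lean Elab Command in
-- hypothetical retry command, not part of Lean; assumes no errors were reported
-- before this command, so any error in the log came from the wrapped command
elab "retry_until_ok " cmd:command : command => do
  let maxAttempts := 32
  let initState ← get
  for _ in [0:maxAttempts] do
    try elabCommand cmd catch _ => pure ()
    if (← get).messages.hasErrors then
      set initState  -- roll back the environment and message log, then retry
    else
      return
  throwError "still failing after {maxAttempts} attempts"

One could then wrap each of the two flaky theorems, e.g. retry_until_ok theorem T1 : false = Lean.reduceBool foo := rfl, and the file as a whole would elaborate reliably.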

Mac Malone (Oct 11 2023 at 04:23):

Scott Morrison said:

  • remove the test file from the scope of lake build: I don't like this, because I would prefer that every time anyone builds the checker they are compelled to run the tests too!

Running tests on a build strikes me as very weird. I cannot think of any major library or language which does that. While CIs certainly build & test, the very fact that the process is called "build & test" indicates that these are usually expected to be distinct invocations.

Scott Morrison (Oct 11 2023 at 04:29):

I guess the fact that lake test doesn't exist may be contributing to me wanting to do everything during build. :-)

Scott Morrison (Oct 11 2023 at 04:30):

Actually -- if I have a test.sh or make test that runs my actual testing, what is the idiomatic way to fire this via lake ...?

Mac Malone (Oct 11 2023 at 04:30):

While lake test sadly does not exist yet, lake run test is still a reasonable alternative (and does not introduce significant elaboration overhead anymore now that lakefile.olean exists).

Scott Morrison (Oct 11 2023 at 04:31):

And I can only define scripts in the lakefile.lean itself?

Scott Morrison (Oct 11 2023 at 04:32):

(Perhaps lake run --help could explain or link to an explanation of where scripts go / what they look like.)

Mac Malone (Oct 11 2023 at 04:33):

Yeah they go in the lakefile. There is a section in the README on scripts.
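
For illustration, a minimal version of that (the package name, script name, and test.sh path are all hypothetical; see the Lake README for the exact script syntax):

-- lakefile.lean (package and script names are hypothetical)
import Lake
open Lake DSL

package checker

/-- Run the shell test driver; invoke with `lake run test`. -/
script test (_args) do
  let out ← IO.Process.output { cmd := "./test.sh" }
  IO.print out.stdout
  IO.eprint out.stderr
  return out.exitCode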

Mac Malone (Oct 11 2023 at 04:34):

If you want to do something more complicated, there is the mathlib approach of a utility exe.

Mario Carneiro (Oct 11 2023 at 05:40):

/--
Depends on the correctness of the Lean compiler, interpreter, and all `[implemented_by ...]` and `[extern ...]` annotations.
-/
axiom trustCompiler : False

@Scott Morrison Umm... It appears the robots have discovered that this axiom is very effective.

Mario Carneiro (Oct 11 2023 at 05:41):

I don't really like the idea of a false axiom hanging around... can't it be a true axiom instead?

Scott Morrison (Oct 11 2023 at 05:42):

Oops, yes!

Scott Morrison (Oct 11 2023 at 05:46):

Okay, arriving soon.

Eric Rodriguez (Oct 11 2023 at 07:32):

https://github.com/leanprover-community/mathlib4/actions/runs/6478956267/job/17591582160

There still seem to be some issues

Alex J. Best (Oct 11 2023 at 08:16):

Those PRs should maybe merge master? Scott pushed since then

Robin Carlier (Oct 11 2023 at 09:06):

I’m getting a similar issue with the lean4checker update: https://github.com/leanprover-community/mathlib4/actions/runs/6476538847/job/17585410139

Am I correct that the way to go is to wait and rerun CI when things are settled?

Scott Morrison (Oct 11 2023 at 09:23):

Just merge master or rebase onto master.

Ioannis Konstantoulas (Oct 11 2023 at 11:20):

I hope I am under a serious misunderstanding: lean4checker rejected that code before @Scott Morrison added a test for it, right? We are not going to be special-casing every soundness leak into lean4checker?

Scott Morrison (Oct 11 2023 at 11:21):

Yes, you are seriously misunderstanding.

Scott Morrison (Oct 11 2023 at 11:22):

lean4checker rejected this example. All I did today was add a test case that verified that lean4checker rejected this example!

Ioannis Konstantoulas (Oct 11 2023 at 11:22):

Yay! Thanks for the assurance, Scott!

Scott Morrison (Oct 11 2023 at 11:23):

(This was slightly tricky because it was non-deterministic. I'll replace it with something deterministic tomorrow per Mario's suggestion.)

Mario Carneiro (Oct 11 2023 at 18:56):

Scott Morrison said:

lean4checker rejected this example. All I did today was add a test case that verified that lean4checker rejected this example!

This is a surprise to me. I would expect lean4checker to accept the example (with 1/4 probability), or the deterministic revision of it with 100% probability. The only thing the follow up work has done is to make the example, while still proving False, at least depend on additional axioms. Making IO.RealWorld opaque will permanently break this example, but as Mac mentioned we remain susceptible to bad implemented_by implementations.

Does lean4checker have a mechanism to disallow axioms outside a whitelist / does it #print axioms at the end? Otherwise I don't see how it could prevent this from still being an issue.

Timo Carlin-Burns (Oct 11 2023 at 19:38):

I think lean4checker rejects the example due to buggy support for reduceBool, which when fixed would cause it to accept the example. The error that lean4checker gives to reject

def foo : Bool :=
  match IO.getRandomBytes 1 () with
  | .ok bs _ => bs[0]! >= 128
  | _ => false
theorem T1 : false = Lean.reduceBool foo := rfl
theorem T2 : Lean.reduceBool foo = true := rfl
theorem contradiction : False := nomatch T1.trans T2

is

uncaught exception: (kernel) unknown declaration 'foo'

It gives the same error in a case that it should arguably accept:

def foo := true
theorem T1 : Lean.reduceBool foo = true := rfl

I'm guessing that when lean4checker asks the kernel to validate T1, it doesn't provide it with a compiled version of foo to use with reduceBool

Mario Carneiro (Oct 11 2023 at 20:16):

Of course -- the way lean4checker builds up the environment leads to everything being effectively noncomputable

Mario Carneiro (Oct 11 2023 at 20:18):

Incidentally, I noticed that even regular lean gives a similar error when you use a constructor directly:

theorem T1 : Lean.reduceBool true = true := rfl
-- unknown declaration 'Bool.true'

Joachim Breitner (Oct 21 2023 at 07:53):

I think lean4checker rejects the example due to buggy support for reduceBool which when fixed would cause it to accept the example.

So does this mean <https://manifold.markets/tfae/is-the-lean-kernel-unsound> ought to resolve to “yes”?

Scott Morrison (Oct 21 2023 at 07:54):

I think the word "buggy" was used in an interesting way in that sentence.

Scott Morrison (Oct 21 2023 at 07:55):

lean4checker rejects that example, which is what we want it to do.

Scott Morrison (Oct 21 2023 at 07:55):

(And so no, I don't think this is relevant to that manifold question.)

Mario Carneiro (Oct 21 2023 at 07:58):

The manifold market question is a bit ambiguous about whether it is about the "lean kernel" or lean4checker. The lean kernel had an issue, but lean4checker was not broken, because it doesn't support reduceBool at all.

Mario Carneiro (Oct 21 2023 at 08:00):

(I would not describe lean4checker's behavior as "correct" here insofar as reduceBool is considered a part of the lean system and it is not supposed to be always rejected, but currently it is very difficult to support reduceBool outside the context of an elaborator session.)

Mario Carneiro (Oct 21 2023 at 08:02):

Scott Morrison said:

lean4checker rejects that example, which is what we want it to do.

I don't really agree with this. It should accept the example, resulting in a proof of false depending on the reduceBool or trustCompiler axiom

Mario Carneiro (Oct 21 2023 at 08:07):

The only way you can argue that this example should be rejected (setting aside the other part of the exploit regarding running IO in safe code) is if you think reduceBool should not exist in the kernel

Mario Carneiro (Oct 21 2023 at 08:12):

Here's a variant on the example which uses only features working as designed:

unsafe def fooUnsafe : Bool :=
  unsafeBaseIO <| EIO.catchExceptions
    (return (← IO.getRandomBytes 1)[0]! ≥ 128)
    fun _ => pure false
@[implemented_by fooUnsafe]
opaque fooBody : Bool
def foo : Bool := fooBody

theorem T1 : false = Lean.reduceBool foo := rfl
theorem T2 : Lean.reduceBool foo = true := rfl
theorem contradiction : False := nomatch T1.trans T2

Mario Carneiro (Oct 21 2023 at 08:16):

There is however a clear path to saying that the proof of false in this code is "my fault": I used reduceBool, and this means that every implemented_by is effectively an axiom, and I have made an unsound choice of implemented_by in the example

Joachim Breitner (Oct 21 2023 at 08:59):

Thanks! Here is my attempt at a condensed summary:

  • Use of reduceBool increases the trusted base considerably, extending it to every single implemented_by in any dependency, and is expected (though not intended) to likely be inconsistent.
  • The lean kernel should report proofs using reduceBool as depending on this axiom, but did not (does not?) do that; this lack of axiom tracking could be considered a soundness bug.
  • Such proofs (correct or not) currently cannot be checked with lean4checker, because it simply doesn’t support reduceBool. In that sense lean4checker is sound (and the manifold question should still resolve to No). Whether it would correctly report the axiom as used if it had support for reduceBool is a hypothetical.

Mario Carneiro (Oct 21 2023 at 09:40):

and is expected (though not intended) to likely be inconsistent.

It is not expected to be inconsistent, although inconsistencies here are considerably more frequent than actual soundness bugs

Mario Carneiro (Oct 21 2023 at 09:41):

Importantly, if there is any inconsistency accessible to native_decide, then we lose another important and desirable property of lean: it is a memory-safe language, you cannot cause UB from safe code

Mario Carneiro (Oct 21 2023 at 09:44):

The lean kernel should report proofs using reduceBool as depending on this axiom, but did not (does not?) do that; this lack of axiom tracking could be considered a soundness bug.

This was the original soundness bug report, and it was fixed by adding an artificial trustCompiler axiom which appears as a dependency of reduceBool so that any use of it will put trustCompiler in the #print axioms list
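
For illustration, a minimal check of that fix (on v4.2.0-rc2 or later; bar and bar_eq are hypothetical names):

-- bar / bar_eq are names made up for this example
def bar : Bool := true
theorem bar_eq : Lean.reduceBool bar = true := rfl
#print axioms bar_eq
-- should now list the trustCompiler axiom, whereas before lean4#2654 the list was empty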

Mario Carneiro (Oct 21 2023 at 09:45):

An alternative solution would be to add special support for reduceBool in #print axioms; I think we will want something like this eventually anyway so that the presence of reduceBool indicates that we should also display all implemented_by axioms

Joachim Breitner (Oct 21 2023 at 09:51):

I meant “expected” in the descriptive sense (we are not surprised if someone finds bugs), not in the prescriptive sense (we want it to be that way). Ambiguous language, I guess (“I do expect my dog to behave well, but it’s a rascal, so I expect regular trouble.”). Is there a less ambiguous verb?

Mario Carneiro (Oct 21 2023 at 09:59):

It is a bug, but not a soundness bug. Maybe we could call it a safety bug?

Mario Carneiro (Oct 21 2023 at 10:00):

Of course in rust this is what they would call a soundness bug, but they aren't trying to maintain a logical system and a memory-safe programming language at the same time

Scott Morrison (Oct 21 2023 at 22:10):

reduceBool has been tracked with an axiom since lean4#2654, which landed in v4.2.0-rc2.

Adomas Baliuka (Oct 21 2023 at 23:11):

Mario Carneiro said:

Of course in rust this is what they would call a soundness bug, but they aren't trying to maintain a logical system and a memory-safe programming language at the same time

Following this thread, I'm actually confused now what exactly would constitute a soundness bug in Lean (and whether there is some consensus on that question). Is there something to read about that? Perhaps a definition might look like "prove False using any language features except for [exhaustive list] and pass both lean4checker and [some hypothetical checker that bans you from using custom axioms and maybe other things]"?

I would have thought it's important to be precise about this, but maybe I'm wrong... In other threads people have pointed out that (perhaps I misunderstood) at the very least some notions of soundness in Lean are not relevant/interesting to Mathlib or doing math in general.

Mario Carneiro (Oct 22 2023 at 00:07):

@Adomas Baliuka A proof of false using the lean kernel and without axioms (except the 3 standard ones) would qualify as a soundness bug in either the kernel or the theory

Mario Carneiro (Oct 22 2023 at 00:10):

lean4checker is capable of performing this check, except that it doesn't tell you what axioms were used, so if you use a bad axiom you can still get lean4checker to validate your proof - simply watching the error code and not looking at the statements of theorems is not sufficient

Mac Malone (Oct 22 2023 at 00:56):

Mario Carneiro said:

it is a memory-safe language, you cannot cause UB from safe code

While Lean strives to be memory-safe (just like pretty much any language) and purely functional code is memory-safe by default (as it is in most functional languages), I think the correctness of this statement heavily depends on what you wish to define as "safe" code. If you mean bad memory accesses are only possible in code the Lean kernel or elaborator will mark unsafe, that seems clearly false (as memory-unsafe code can be wrapped in opaque). If you mean without any extern, implemented_by, or metaprogramming shenanigans, the situation is better, but as a lot of code makes use of these things, that doesn't seem like an especially strong guarantee.

Mario Carneiro (Oct 22 2023 at 01:09):

As in rust, one needs to sometimes grow the set of synonyms of unsafe as that particular syntactic marker is not necessarily a part of the unsafety-inducing feature, but I think it should be possible to have some list of unsafe features, such that as long as you don't use those then you haven't broken safety. implemented_by is only unsafe in conjunction with reduceBool, but either way it's still signposted. extern is definitely unsafe. "Metaprogramming shenanigans" are poorly scoped but let's say yes it is unsafe since you can implement your own version of unsafe that way. opaque is not unsafe AFAIK, you have to combine it with an implemented_by or extern for something interesting to happen.

Also as in rust, the point is not actually to say that one should write code not using any of these features, and indeed some very critical core features have to be implemented using these tools so we can't actually do without them. However, they limit the scope of auditing from everything in the library to just a few things that do sketchy stuff, and in practice this has been shown to be significantly useful in improving overall quality because you can bring those core parts to a higher standard of correctness and documentation to compensate for the necessary bending of the rules of the type system.

Mac Malone (Oct 22 2023 at 01:28):

Mario Carneiro said:

implemented_by is only unsafe in conjunction with reduceBool

Are we still talking about memory-safety or have we reverted to type-safety? I am not sure how reduceBool affects memory-safety.

Mario Carneiro said:

"Metaprogramming shenanigans" are poorly scoped but let's say yes it is unsafe since you can implement your own version of unsafe that way.

The problem with most metaprogramming shenanigans is that they are not detectable from within Lean; they require some sort of external checker to reproduce results. For metaprogramming-produced memory-unsafe code, this would require an external compiler to recompile things in a restricted manner.

Mario Carneiro said:

However, they limit the scope of auditing from everything in the library to just a few things that do sketchy stuff, and in practice this has been shown to be significantly useful in improving overall quality because you can bring those core parts to a higher standard of correctness and documentation to compensate for the necessary bending of the rules of the type system.

I agree. To me, though, there is a major difference between a memory-safe language where memory-unsafe stuff is necessarily clearly marked and a "memory-safe by default" language (e.g., most functional programming languages including Haskell and Lean) where things can be made memory-unsafe without clear markings and only discoverable through post-facto external analysis.

Mario Carneiro (Oct 22 2023 at 01:31):

Mac Malone said:

Mario Carneiro said:

implemented_by is only unsafe in conjunction with reduceBool

Are we still talking about memory-safety or have we reverted to type-safety? I am not sure how reduceBool affects memory-safety.

I'm talking about memory-safety. If you can prove a thing is true and evaluate it to false, then you can get memory unsafety. (Actually, you're right, that doesn't need reduceBool, implemented_by is sufficient.) Example:

def six := 6
@[implemented_by six] def zero := 0
#eval #[1, 2, 3].get ⟨zero, by decide⟩
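-- at runtime zero is compiled as six, so this reads index 6 of a 3-element array (out of bounds)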

I think this is just limited to @[implemented_by] def though. With @[implemented_by] opaque you can't cause unsafety unless the implementation doesn't act like a pure function (e.g. it is nondeterministic or returns provably different values on provably equal inputs), and I don't think the regular type system will let you do this without another signpost.

Mario Carneiro (Oct 22 2023 at 01:32):

Put another way, I think lean should be able to be at least as memory safe as rust (it has most of the same escape hatches with the same attacker model), and this seems to be a useful thing regardless.

Mario Carneiro (Oct 22 2023 at 01:33):

I am definitely not talking about an iron-clad safe type system like ML, I don't think lean will ever be that, it's just not useful enough in practice

Mario Carneiro (Oct 22 2023 at 01:34):

things can be made memory-unsafe without clear markings

what evidence do you have for this?

Mario Carneiro (Oct 22 2023 at 01:35):

Every method of breaking memory safety I know comes with some kind of calling card

Mario Carneiro (Oct 22 2023 at 01:37):

But remember that the goal is not defense against malicious agents, it is raising the bar for accidental memory errors across teams of fallible humans (like the one we're dealing with now in rc2/3!)

Mario Carneiro (Oct 22 2023 at 01:39):

The most important implication of that is that metaprogramming shenanigans are much less scary than they would be in a malicious user setting; a culture of marking unsafe things as unsafe is what prevents bad tactics from just ignoring restrictions without a signpost

Mac Malone (Oct 22 2023 at 03:26):

Mario Carneiro said:

Every method of breaking memory safety I know comes with some kind of calling card

The way I was thinking of is that one can just directly override a definition's IR in the declMapExt to be something memory-unsafe (i.e., a kind of implemented_by without the attribute).

Mac Malone (Oct 22 2023 at 03:30):

(Also, I suspect this will become even easier and more common when the new in-Lean compiler arrives. Performance-sensitive libraries will likely desire to swap out definitions with faster LLVM or assembly-optimized code.)

Mac Malone (Oct 22 2023 at 03:34):

Mario Carneiro said:

As in rust, one needs to sometimes grow the set of synonyms of unsafe as that particular syntactic marker is not necessarily a part of the unsafety-inducing feature

I think a separate memory-unsafe keyword would be very nice! It could then be eventually possible to prove memory-safety at the Lean-level at a particular use site. Using unsafe makes such proofs impossible (as far as I am aware).

Mario Carneiro (Oct 22 2023 at 03:48):

Mac Malone said:

Mario Carneiro said:

Every method of breaking memory safety I know comes with some kind of calling card

The way I was thinking of is that one can just directly override a definition's IR in the the declMapExt to be something memory-unsafe (i.e., a kind of implemented_by without the attribute).

The calling card here is nasty metaprogramming code you have to write yourself. Again, this sounds like a malicious user, the right defense here is to say that this is not good code and discourage it from promulgating

Mario Carneiro (Oct 22 2023 at 03:49):

You can also write procedural macros in rust that hide unsafe. Just Say No

Mario Carneiro (Oct 22 2023 at 03:52):

Mac Malone said:

Mario Carneiro said:

As in rust, one needs to sometimes grow the set of synonyms of unsafe as that particular syntactic marker is not necessarily a part of the unsafety-inducing feature

I think a separate memory-unsafe keyword would be very nice! It could then be eventually possible to prove memory-safety at the Lean-level at a particular use site. Using unsafe makes such proofs impossible (as far as I am aware).

Proving memory-safety at the lean level would be extremely difficult, as it would require a deep embedding of the lean runtime. It's a nice project idea but it is a long-term thing. (In that case it wouldn't matter whether you use unsafe or not, you can express what safety means explicitly and see what all functions are doing, unsafe or otherwise.)

Mac Malone (Oct 22 2023 at 04:34):

Mario Carneiro said:

The calling card here is nasty metaprogramming code you have to write yourself. Again, this sounds like a malicious user, the right defense here is to say that this is not good code and discourage it from promulgating

I think you are undervaluing the positive benefits of this kind of metaprogramming shenanigans. I imagine there will be many DSLs that inject code directly and may not themselves be memory-safe (and thus promulgate downwards). For instance, Alloy already allows you to write memory-unsafe code, and would probably be doing a trick like this for C code generation if it was currently possible. FFI code may also enable direct access to memory primitives to enable more bare metal coding for performance-intensive actions. For example, the CompactedRegion object used in the Lean core.

The point here is that I do not think it will generally be reasonable to expect that Lean libraries will be contractually memory-safe like they usually are in Rust, due to the fact that it simply isn't strictly enforced (and likely cannot be) by the language. It is also not like this is without precedent within functional programming. It is also relatively easy to write memory-unsafe code in Haskell, for instance. I do still think that libraries should generally strive to be memory-safe and memory-unsafety is usually a code smell. But I think that calling Lean a memory-safe language may give users expectations that it is like Rust and lead to confusion and frustration when they find out memory unsafety is quite possible and already exposed publicly (without unsafe) in just core.

Mario Carneiro (Oct 22 2023 at 04:44):

I think you are undervaluing the positive benefits of this kind of metaprogramming shenanigans.

I am not, metaprogramming shenanigans are of course useful and I do them all the time. But they are tricky and require careful auditing. They can be as bad as any unsafe code, although they usually aren't.

For instance, Alloy already allows you to write memory-unsafe code, and would probably be doing a trick like this for C code generation if it was currently possible.

As far as the "culture" is concerned, good metaprograms should be clear about whether they are always safe, always unsafe or whether they contain an internal "calling card" for opting in to unsafety. For alloy I think one would just say that every alloy block is unsafe, because C has no meaningful additional unsafety signpost.

Mario Carneiro (Oct 22 2023 at 04:48):

The point here is that I do not think it will generally be reasonable to expect that Lean libraries will be contractually memory-safe like they usually are in Rust due to the fact that it simply isn't strictly enforced (and likely cannot be) by the language.

I think you overestimate the difference between rust and lean in this regard. Rust "strictly enforces memory safety" in the same way that lean "strictly enforces memory safety": there is a large and useful subset of the language with the property that if you stay in that subset then memory safety is promised. You are not prevented from exiting that subset, but it is considered good design to ensure that whenever you leave the garden you put a "here be dragons" sign

Mario Carneiro (Oct 22 2023 at 04:51):

See the plutonium crate for a great example of a library that deliberately ignores this design practice for comedic effect

Mac Malone (Oct 22 2023 at 05:07):

Mario Carneiro said:

You are not prevented from exiting that subset, but it is considered good design to ensure that whenever you leave the garden you put a "here be dragons" sign

My understanding of Rust is that the signpost is not optional: memory-unsafe code must be marked unsafe. Even the plutonium crate you mentioned just hides the unsafe marking from the user; the code is still marked as such in Rust's view of it. Lean can have memory-unsafe code with no internal marking, which I consider to be a very significant difference. I view Rust's rules on memory-safety to be much closer to Lean's rules on type-safety than its rules on memory-safety. You also noted that this comparison works somewhat in reverse -- Rust would treat unmarked memory-unsafety as unsoundness. However, I can understand how one could view this difference to be practically less significant than I do.

Mario Carneiro (Oct 22 2023 at 05:18):

Lean can have memory-unsafe code with no internal marking, which I consider to be a very significant difference.

Lean has an internal marking for unsafe code: .unsafe is a DeclarationSafety setting, and if you don't use it the kernel rejects the definition

Mario Carneiro (Oct 22 2023 at 05:20):

Macros/elaborators/tactics are not required to surface this unsafe as a piece of literal input text (although they are encouraged to), but that's analogous to what the #[safe] macro is doing in the plutonium crate

Mario Carneiro (Oct 22 2023 at 05:24):

It is true though that Rust has a well defined "AST" layer which all macros desugar to, such that thereafter everything is in the hands of the compiler and we can make clearer statements about what is and is not allowed (and in particular, all unsafe ops have to be wrapped in unsafe by this point). In lean this layer does not really exist, as user code continues to interact at the AST layer and the typechecker layer. The only point where you can sensibly say the user has to take their hands off now and let the core do its thing is the Expr layer and the kernel

Mac Malone (Oct 22 2023 at 05:24):

@Mario Carneiro I'm confused. Doesn't that just further demonstrate that Rust memory-safety and Lean type-safety have more in common than Lean memory-safety and Rust memory-safety?

Mac Malone (Oct 22 2023 at 05:25):

As DeclarationSafety is about type-safety, not memory-safety, and the kernel-checked Expr layer works like Rust's compiler-checked AST layer.

Mario Carneiro (Oct 22 2023 at 05:26):

Rust memory-safety of course has parallels to both Lean memory-safety and lean type-safety. But Rust memory-safety is not really a type system property, it is partially upheld by type system rules but unsafe is an AST layer effect not a type checker effect

Mario Carneiro (Oct 22 2023 at 05:31):

As far as the rust soundness theorem goes, every unsafe block comes with an extra proof obligation (you can imagine := sorry being stuffed somewhere in the syntax) arguing that memory safety is not broken by this unsafe block. In that way it certainly looks like a type system property, and one could in principle do the same thing for lean, embedding the runtime and making assertions about the memory safety of unsafe blocks (and the compiler proving these assertions for anything satisfying the regular lean type system)

Mac Malone (Oct 22 2023 at 05:33):

Lean rigorously checks type-safety via the kernel and Rust rigorously checks memory-safety via the compiler. One can only bypass these checks via a mandatory unsafe marking (which may or may not be exposed at user-level syntax, but is a code smell if not). Thus, Lean type-safety and Rust memory-safety can be seen to have a comparable level of rigor in this respect. For memory-safety, Lean performs no such rigorous check. Thus, Lean memory-safety can be seen as being on a significantly lower level than both Rust memory-safety and Lean type-safety, and, therefore, Lean is not memory-safe in the way it is type-safe or the way Rust is memory-safe.

Mac Malone (Oct 22 2023 at 05:34):

This is the general thesis I was attempting to assert based on my understanding of the two languages.

Mario Carneiro (Oct 22 2023 at 05:44):

For memory-safety, Lean performs no such rigorous check.

At some level I agree with you, Lean has certainly not prioritized this. However, the core language itself (CIC stuff) is memory-safe (to the best of our knowledge), which means that memory safety is possible to layer on top without much difficulty: you just have to treat any departure from that base language as unsafe, and then trim the paranoia back for certain additional things which we can reason are safe in all possible use cases.

Here's another rust example: the #[no_mangle] attribute is unsafe, because it allows you to cause name collisions between functions with different signatures (the linker will merge the two definitions, and hence this ends up as an accidental implemented_by). There is no word unsafe in the attribute name, and this is considered unfortunate, but it doesn't really break the memory safety promise of the language itself, they just say that you should treat uses of this attribute as equivalent to an unsafe block, and the unsafe_code linter is taught this as well, so if you enable this then you can be sure that if the lint doesn't fire then there are neither unsafe blocks nor #[no_mangle] definitions in the crate.

Mario Carneiro (Oct 22 2023 at 05:48):

Passing the unsafe_code linter is a really high bar, most projects at a certain level of complexity will have need for some nonzero amount of unsafe code, but a surprisingly large amount of code in the wild can be done with no unsafe at all (in both rust and lean), so it's not as useless a lint as you might think. More realistically, there might be some half dozen unsafe blocks in the project and you just surround them with copious amounts of documentation.

Mario Carneiro (Oct 22 2023 at 05:50):

(FFI crates in particular tend to have a much higher incidence of unsafe code. But you can still do your best to document the invariants of the project you are linking to and try to provide a safe interface so that the unsafety is limited to the FFI itself and downstream packages don't also need to pay the price.)


Last updated: Dec 20 2023 at 11:08 UTC