Zulip Chat Archive
Stream: general
Topic: Pure functional programming and pointer-based optimization
Simon Hudon (Mar 04 2020 at 05:48):
Hi all! Here's a shameless plug of a paper I wrote with @Daniel Selsam and @Leonardo de Moura about programming techniques that can sometimes get us exponential speedups on functional programs written on expressions. Comments and questions welcome.
https://arxiv.org/abs/2003.01685
Simon Hudon (Mar 04 2020 at 05:50):
The techniques are more generally applicable than for writing theorem provers, I should add
Mario Carneiro (Mar 04 2020 at 06:08):
Is there a reason that functions like withPtrAddr
are written in CPS rather than returning a Squash Addr
?
Mario Carneiro (Mar 04 2020 at 06:10):
In your PtrCache
example, it appears to be impossible to prove that the item you placed is still in the same location when you retrieve it later
Simon Hudon (Mar 04 2020 at 06:12):
Mario Carneiro said:
In your
PtrCache
example, it appears to be impossible to prove that the item you placed is still in the same location when you retrieve it later
That's true and we're going to get duplicates if looking up two objects encoding the same value. That's one possible trade-off we can choose and it can be very appealing to pay the price
Simon Hudon (Mar 04 2020 at 06:12):
Mario Carneiro said:
Is there a reason that functions like
withPtrAddr
are written in CPS rather than returning aSquash Addr
?
I don't think we have considered Squash Addr
. Do you see a benefit to encoding it that way?
Mario Carneiro (Mar 04 2020 at 06:13):
it's more direct; it will probably get optimized away regardless of how it's written
Mario Carneiro (Mar 04 2020 at 06:14):
but with the Squash Addr
approach it seems more likely that you could exploit common subexpression elimination
Simon Hudon (Mar 04 2020 at 06:16):
In the context that we studied, we needed the value of the pointer at the same time that we were producing the value so we're pretty much merging ptrAddr
and Squash.lift
Mario Carneiro (Mar 04 2020 at 06:17):
Sure, but in general you can do other things with the value rather than examine it immediately, e.g. you could store it in a data structure
Mario Carneiro (Mar 04 2020 at 06:17):
Since Squash Addr
is just an int, it lowers to C very directly
Mario Carneiro (Mar 04 2020 at 06:18):
What's the status of mutable cells in lean 4 (a la my lean 3 cached
proposal)? This stuff looks really close to finally having proper thunks
Mario Carneiro (Mar 04 2020 at 06:19):
(BTW you can still use withPtrAddr
as a "get pointer value" function if you call withPtrAddr x Squash.mk
)
Simon Hudon (Mar 04 2020 at 06:20):
Mario Carneiro said:
Since
Squash Addr
is just an int, it lowers to C very directly
I'd love to see an example of that if you find one. My intuition is that it's going to be hard to make use of the pointer long term because globally, we can't even prove that two objects are equal if their addresses are equal
Mario Carneiro (Mar 04 2020 at 06:22):
You are right, it is subject to the same limitations as PtrCache
that I mentioned above. This getPtr
function has properties that can be exploited but cannot be expressed in lean, so it will be possible to write correct programs (using only safe code) that cannot be proven correct
Simon Hudon (Mar 04 2020 at 06:22):
Mario Carneiro said:
What's the status of mutable cells in lean 4 (a la my lean 3
cached
proposal)? This stuff looks really close to finally having proper thunks
We were planning on including mutable cells in the paper but then we found this paper and noticed that they had all our examples and because they are extracting to OCaml, they did not have to deal with the issues of implementing it in a pure programming language
Mario Carneiro (Mar 04 2020 at 06:23):
You should mention that it can be done (is being done?) and cite the paper then
Simon Hudon (Mar 04 2020 at 06:25):
Mario Carneiro said:
so it will be possible to write correct programs (using only safe code) that cannot be proven correct
With PtrCache
we still have a proof of functional correctness. We just can reason about duplicates, cache hits / miss. Also, the pointer itself doesn't give you anything that the value doesn't give you. With the value, you can use its pointer and you can dereference it (i.e. you look at the value itself). If you only have the address, you lose access to the value
Simon Hudon (Mar 04 2020 at 06:26):
Mario Carneiro said:
You should mention that it can be done (is being done?) and cite the paper then
That's really a separate project and a separate topic at this point so we gain nothing by including it in our discussion
Mario Carneiro (Mar 04 2020 at 06:26):
Here's an example of an algorithm that is correct but not provably correct: withPtrAddr x (\u -> withPtrAddr x (\v -> assert (u = v)))
Mario Carneiro (Mar 04 2020 at 06:28):
You can dress that up with a pointer cache, but it's ultimately the same problem. If you store x in a hash table at location hash &x
, then if you later look up the bucket for hash &x
you will get a bucket that contains x, but you can't prove that
Simon Hudon (Mar 04 2020 at 06:29):
As I said, there's a whole area where we can't specify the properties. If your specification is about addresses (e.g. if you're writing a device driver), this won't cut it. If you want to write a program that computes a value but you want to go faster, you can get a proof of functional correctness with our approach
Mario Carneiro (Mar 04 2020 at 06:30):
I'm saying that's not the case. The hash algorithm cannot be proven to work at all without knowing that addresses are stable
Mario Carneiro (Mar 04 2020 at 06:30):
unless it falls back on searching the whole table or something (which will not happen but lean can't tell)
Simon Hudon (Mar 04 2020 at 06:30):
We have a proof of the examples in the paper. They type check
Mario Carneiro (Mar 04 2020 at 06:31):
I don't see any proofs in the paper
Mario Carneiro (Mar 04 2020 at 06:31):
I see programs
Mario Carneiro (Mar 04 2020 at 06:32):
functional correctness has not been stated for PtrCache
, and I am asserting that there are natural properties that you can assert that it satisfies, and you will not be able to prove them
Simon Hudon (Mar 04 2020 at 06:34):
If you look at page 14, the proofs are in the types. We're using subsingleton's and subtypes which very neatly encapsulates them
Simon Hudon (Mar 04 2020 at 06:34):
The functional correctness is that evalNatPtrCache
produces the same result as evalNatNaive
Simon Hudon (Mar 04 2020 at 06:37):
natural properties that you can assert that it satisfies, and you will not be able to prove them
We agree. There's a whole lot that you can't do. But you can optimize your programs and prove that you're producing the same behavior. Pointers have only a superficial presence in the logic even if you get a narrow window into the run-time with these features.
Mario Carneiro (Mar 04 2020 at 06:38):
Why are pointer values stable in the first place? In haskell they definitely aren't
Simon Hudon (Mar 04 2020 at 06:41):
Haskell has a moving garbage collector (but StableName
helps compensate). Lean's memory management doesn't move objects. It's malloc style plus reference counting
Mario Carneiro (Mar 04 2020 at 06:42):
I saw that Semidecidable
was recently added. I think that this type can be usable in these kind of contexts, but it has to be squashed (e.g. isPtrEq x y : Squash (Semidecidable (x = y))
)
Mario Carneiro (Mar 04 2020 at 06:43):
again, the nice thing being that the memory representation of this type is exactly the same as bool
Simon Hudon (Mar 04 2020 at 06:43):
That's why it was added but then we judged the name wasn't right and we renamed it. It should be gone from git too now
Mario Carneiro (Mar 04 2020 at 06:44):
what is it called now? SemiDecidable
?
Simon Hudon (Mar 04 2020 at 06:44):
That I could imagine there's some use for
Simon Hudon (Mar 04 2020 at 06:44):
PtrEqResult
Mario Carneiro (Mar 04 2020 at 06:44):
oh so it's just specialized on x = y?
Mario Carneiro (Mar 04 2020 at 06:45):
I think semidecidable is a good type (modulo bikeshedding on the name)
Simon Hudon (Mar 04 2020 at 06:46):
Yes, that's right. I think so too but without a good name, it's awkward to add. SemiDecidable
doesn't quite cut it
Mario Carneiro (Mar 04 2020 at 06:47):
I don't see why semidecidable is even not accurate
Simon Hudon (Mar 04 2020 at 06:47):
As I recall, Coq has a special notation for this. I don't remember the name of the functions but I think you can write { some proof } + ErrorType
Mario Carneiro (Mar 04 2020 at 06:47):
Sure, it's option p
Mario Carneiro (Mar 04 2020 at 06:48):
When you have an honestly semidecidable problem, you can run a program with a timeout and it will say yes
or unknown
Simon Hudon (Mar 04 2020 at 06:49):
If you have a value of Decidable p
, that's a proof that p
is decidable. If you have a value of type SemiDecidable p
(if you call it that), it's not a proof that p
is semi-decidable. You need to state separately that, if p
is true and you try hard enough (for some notion of trying hard enough), you'll get a proof of p
Mario Carneiro (Mar 04 2020 at 06:50):
yeah but who cares about liveness
Simon Hudon (Mar 04 2020 at 06:50):
I encoded that separately as:
inductive SemiDecision (p : Prop) : Type | isTrue : p → SemiDecision | unknown {} : SemiDecision open SemiDecision def IsSemiDecisionProcedure {p} (f : ℕ → SemiDecision p) : Prop := ∀ hp : p, ∃ i, f i = isTrue hp def SemiDecidable (p : Prop) : Type := { f : ℕ → SemiDecision p // IsSemiDecisionProcedure f }
Mario Carneiro (Mar 04 2020 at 06:51):
I'm also hoping you guys discover the roption
trick for doing general recursion without that fuel stuff
Simon Hudon (Mar 04 2020 at 06:53):
The usefulness of the fuel vs the roption
trick are very different. The point of the fuel is to permit functions for which we don't bother proving that they terminate and still be able to run them. Not having an equation compiler yet and well-founded recursion, that's often useful for code we want to run but not prove anything about it
Mario Carneiro (Mar 04 2020 at 06:54):
it's poisonous just like meta
Simon Hudon (Mar 04 2020 at 06:54):
it's just a way of allowing those functions in without making the logic inconsistent
Mario Carneiro (Mar 04 2020 at 06:54):
and I'm not even sure it is consistent
Mario Carneiro (Mar 04 2020 at 06:55):
or rather, "realistic" in the sense of VM matching the logic
Simon Hudon (Mar 04 2020 at 06:56):
We can't use the trick unless the return type is non-empty
Mario Carneiro (Mar 04 2020 at 06:56):
roption
requires no such hack
Simon Hudon (Mar 04 2020 at 06:57):
You need to make your code monadic as soon as you use it though
Mario Carneiro (Mar 04 2020 at 06:57):
equation compiler can do that
Mario Carneiro (Mar 04 2020 at 06:58):
you just stick the partial
keyword on and it does the magic
Mario Carneiro (Mar 04 2020 at 06:58):
but yeah, it is a monad
Mario Carneiro (Mar 04 2020 at 06:58):
if you want to cheat and get out of the monad, use roption.unwrap
Mario Carneiro (Mar 04 2020 at 06:59):
but it's the only method that actually allows you to write a general recursive function and prove it is defined later
Simon Hudon (Mar 04 2020 at 06:59):
It's inconvenient with the calling code. Right now, we don't have tactics to do a lot of proofs and we don't have an equation compiler but we still need to write the compiler and the prover itself. I don't know if partial is here to stay but at the moment, it's a good solution
Mario Carneiro (Mar 04 2020 at 07:00):
just use meta
Simon Hudon (Mar 04 2020 at 07:00):
It's not there anymore
Mario Carneiro (Mar 04 2020 at 07:00):
is there a reason that approach is no longer possible
Mario Carneiro (Mar 04 2020 at 07:01):
At least meta didn't add things into pure code where they could possibly cause unsoundness
Mario Carneiro (Mar 04 2020 at 07:02):
all these "pure primitives" require a soundness justification
Simon Hudon (Mar 04 2020 at 07:02):
The idea is to separate it between partial
and unsafe
. unsafe
is contagious while partial
is not. We can restrict unsafe
to very little code and use normal Lean to write automation
Simon Hudon (Mar 04 2020 at 07:02):
The justification is there
Simon Hudon (Mar 04 2020 at 07:03):
What you have a problem with is the difference between the logic and the code generated by the compiler
Mario Carneiro (Mar 04 2020 at 07:03):
yes, this property needs a name
Mario Carneiro (Mar 04 2020 at 07:04):
Reid suggested "realism" but this is also possibly confusing
Simon Hudon (Mar 04 2020 at 07:04):
How about faithfulness of the execution model?
Mario Carneiro (Mar 04 2020 at 07:04):
sure, wordy I guess
Mario Carneiro (Mar 04 2020 at 07:05):
It becomes harder to prove this property when you are explicitly lying to pure code
Mario Carneiro (Mar 04 2020 at 07:05):
like telling it that there is an infinite natural number, or saying that a function that returns 0xdeadbeef in practice always returns 0
Mario Carneiro (Mar 04 2020 at 07:06):
I think the whole story is more complicated than you think
Simon Hudon (Mar 04 2020 at 07:06):
I don't think Lean is ready for that kind of proof
Mario Carneiro (Mar 04 2020 at 07:06):
that's for sure
Simon Hudon (Mar 04 2020 at 07:07):
or that it's the highest priority right now
Mario Carneiro (Mar 04 2020 at 07:07):
I'm worried that lean 4 is taking a step away from full stack soundness because of all the new programming features, the proof stuff is being left behind
Mario Carneiro (Mar 04 2020 at 07:08):
sure it's not meta
/unsafe
anymore, but being pure code means less now than it used to
Mario Carneiro (Mar 04 2020 at 07:09):
all for very good reasons of course, but just try not to forget about full functional correctness proofs
Mario Carneiro (Mar 04 2020 at 07:10):
all I ever wanted was fast and provably correct programs. Lean 3 fails on the first point, and lean 4 is so close to being able to satisfy this but might end up failing on the second
Simon Hudon (Mar 04 2020 at 07:14):
I think regardless, Lean 4 is going to be a step forward. If you do manage to prove its correctness, that's going to be interesting. Now that it's in large part written in Lean, that Expr
is in safe code, that might be more accessible. At the very least, I think a lot of mistakes are being ruled out so we can make progress faster
Simon Hudon (Mar 04 2020 at 07:16):
Btw, here is the Coq construct I was referring to earlier:
Inductive sumor (A:Type) (B:Prop) : Type := | inleft : A -> A + {B} | inright : B -> A + {B} where "A + { B }" := (sumor A B) : type_scope.
Mario Carneiro (Mar 04 2020 at 08:11):
I guess that's just A + plift B
for us
Anton Lorenzen (Mar 04 2020 at 10:44):
What is "that fuel stuff"? Is that a Lean 4 thing?
Marc Huisinga (Mar 04 2020 at 11:15):
presumably passing in another parameter that dictates the maximum amount of steps that you run a function so that it always terminates. you run the function until it runs out of fuel.
Mario Carneiro (Mar 04 2020 at 11:16):
and then there is a magic value that you can pass that acts like infinity for unbounded recursions
Chris B (Mar 04 2020 at 11:32):
How do you pull that part off?
Mario Carneiro (Mar 04 2020 at 11:41):
I looked into it a while ago, and I forget the details, but I think this constant is marked such that you can't use it in proofs. It interacts with partial
functions in a weird way though and having a "largest natural number" lying around doesn't make me happy
Sebastian Ullrich (Mar 04 2020 at 11:45):
It's a constant
, it doesn't really have a value. partial def
s are irreducible for all intents and purposes.
Mario Carneiro (Mar 04 2020 at 11:50):
Still, it is one more place where the logic and the VM disagree about something, and the constant can go in lots of places
Mario Carneiro (Mar 04 2020 at 11:53):
It doesn't matter what number it is, it's a nat
and lean knows that nat
has certain properties. For example if you evaluate a partial function at depth infinity+1
then it will return none, even if in reality this is impossible
Mario Carneiro (Mar 04 2020 at 11:55):
In other words, every partial
function is provably non-total, in the sense \ex n, f n = none
Mario Carneiro (Mar 04 2020 at 11:56):
even if the recursion equations are actually perfectly well defined
Mario Carneiro (Mar 04 2020 at 12:02):
Now I wonder if you can combind the pointer equality trick, which gives you true properties that you can't prove (two calls to withPtrEq
give the same result), with the infinite recursion, which gives you true but not finitely observable properties that you can disprove (the partial
function f 0 = 1
and f(n+1) = f n
never returns 0
), to get a true and finitely observable fact that is disprovable
Gabriel Ebner (Mar 04 2020 at 12:38):
(I'm still reading through the discussion, I hope I didn't miss too much. These are just my first thoughts after skimming the paper.) I like that these features will be available in Lean 4. However the paper comes across as unorganized and without a clear and uniform design. There are two primitives for pointer equality tests, withPtrEq
and withPtrEqResult
. (For some reason there's only one version of withPtrAddr
and of course it works like withPtrEqResult
and not like withPtrEq
....) And then there's squashes thrown in as yet another way to do the same thing: i.e., to control the elimination principle. I think it would be good idea to reduce the number of primitive (concepts) here. Why can't you just add ptrAddr : α → ∥Addr∥
and ptrEq : ∀ a b, ∥PtrEqResult a b∥
like @Mario Carneiro suggested? Then it would be clear that all such "implementation details" like pointer equality are supposed to be in the squash monad. And you only need to learn one elimination principle. Or if that's too sophisticated, present at most one of the variants, i.e., only withPtrEqResult
and not withPtrEq
.
Gabriel Ebner (Mar 04 2020 at 12:53):
The Result
structure is also somewhat confusingly named. 1) It is not related to PtrEqResult
. 2) It is not related to Rust's Result
(= Except
= Either
). 3) Why does it even mention the function and argument separately? def Result (y : β) := { output // output = y }
works as well. 4) I like subtypes better, how about { output // output = f x }
instead of Result f x
?
Reid Barton (Mar 04 2020 at 13:52):
Simon Hudon said:
I think regardless, Lean 4 is going to be a step forward. If you do manage to prove its correctness, that's going to be interesting. Now that it's in large part written in Lean, that
Expr
is in safe code, that might be more accessible. At the very least, I think a lot of mistakes are being ruled out so we can make progress faster
I know that currently the only Lean 4 program is Lean itself but I don't really care that much about proving the correctness of Lean. I do care about writing other fast and provably correct programs like computer algebra systems and algorithms in number theory and algebraic topology and cryptography. Lean 4 is like 90% of the way to what I want but this partial
stuff looks like a big headache.
Patrick Massot (Mar 04 2020 at 13:54):
Are we talking about proving the part of Lean that will be used to write tactics? If yes then why do we care? The kernel will still check the proofs in the end, right? And the story you discuss today doesn't require more trust in the kernel, right?
Reid Barton (Mar 04 2020 at 13:59):
I am talking about using Lean for more than theorem proving. For example writing a program to compute the number of points on an elliptic curve and proving it correct, and being able to compile the program to an executable which has similar performance to a C implementation.
Sebastian Ullrich (Mar 04 2020 at 14:04):
The biggest reason we are using partial
right now is because we can't do wellfoundedness proofs right now. If you want to prove your function total, that will of course continue to work.
Sebastian Ullrich (Mar 04 2020 at 14:04):
partial
is mostly for when you don't care about proofs
Reid Barton (Mar 04 2020 at 14:09):
But it has some special treatment with the equation compiler and general recursion, right?
Sebastian Ullrich (Mar 04 2020 at 14:26):
It does, but it's really shallow. Add a new Nat
param, recurse on it, and start it with hugeFuel
.
Reid Barton (Mar 04 2020 at 16:35):
I guess my remaining question is whether a hand-rolled total function defined by general recursion plus a separate proof of termination can have zero overhead relative to the code I would get from partial
Sebastian Ullrich (Mar 04 2020 at 17:11):
@Reid Barton The compiler output is independent of the mode of recursion: https://github.com/leanprover/lean4/blob/50207e2c5af73af8006f2f9652c6b90694066a50/src/library/util.h#L302-L304
If you want to encode general recursion in something custom like roption
, I suppose it will depend on how customizable the equation compiler is.
Simon Hudon (Mar 04 2020 at 19:54):
Reid Barton said:
I know that currently the only Lean 4 program is Lean itself but I don't really care that much about proving the correctness of Lean. I do care about writing other fast and provably correct programs like computer algebra systems and algorithms in number theory and algebraic topology and cryptography. Lean 4 is like 90% of the way to what I want but this
partial
stuff looks like a big headache.
What I was getting at is that, if 10% of Lean is implemented at all, that means that we have very few features to rely on to write Lean in Lean itself. When we go up to 20%, new features are available and we can make use of them in the implementation of Lean. Loops are something that we need from day 1 and well-founded recursion is not the first feature on the list. partial
is a great way around that even if it is schedule to disappear after WF recursion comes in -- I don't know if it actually is scheduled but when we do have WF recursion or even general recursion, you don't have to care about partial
in the programs that you write and prove
Mario Carneiro (Mar 04 2020 at 20:13):
There is a general mechanism to take the equations of the equation compiler and produce a relation that relates all input values to all calls (where it should be "decreasing"). You construct an inductive relation with these constructors and then acc
applied to that is the unique most general well foundedness condition, that allows you to say "a function with these equations computes here". There are no constraints on this compilation procedure, assuming the arguments have been tupled up as in lean 3 wf recursive compilation.
The upshot is that you can write any equations you want, and you can produce a function from some proposition, asserting that the particular input terminates, to the result value. If you don't want to prove well foundedness, you can either call it in the roption
monad, or you can pass a fake proof. Writing the equations and proving termination become separated, and you don't need any magic at all.
Patrick Stevens (Mar 05 2020 at 07:33):
Anton Lorenzen said:
What is "that fuel stuff"? Is that a Lean 4 thing?
This probably refers to "petrol-driven computation". If you have a computation whose halting behaviour you don't know, a total language might not let you run it (because it might not terminate); but you can still always run it for some maximum number of computation steps, unfolding the state of the computation as you go. After a thousand steps, maybe it's terminated or maybe it hasn't, but you can certainly unfold the computation up to that point. We say you've "provided 1000 fuel" to the computation, and this may or may not be enough to complete it.
Last updated: Dec 20 2023 at 11:08 UTC