Zulip Chat Archive

Stream: general

Topic: acyclic datastrucure


Matthew Weingarten (Nov 24 2020 at 15:07):

@Leonardo de Moura @Sebastian Ullrich
Hi in the paper counting immutable beans you mention that lean and lambdapure do not allow for cyclic data structures. Does this follow from the type system used in Lean? Also how much does this limit the semantics of lean, like representing cyclic graphs?

Kevin Buzzard (Nov 24 2020 at 15:09):

I think Leo is quite busy now with Lean 4 and prefers not to be bothered, but you might be lucky with Sebastian :-)

Jannis Limperg (Nov 24 2020 at 15:18):

Inhabitants of the inductive data types of Lean (and similar systems) are trees of finite height, so they are acyclic. This is probably what the statement refers to. You can still represent cyclic data structures, e.g. by giving a tree and a list of 'back edges' or an adjacency matrix; the representation is just not very direct.

Adam Topaz (Nov 24 2020 at 15:24):

Isn't it also possible to represent cyclic data structures as quotients of inductive types?

Adam Topaz (Nov 24 2020 at 15:27):

E.g.

inductive foo
| root : foo
| left : foo  foo
| right : foo  foo

open foo

inductive rel : foo  foo  Prop
| glue : rel (left (right root)) (right (left root))

def cyclic_thing := quot rel

Sebastian Ullrich (Nov 24 2020 at 16:20):

Regarding the dynamic semantics, cycles are also ruled out by the combination of strictness and immutability. At least as long you don't add a specific special case like OCaml(?) does.

Mario Carneiro (Nov 24 2020 at 16:21):

how's that memoization work coming? :)

Sebastian Ullrich (Nov 24 2020 at 16:23):

I see many works but none of them start with memoization

Mario Carneiro (Nov 24 2020 at 16:24):

I spoke with simon a couple months ago about support for internal mutability, what I called cached in my original proposal

Mario Carneiro (Nov 24 2020 at 16:25):

parts of that made it into the paper about pointer based optimization but I think proper internal mutability is still in the air

Mario Carneiro (Nov 24 2020 at 16:25):

and I have a stack of FP algorithms that need it for efficiency

Sebastian Ullrich (Nov 24 2020 at 16:26):

I like the approach, it's just not something we currently need for implementing Lean itself

Sebastian Ullrich (Nov 24 2020 at 16:27):

Matthew Weingarten said:

Also how much does this limit the semantics of lean, like representing cyclic graphs?

How useful is a cyclic graph anyway if it is completely immutable? Yes, there are a few instances of "tying the knot" in GHC, but it's still a pretty obscure technique I'd say.

Mario Carneiro (Nov 24 2020 at 16:28):

cyclic data structures in haskell are actually not good for representing explicit sharing, because the representation is not observable

Mario Carneiro (Nov 24 2020 at 16:28):

that's part of what the pointer based optimization paper is about

Mario Carneiro (Nov 24 2020 at 16:30):

I did something like that in the haskell version of MM0 and regretted it, because you have to resort to some unsafe hackery to get at the pointers if you want to reify a cyclic (or acyclic but highly shared) data structure as an actual graph with nodes

Johan Commelin (Nov 24 2020 at 19:44):

Btw, is there any way to do memoization in lean 3? I guess there could be a memoize monad? afaik there is nothing atm, right?

Mario Carneiro (Nov 24 2020 at 20:20):

You can use ref in the tactic monad but that's it

Mario Carneiro (Nov 24 2020 at 20:21):

the best alternative is to throw everything in a hashtable or rbmap

Johan Commelin (Nov 24 2020 at 20:29):

And then explicitly pass that rbmap around?

Mario Carneiro (Nov 24 2020 at 21:05):

Yeah

Matthew Weingarten (Dec 02 2020 at 14:44):

Thanks guys for the responses!

Jason Rute (Dec 10 2020 at 14:09):

Johan Commelin said:

Btw, is there any way to do memoization in lean 3? I guess there could be a memoize monad? afaik there is nothing atm, right?

In some sense the State monad is a memoize monad if you use a rb_map or structure containing an rb_map as the state. The cache is invalidated after you leave the monad, but that is sufficient for dynamic programming application which is where I often see memoized functions. I coded up a Fibonacci example in Lean 4 just because I was having fun trying out Lean 4, but the same idea should work in Lean 3.

Mario Carneiro (Dec 10 2020 at 14:12):

fun fact, you can't prove that this computes the naive fib function

Jason Rute (Dec 10 2020 at 14:16):

Yeah, I assume the need for memoization is mostly on the meta-programming side, but maybe I'm mistaken on that.

Johan Commelin (Dec 10 2020 at 14:47):

I guess for more global versions where the cache persists after leaving the monad, you would quickly start using databases, right?

Johan Commelin (Dec 10 2020 at 14:48):

Mario Carneiro said:

fun fact, you can't prove that this computes the naive fib function

This is a bummer. Because it limits to what extent we can use lean as a CAS in some nice future.

Mario Carneiro (Dec 10 2020 at 14:49):

I maintain that the partial keyword was a mistake

Johan Commelin (Dec 10 2020 at 14:49):

As in, you can still write these functions, but you can't tie them back to the proven correct things. So we would have def elliptic_curve and meta def fast_elliptic_curve and we can compute with the latter, but can't show that it has any relation to the elliptic_curve that you prove things about.

Mario Carneiro (Dec 10 2020 at 14:50):

well, lean 4 does have implementedBy or whatever it's called, so you can do this if you want to, by cheating

Ruben Van de Velde (Dec 10 2020 at 14:52):

What's making it impossible to prove? (If it wouldn't take too long to explain)

Mario Carneiro (Dec 10 2020 at 14:53):

it makes use of a constant fuel : nat := 0

Mario Carneiro (Dec 10 2020 at 14:54):

this constant is effectively infinity in the VM, meaning that we have an omega-inconsistent model

Mario Carneiro (Dec 10 2020 at 14:56):

but in pure code you can't prove that it is any particular value, so it is consistent that every partial function fails to compute what it was supposed to

Sebastian Ullrich (Dec 10 2020 at 15:09):

You shouldn't need partial here

Jason Rute (Dec 10 2020 at 15:12):

I only added it because I got some message about the equation compiler and didn't want to deal with it :smile:

Sebastian Ullrich (Dec 10 2020 at 15:15):

Another fun fact: absent sufficient inlining (which in this particular example should not be an issue), you might be copying the whole HashMap array on each mutation, because you're not using it linearly (get isn't linear).

Jason Rute (Dec 10 2020 at 15:40):

Fixed. No more partial or get...set.

Alex J. Best (Dec 10 2020 at 15:41):

So without partial one can prove that this computes the usual fib right? (just checking I'm understanding everyones comments)

Leonardo de Moura (Dec 10 2020 at 15:49):

For official information about how partial works, see
http://leanprover.github.io/talks/LeanPLDI.pdf (slide 15) and
https://leanprover.github.io/lean4/doc/lean3changes.html (the meta keyword section).

Mario Carneiro (Dec 10 2020 at 16:00):

Does lean 4 let us track and maybe ban uses of the fuel constant? In lean 3 meta was viral but it seems that partial isn't, so it's not clear how to make sure it's not getting used

Leonardo de Moura (Dec 10 2020 at 16:09):

See links above. There is no fuel.

Kevin Lacker (Dec 10 2020 at 17:13):

in lean 4 will there be / is there a way to enforce that an object is never copied? like the ref count should always only be 1, a buffer where you only want to mutate it efficiently

Mario Carneiro (Dec 10 2020 at 17:22):

@Leonardo de Moura I thought that fuel appeared in the compilation for partial definitions? What happens if I #print foo where foo is a partial def? Can I even still do that like in lean 3 to inspect the result of the equation compiler? The provided links mention that the compilation uses unsafe + implementedBy but don't say what the pure version (the "reference model") is. Is the entire function just a constant with trivial witness?

Mario Carneiro (Dec 10 2020 at 17:25):

When the partial keyword is used, Lean generates an auxiliary unsafe definition that uses general recursion, and then defines an opaque constant that is implemented by this auxiliary definition. This is very simple, efficient, and is sufficient for users that want to use Lean as a regular programming language. A partial definition cannot use unsafe features such as unsafeCast and ptrAddrUnsafe, and it can only be used to implement types we already known to be inhabited. Finally, since we "seal" the auxiliary definition using an opaque constant, we cannot reason about partial definitions.

Mario Carneiro (Dec 10 2020 at 17:27):

So this:

partial def foo : Nat -> Nat
| 0 => 0
| (n+1) => foo n + 1

is compiled to this:

unsafe def foo.impl : Nat -> Nat
| 0 => 0
| (n+1) => foo.impl n + 1

@[implementedBy "foo.impl"]
constant foo : Nat -> Nat := fun _ => 0

Mario Carneiro (Dec 10 2020 at 17:29):

So I guess my earlier question about tracking uses of the fuel constant should be extended to tracking all partial constants

Mario Carneiro (Dec 10 2020 at 17:30):

Kevin Lacker said:

in lean 4 will there be / is there a way to enforce that an object is never copied? like the ref count should always only be 1, a buffer where you only want to mutate it efficiently

I don't think this can be done without an actual linear type system

Mario Carneiro (Dec 10 2020 at 17:31):

I mean you could maybe set the runtime to crash if the refcount goes above 1, but to check at compile time? You would need to be rust

Leonardo de Moura (Dec 10 2020 at 17:34):

Mario Carneiro said:

So I guess my earlier question about tracking uses of the fuel constant should be extended to tracking all partial constants

Users can track whatever they want. They can import Lean, and access everything, write their own commands, and traverse all internal datastructures.

Patrick Massot (Dec 10 2020 at 17:36):

What's the point of such a tracking? The kernel won't accept proofs based on unsafe stuff, right?

Mario Carneiro (Dec 10 2020 at 17:49):

This stuff isn't unsafe

Mario Carneiro (Dec 10 2020 at 17:49):

that's the problem

Reid Barton (Dec 10 2020 at 17:49):

but you also can't prove anything about it

Mario Carneiro (Dec 10 2020 at 17:50):

that's not entirely true, you can prove it's a function for example

Mario Carneiro (Dec 10 2020 at 17:50):

(which, obvious as it may sound, need not be true for unsafe things)

Reid Barton (Dec 10 2020 at 17:50):

I don't understand what the problem is though

Patrick Massot (Dec 10 2020 at 17:50):

I'm not worried at all.

Mario Carneiro (Dec 10 2020 at 17:51):

it doesn't compromise the logical integrity of the system, but it does compromise the relationship between the logical representation and the compiler

Jason Rute (Dec 10 2020 at 17:56):

Then I think your issue is more with implementedby, than unsafe, no?

Mario Carneiro (Dec 10 2020 at 17:56):

I do think it would be weird to the point of rejection in code review if any partial def was involved in (the transitive closure of) the definition of some pure mathlib concept like list.sort or comm_ring

Mario Carneiro (Dec 10 2020 at 17:57):

yes, this is about implementedBy

Mario Carneiro (Dec 10 2020 at 17:58):

I am okay with adding implementedBy annotations when we can prove at the meta level that the definition is faithfully represented in the executable version

Mario Carneiro (Dec 10 2020 at 17:59):

but to maintain this property we have to carefully consider each implementedBy location. It's basically like unsafe blocks in rust

Mario Carneiro (Dec 10 2020 at 17:59):

unsafe code itself is fine, it makes no claims about anything

Mario Carneiro (Dec 10 2020 at 17:59):

I don't see any reason to use partial over unsafe

Leonardo de Moura (Dec 10 2020 at 18:11):

-- Error `foo1` uses unsafe declaration `unsafeCast`
partial def foo1 (x : Nat) : String :=
  unsafeCast x

-- Ok
unsafe def foo2 (x : Nat) : String :=
  unsafeCast x

Mario Carneiro (Dec 10 2020 at 18:14):

So the main function is to catch you from doing bad things? I get that, that's useful. But unsafe is already in the name of that function, and meta has been like this in lean 3 and it hasn't ever really been an issue

Mario Carneiro (Dec 10 2020 at 18:15):

that is, if 90% of the elaborator was written as unsafe instead of partial, would it have caused significant maintenance problems?

Reid Barton (Dec 10 2020 at 18:16):

Presumably it would be more like 90% was unsafe versus 20% was partial

Mario Carneiro (Dec 10 2020 at 18:17):

That's only because partial is not viral even though it should be

Mario Carneiro (Dec 10 2020 at 18:17):

if partial is meant in the sense "not total", then it should absolutely be viral

Reid Barton (Dec 10 2020 at 18:18):

I have no idea what you're arguing really

Reid Barton (Dec 10 2020 at 18:18):

but for a programming language it's not a great look if you have to write partial def or unsafe def to define basically every function

Reid Barton (Dec 10 2020 at 18:18):

and for mathlib it doesn't seem to matter anyways

Mario Carneiro (Dec 10 2020 at 18:20):

I think this is just a clash of traditions regarding what safety means in regular programming languages vs formal logics

Mario Carneiro (Dec 10 2020 at 18:20):

no it's not a good look to be saying that literally everything is unsafe, but it's kind of the truth

Mario Carneiro (Dec 10 2020 at 18:21):

it's just that most programming languages don't make a big fuss about it

Mario Carneiro (Dec 10 2020 at 18:21):

and maybe lean 4 shouldn't either, after all it's more like haskell now than lean 3

Johan Commelin (Dec 10 2020 at 18:25):

Reid Barton said:

but for a programming language it's not a great look if you have to write partial def or unsafe def to define basically every function

If that's just because it costs you 7 or 8 extra chars before the def, then I could imagine having pdef and udef.

Mario Carneiro (Dec 10 2020 at 18:26):

I'm pretty sure that the reason Reid says that is that for most programmers, unsafe is supposed to sound the warning klaxons, and it loses its strength if it is used constantly

Mario Carneiro (Dec 10 2020 at 18:30):

In lean 3, this signal was dampened by using a more neutral word meta, even though it's just as dangerous as lean 4's unsafe. I think that having all this unsafe/meta/ whatever sounding the alarm bells isn't necessarily a bad thing since it motivates people to chip away at it, making more interfaces safe/provable. Lean 4 solves the problem by using the non-viral partial, which silences the klaxons without improving the situation

Leonardo de Moura (Dec 10 2020 at 18:31):

Some clarifications:

  • Logical integrity is preserved. You will not be able to prove theorem unsound : False because of partial or unsafe.
  • You can only use partial if the type is inhabited.
  • We type check partial functions, and they cannot use unsafe functions that may crash your program (e.g., unsafeCast), use unsafe inductive types, etc. So, you only gain the ability to perform "general recursion".
  • You can write a non-partial function f that uses a partial function p and still prove many things about it @Johan Commelin . f will not be opaque. Of course, you will not be able to rely on the actual implementation of p.
  • Similarly, you can prove things about a function f that uses an opaque constant g tagged with [extern] (i.e., a foreign function).

Leonardo de Moura (Dec 10 2020 at 18:33):

  • unsafe can be used even when the type is not inhabited. partial cannot.

Mario Carneiro (Dec 10 2020 at 18:34):

Also to clarify: I'm not arguing for any change on the part of the lean team, but when it eventually comes out one of the things I will be looking into is an improvement to the equation compiler to compile partial def into a representation that you can prove things about

Johan Commelin (Dec 10 2020 at 18:36):

Leonardo de Moura said:

  • You can write a non-partial function f that uses a partial function p and still prove many things about it Johan Commelin . f will not be opaque. Of course, you will not be able to rely on the actual implementation of p.

Cool! Thanks for the clarification!

Leonardo de Moura (Dec 10 2020 at 18:38):

Mario Carneiro said:

Also to clarify: I'm not arguing for any change on the part of the lean team, but when it eventually comes out one of the things I will be looking into is an improvement to the equation compiler to compile partial def into a representation that you can prove things about

This is fine. Isabelle is a good reference for this.
I just want to make sure users understand how partial works and avoid misinformation being spread about it.

Mario Carneiro (Dec 10 2020 at 18:39):

Yes, sorry for the one-sided portrayal. I agree with everything you said


Last updated: Dec 20 2023 at 11:08 UTC