Zulip Chat Archive

Stream: lean4

Topic: two RBMaps


Floris van Doorn (Jul 26 2023 at 13:29):

What is the difference between docs#Lean.RBMap and docs#Std.RBMap ? Why are there two copies? Which one should I use?

Ruben Van de Velde (Jul 26 2023 at 13:45):

The Lean one is only/primarily for use within the compiler; unless you're interfacing with that, the std one is probably better

Henrik Böving (Jul 26 2023 at 13:48):

Floris van Doorn said:

What is the difference between docs#Lean.RBMap and docs#Std.RBMap ? Why are there two copies? Which one should I use?

The why is: Lean's compiler does not (possibly yet) want to depend on std4 to build but std4 wants to do additional experimentation on top of the existing RbMap in Lean, the compiler datastructures do however only contain what the compiler needs so instead there are now two of the m that might or might not be merged one day

Floris van Doorn (Jul 26 2023 at 15:14):

Thanks!

(I ended up using the Lean one, since I my main application was NameMap/NameSet)

Mac Malone (Jul 26 2023 at 17:16):

@Mario Carneiro I remember we discussed this once, but I forgot the details. Why is Std.RBMap defined in terms of RBSet instead of the other way around (e.g., like essentially in core or where RBSet a := RBMap a Unit)? It seems this would reduce a level of pointer indirection (the RBMap tuple could be "unboxed" in the Node) without significant overhead (a pointless Unit field in the node for an RBSet).

Mario Carneiro (Jul 26 2023 at 18:19):

Because it would require duplicating all the proofs with a useless field. Maybe I will do it at some point but I'd rather hold out for lean making this zero-cost

Mario Carneiro (Jul 26 2023 at 18:27):

While this issue exists, it puts adverse incentives on almost everything to make the code worse in exchange for optimization. I recently debugged an issue along the same lines with @Yicheng Qian : they had code like

let mut a := 0
let mut b := 0
let mut c := 0
let mut d := 0
let mut e := 0
while true do
  -- hot loop which does some basic arithmetic and modifies a,b,c,d,e
  a := a + b
  if b = 0 then break
  b := b - 1
  c := 2 * a + c
  ...

and it turned out that the performance was dominated by the fact that the do notation tuples all the mutable variables after every iteration like (a, (b, (c, (d, e))) (i.e. 4 allocations) and untuples them at the beginning of the loop. This was run once per character as part of a parser, and this single allocation contributed to over 50% of the runtime

Mac Malone (Jul 26 2023 at 22:24):

Mario Carneiro said:

Because it would require duplicating all the proofs with a useless field. Maybe I will do it at some point but I'd rather hold out for lean making this zero-cost

Why would it require duplication? If RBNode was changed to have k/v, the proofs would be in terms of that, which would carry on through through the definitions of RBMap and RBSet (just as inversely the RBSet structure/proofs are carried on through for RBMap now). Right?

Mac Malone (Jul 26 2023 at 22:35):

Mario Carneiro said:

the fact that the do notation tuples all the mutable variables after every iteration like (a, (b, (c, (d, e))) (i.e. 4 allocations) and untuples them at the beginning of the loop. This was run once per character as part of a parser, and this single allocation contributed to over 50% of the runtime

Yeah, using loops in do notation is a bad idea if you are looking for primitive loop behavior. If you want that, you generally need to use a tail-recursive function. Even in Sebastian's upgraded do unchained notation, this seems like it would still be a problem, so I think holding out hope for this to be optimized in the future is unlikely in any near-term situation. Part of this could be alleviated by wrapping up all the mutable variables in a single autogenerated on-demand structure type, though some packing and unpacking would still likely occur (especially with more nested do use). On the flip side, statements by Leo has given me the impression that improving/rewriting do notation is not high on the priority list.

Mario Carneiro (Jul 26 2023 at 22:36):

the common through line here is not improvements to the do notation, but rather inlined structure layouts

Mario Carneiro (Jul 26 2023 at 22:36):

if RBMap inlined its pair in the RBSet data structure we wouldn't need to talk about this

Mario Carneiro (Jul 26 2023 at 22:37):

and AssocList wouldn't even need to exist

Mario Carneiro (Jul 26 2023 at 22:38):

I don't think RBSet should be changed, it has a good API for what it does and you are talking about things that would change the API (of the lemmas)

Mac Malone (Jul 26 2023 at 22:40):

@Mario Carneiro My understanding is that inlined structure layouts are also not on the todo list, especially automatically. The only thing in this realm that I have heard being considered is to be ability to manually mark certain structure types as unboxed and they then they will be passed by-value rather than by-aggregate. This would only occur in cases where the fixed type is known at codegen time, so this would not apply for polymorphic.

Mario Carneiro (Jul 26 2023 at 22:40):

It doesn't have to be automatic, as soon as it is opt in RBMap will definitely opt in

Mac Malone (Jul 26 2023 at 22:41):

@Mario Carneiro My point is that, in that feature I have heard discussed in meetings, RBMap is not the kind of type that can opt-in.

Mario Carneiro (Jul 26 2023 at 22:42):

the only reason to consider the RBMap refactor you are describing is to change the data layout. This needs to be solved at the data layout level, not at a level that complicates all the proofs

Mario Carneiro (Jul 26 2023 at 22:43):

The situation is basically the same for AssocList, it is dumb to have to copy the whole List API just because you want to store two things instead of one

Mario Carneiro (Jul 26 2023 at 22:43):

it basically means that structures don't work as a concept

Mac Malone (Jul 26 2023 at 23:03):

Mario Carneiro said:

the only reason to consider the RBMap refactor you are describing is to change the data layout. This needs to be solved at the data layout level, not at a level that complicates all the proofs

My argument is to this is simply one of expediency. It is unlikely that this will be solved in the compiler anytime soon, and it is not clear to me how this could be solved at all without a major refactor (the only way I can see to implement this would require specializing all upstream functions -- both existing and future -- to RBMap which would require some significant compiler overhaul and powerful type inference to make sure the right version is being used).

Nonetheless, removing a level of data indirection is a major performance gain for a core type likeRBMap, so I think a bit of complication for some significant performance boost is worth it, especially knowing it is unlikely they this will be solvable by the compiler for likely a good few years.

Mac Malone (Jul 26 2023 at 23:05):

Furthermore, this is likely to be a major sticking point with any future merge of Std with core, given that the core version is objectively better alternative for its primary core use cases due to its superior performance.

Mario Carneiro (Jul 26 2023 at 23:11):

Like I said, if this becomes a big enough issue we can make a copy of RBSet and transfer all the theorems. So far it hasn't been reported as an issue except in the theoretical sense

Mario Carneiro (Jul 26 2023 at 23:12):

So far, it is not a major issue in part because the core version is still available if you care

Mario Carneiro (Jul 26 2023 at 23:13):

keep in mind that RBMap is already indirection heavy since it is a binary tree. In fact you can probably levy similar complaints against the use of RBColor, that could just be baked into the discriminant at the cost of complicating the proofs

Mario Carneiro (Jul 26 2023 at 23:14):

that would save 8 bytes per node

Mario Carneiro (Jul 26 2023 at 23:15):

TBH this is why I don't like high performance programming in lean, there are just so many suboptimal things you can see from a mile away and optimizing for them is unsatisfying because all abstractions come with a cost, including whatever you are in the process of building

Mario Carneiro (Jul 26 2023 at 23:21):

Writing leangz was great, I could look at the data format and say "80% of this is unnecessary" and end up with 7x space reduction, before compression. There is just so much overhead in the object format, it is not at all space-optimized. Lean just isn't tuned for this kind of thing, and it lacks the facilities for adequate control over data layout

Mac Malone (Jul 26 2023 at 23:28):

Mario Carneiro said:

TBH this is why I don't like high performance programming in lean, there are just so many suboptimal things you can see from a mile away and optimizing for them is unsatisfying because all abstractions come with a cost, including whatever you are in the process of building

I think this is our major point of contention. I would like to use Lean for all my computing (high performance or otherwise) in the future. I am also happy to sacrifice ease of provability when necessary to achieve this. However, I think Lean generally provides enough meta-programming tools (and will provide even more in the future with the Lean-level compiler) to mitigate most sacrifices.

Mario Carneiro (Jul 26 2023 at 23:30):

In lean, the proofs and the defeqs are part of the API of the type. You need to be able to exert separate control over them in a good library

Mario Carneiro (Jul 26 2023 at 23:30):

implemented_by is great for that

Mac Malone (Jul 26 2023 at 23:31):

Mario Carneiro said:

Like I said, if this becomes a big enough issue we can make a copy of RBSet and transfer all the theorems. So far it hasn't been reported as an issue except in the theoretical sense

I am still not clear though on why this change would require so much duplication. Why can't most theorems just be written for RBMap and lifted to RBSet when RBSet a := RBMap a Unit (just as the inverse is currently done for RBMap in terms of RBSet)?

Mario Carneiro (Jul 26 2023 at 23:31):

but when it comes to data layout the inductive and the layout are inextricably linked

Mario Carneiro (Jul 26 2023 at 23:32):

It would mean having an additional useless argument in all of the RBNode theorems, i.e. an API change

Mario Carneiro (Jul 26 2023 at 23:32):

and besides, the same problem comes up again if you want to store two things instead of one

Mac Malone (Jul 26 2023 at 23:32):

Mario Carneiro said:

In lean, the proofs and the defeqs are part of the API of the type. You need to be able to exert separate control over them in a good library

This depends on whether the library is meant for proving or for computation. If the later, they may not be part of API (i.e., they may be considered a non-breaking implementation detail).

Mario Carneiro (Jul 26 2023 at 23:33):

This depends on whether the library is meant for proving or for computation

It's both!

Mario Carneiro (Jul 26 2023 at 23:33):

There are a lot of theorems about RBNode

Mario Carneiro (Jul 26 2023 at 23:34):

RBNode is a public type too

Mac Malone (Jul 26 2023 at 23:34):

@Mario Carneiro My point is that a library can make that decision, and cases like these may be evidence there may need to be a split between proof-centric standard library and a computation-centric library for complex datatypes.

Mario Carneiro (Jul 26 2023 at 23:34):

Obviously I want both

Mac Malone (Jul 26 2023 at 23:34):

Or some nice metaprogramming tools to solve code duplication problems.

Mario Carneiro (Jul 26 2023 at 23:34):

and not in separate packages either, I want one type that is good in both ways

Mac Malone (Jul 26 2023 at 23:35):

@Mario Carneiro I want a lot of things too; the question is whether that is feasible or likely to be supported in the near-term.

Mario Carneiro (Jul 26 2023 at 23:36):

in the near term I have explained what could happen with RBMap. I don't think changing the API of RBNode is on the table, but you can transfer things from an equivalent type without too much pain, just a lot of copy paste

Mac Malone (Jul 26 2023 at 23:37):

For instance, I would like the core monad functions and instances to only require their exact prerequisites. However that requires refactors and code review that was not in the cards when I proposed it.

Mario Carneiro (Jul 26 2023 at 23:37):

that is also a performance regression IIUC

Mac Malone (Jul 26 2023 at 23:37):

Mario Carneiro said:

that is also a performance regression IIUC

What?

Mario Carneiro (Jul 26 2023 at 23:39):

at least, mapM -> mapA is a performance regression

Mac Malone (Jul 26 2023 at 23:40):

@Mario Carneiro Do you mean on an List/Array? I meant things like pure and bind for transformers like ReaderT.

Mac Malone (Jul 26 2023 at 23:40):

(i.e., so the transformers also work on types weaker than a monad).

Mario Carneiro (Jul 26 2023 at 23:40):

I recall making exactly this change and hitting issues, although the details escape me now

Mario Carneiro (Jul 26 2023 at 23:40):

I don't think they were performance issues, something broke

Mac Malone (Jul 26 2023 at 23:42):

I actually made this exact change and previous Lean 4 PR (lean4#581) and got it work without issues. The main thing is that is also required fixing the LawfulMonad proofs.

Mario Carneiro (Jul 27 2023 at 00:00):

Here's the previous PR where I hit issues with the generalization: lean4#1447

Mario Carneiro (Jul 27 2023 at 00:04):

the specific commit with the error is https://github.com/leanprover/lean4/commit/8ca6fe43820a3476f716caabd80adc1a9d8bfdca although the log has since been deleted. I think it ran out of stack space in the linux debug run

Mac Malone (Jul 27 2023 at 00:16):

@Mario Carneiro Interesting, I didn't have a similar problem with my PR, but mine was much older so the bug (which it does seem like it was bug elsewhere that was just exposed by this change) may have been introduced since then. It is also worth noting that your change did not generalize the instances as well (e.g., [Pure m] : Pure (ReaderT p m)) so that may have also been a factor (but you did mention trying that in your report and still getting the error).

Mac Malone (Jul 27 2023 at 00:19):

It could also be interplay between other transformers which I generalized by you do not.

Mario Carneiro (Jul 27 2023 at 00:49):

I did try to generalize the instances, see https://github.com/leanprover/lean4/commit/8ca6fe43820a3476f716caabd80adc1a9d8bfdca#diff-52ef0c67eea613acf6c0b6284063fd7112cdd40750de38365c214abaef246db4R2077 . The change that landed basically excludes all of the generalizations that didn't work, so if you take 8ca6fe and compare it to the final PR commit you will have all of the buggy generalizations


Last updated: Dec 20 2023 at 11:08 UTC