Zulip Chat Archive
Stream: lean4
Topic: two RBMaps
Floris van Doorn (Jul 26 2023 at 13:29):
What is the difference between docs#Lean.RBMap and docs#Std.RBMap ? Why are there two copies? Which one should I use?
Ruben Van de Velde (Jul 26 2023 at 13:45):
The Lean one is only/primarily for use within the compiler; unless you're interfacing with that, the std one is probably better
Henrik Böving (Jul 26 2023 at 13:48):
Floris van Doorn said:
What is the difference between docs#Lean.RBMap and docs#Std.RBMap ? Why are there two copies? Which one should I use?
The why is: Lean's compiler does not (possibly yet) want to depend on std4 to build but std4 wants to do additional experimentation on top of the existing RbMap in Lean, the compiler datastructures do however only contain what the compiler needs so instead there are now two of the m that might or might not be merged one day
Floris van Doorn (Jul 26 2023 at 15:14):
Thanks!
(I ended up using the Lean one, since I my main application was NameMap
/NameSet
)
Mac Malone (Jul 26 2023 at 17:16):
@Mario Carneiro I remember we discussed this once, but I forgot the details. Why is Std.RBMap
defined in terms of RBSet
instead of the other way around (e.g., like essentially in core or where RBSet a := RBMap a Unit
)? It seems this would reduce a level of pointer indirection (the RBMap
tuple could be "unboxed" in the Node) without significant overhead (a pointless Unit
field in the node for an RBSet
).
Mario Carneiro (Jul 26 2023 at 18:19):
Because it would require duplicating all the proofs with a useless field. Maybe I will do it at some point but I'd rather hold out for lean making this zero-cost
Mario Carneiro (Jul 26 2023 at 18:27):
While this issue exists, it puts adverse incentives on almost everything to make the code worse in exchange for optimization. I recently debugged an issue along the same lines with @Yicheng Qian : they had code like
let mut a := 0
let mut b := 0
let mut c := 0
let mut d := 0
let mut e := 0
while true do
-- hot loop which does some basic arithmetic and modifies a,b,c,d,e
a := a + b
if b = 0 then break
b := b - 1
c := 2 * a + c
...
and it turned out that the performance was dominated by the fact that the do
notation tuples all the mutable variables after every iteration like (a, (b, (c, (d, e)))
(i.e. 4 allocations) and untuples them at the beginning of the loop. This was run once per character as part of a parser, and this single allocation contributed to over 50% of the runtime
Mac Malone (Jul 26 2023 at 22:24):
Mario Carneiro said:
Because it would require duplicating all the proofs with a useless field. Maybe I will do it at some point but I'd rather hold out for lean making this zero-cost
Why would it require duplication? If RBNode
was changed to have k/v
, the proofs would be in terms of that, which would carry on through through the definitions of RBMap
and RBSet
(just as inversely the RBSet structure/proofs are carried on through for RBMap
now). Right?
Mac Malone (Jul 26 2023 at 22:35):
Mario Carneiro said:
the fact that the
do
notation tuples all the mutable variables after every iteration like(a, (b, (c, (d, e)))
(i.e. 4 allocations) and untuples them at the beginning of the loop. This was run once per character as part of a parser, and this single allocation contributed to over 50% of the runtime
Yeah, using loops in do notation is a bad idea if you are looking for primitive loop behavior. If you want that, you generally need to use a tail-recursive function. Even in Sebastian's upgraded do
unchained notation, this seems like it would still be a problem, so I think holding out hope for this to be optimized in the future is unlikely in any near-term situation. Part of this could be alleviated by wrapping up all the mutable variables in a single autogenerated on-demand structure type, though some packing and unpacking would still likely occur (especially with more nested do use). On the flip side, statements by Leo has given me the impression that improving/rewriting do notation is not high on the priority list.
Mario Carneiro (Jul 26 2023 at 22:36):
the common through line here is not improvements to the do
notation, but rather inlined structure layouts
Mario Carneiro (Jul 26 2023 at 22:36):
if RBMap
inlined its pair in the RBSet
data structure we wouldn't need to talk about this
Mario Carneiro (Jul 26 2023 at 22:37):
and AssocList
wouldn't even need to exist
Mario Carneiro (Jul 26 2023 at 22:38):
I don't think RBSet
should be changed, it has a good API for what it does and you are talking about things that would change the API (of the lemmas)
Mac Malone (Jul 26 2023 at 22:40):
@Mario Carneiro My understanding is that inlined structure layouts are also not on the todo list, especially automatically. The only thing in this realm that I have heard being considered is to be ability to manually mark certain structure types as unboxed and they then they will be passed by-value rather than by-aggregate. This would only occur in cases where the fixed type is known at codegen time, so this would not apply for polymorphic.
Mario Carneiro (Jul 26 2023 at 22:40):
It doesn't have to be automatic, as soon as it is opt in RBMap
will definitely opt in
Mac Malone (Jul 26 2023 at 22:41):
@Mario Carneiro My point is that, in that feature I have heard discussed in meetings, RBMap
is not the kind of type that can opt-in.
Mario Carneiro (Jul 26 2023 at 22:42):
the only reason to consider the RBMap
refactor you are describing is to change the data layout. This needs to be solved at the data layout level, not at a level that complicates all the proofs
Mario Carneiro (Jul 26 2023 at 22:43):
The situation is basically the same for AssocList
, it is dumb to have to copy the whole List
API just because you want to store two things instead of one
Mario Carneiro (Jul 26 2023 at 22:43):
it basically means that structures don't work as a concept
Mac Malone (Jul 26 2023 at 23:03):
Mario Carneiro said:
the only reason to consider the
RBMap
refactor you are describing is to change the data layout. This needs to be solved at the data layout level, not at a level that complicates all the proofs
My argument is to this is simply one of expediency. It is unlikely that this will be solved in the compiler anytime soon, and it is not clear to me how this could be solved at all without a major refactor (the only way I can see to implement this would require specializing all upstream functions -- both existing and future -- to RBMap
which would require some significant compiler overhaul and powerful type inference to make sure the right version is being used).
Nonetheless, removing a level of data indirection is a major performance gain for a core type likeRBMap
, so I think a bit of complication for some significant performance boost is worth it, especially knowing it is unlikely they this will be solvable by the compiler for likely a good few years.
Mac Malone (Jul 26 2023 at 23:05):
Furthermore, this is likely to be a major sticking point with any future merge of Std with core, given that the core version is objectively better alternative for its primary core use cases due to its superior performance.
Mario Carneiro (Jul 26 2023 at 23:11):
Like I said, if this becomes a big enough issue we can make a copy of RBSet and transfer all the theorems. So far it hasn't been reported as an issue except in the theoretical sense
Mario Carneiro (Jul 26 2023 at 23:12):
So far, it is not a major issue in part because the core version is still available if you care
Mario Carneiro (Jul 26 2023 at 23:13):
keep in mind that RBMap
is already indirection heavy since it is a binary tree. In fact you can probably levy similar complaints against the use of RBColor
, that could just be baked into the discriminant at the cost of complicating the proofs
Mario Carneiro (Jul 26 2023 at 23:14):
that would save 8 bytes per node
Mario Carneiro (Jul 26 2023 at 23:15):
TBH this is why I don't like high performance programming in lean, there are just so many suboptimal things you can see from a mile away and optimizing for them is unsatisfying because all abstractions come with a cost, including whatever you are in the process of building
Mario Carneiro (Jul 26 2023 at 23:21):
Writing leangz was great, I could look at the data format and say "80% of this is unnecessary" and end up with 7x space reduction, before compression. There is just so much overhead in the object format, it is not at all space-optimized. Lean just isn't tuned for this kind of thing, and it lacks the facilities for adequate control over data layout
Mac Malone (Jul 26 2023 at 23:28):
Mario Carneiro said:
TBH this is why I don't like high performance programming in lean, there are just so many suboptimal things you can see from a mile away and optimizing for them is unsatisfying because all abstractions come with a cost, including whatever you are in the process of building
I think this is our major point of contention. I would like to use Lean for all my computing (high performance or otherwise) in the future. I am also happy to sacrifice ease of provability when necessary to achieve this. However, I think Lean generally provides enough meta-programming tools (and will provide even more in the future with the Lean-level compiler) to mitigate most sacrifices.
Mario Carneiro (Jul 26 2023 at 23:30):
In lean, the proofs and the defeqs are part of the API of the type. You need to be able to exert separate control over them in a good library
Mario Carneiro (Jul 26 2023 at 23:30):
implemented_by
is great for that
Mac Malone (Jul 26 2023 at 23:31):
Mario Carneiro said:
Like I said, if this becomes a big enough issue we can make a copy of RBSet and transfer all the theorems. So far it hasn't been reported as an issue except in the theoretical sense
I am still not clear though on why this change would require so much duplication. Why can't most theorems just be written for RBMap and lifted to RBSet when RBSet a := RBMap a Unit
(just as the inverse is currently done for RBMap in terms of RBSet)?
Mario Carneiro (Jul 26 2023 at 23:31):
but when it comes to data layout the inductive
and the layout are inextricably linked
Mario Carneiro (Jul 26 2023 at 23:32):
It would mean having an additional useless argument in all of the RBNode
theorems, i.e. an API change
Mario Carneiro (Jul 26 2023 at 23:32):
and besides, the same problem comes up again if you want to store two things instead of one
Mac Malone (Jul 26 2023 at 23:32):
Mario Carneiro said:
In lean, the proofs and the defeqs are part of the API of the type. You need to be able to exert separate control over them in a good library
This depends on whether the library is meant for proving or for computation. If the later, they may not be part of API (i.e., they may be considered a non-breaking implementation detail).
Mario Carneiro (Jul 26 2023 at 23:33):
This depends on whether the library is meant for proving or for computation
It's both!
Mario Carneiro (Jul 26 2023 at 23:33):
There are a lot of theorems about RBNode
Mario Carneiro (Jul 26 2023 at 23:34):
RBNode
is a public type too
Mac Malone (Jul 26 2023 at 23:34):
@Mario Carneiro My point is that a library can make that decision, and cases like these may be evidence there may need to be a split between proof-centric standard library and a computation-centric library for complex datatypes.
Mario Carneiro (Jul 26 2023 at 23:34):
Obviously I want both
Mac Malone (Jul 26 2023 at 23:34):
Or some nice metaprogramming tools to solve code duplication problems.
Mario Carneiro (Jul 26 2023 at 23:34):
and not in separate packages either, I want one type that is good in both ways
Mac Malone (Jul 26 2023 at 23:35):
@Mario Carneiro I want a lot of things too; the question is whether that is feasible or likely to be supported in the near-term.
Mario Carneiro (Jul 26 2023 at 23:36):
in the near term I have explained what could happen with RBMap. I don't think changing the API of RBNode is on the table, but you can transfer things from an equivalent type without too much pain, just a lot of copy paste
Mac Malone (Jul 26 2023 at 23:37):
For instance, I would like the core monad functions and instances to only require their exact prerequisites. However that requires refactors and code review that was not in the cards when I proposed it.
Mario Carneiro (Jul 26 2023 at 23:37):
that is also a performance regression IIUC
Mac Malone (Jul 26 2023 at 23:37):
Mario Carneiro said:
that is also a performance regression IIUC
What?
Mario Carneiro (Jul 26 2023 at 23:39):
at least, mapM
-> mapA
is a performance regression
Mac Malone (Jul 26 2023 at 23:40):
@Mario Carneiro Do you mean on an List
/Array
? I meant things like pure
and bind
for transformers like ReaderT
.
Mac Malone (Jul 26 2023 at 23:40):
(i.e., so the transformers also work on types weaker than a monad).
Mario Carneiro (Jul 26 2023 at 23:40):
I recall making exactly this change and hitting issues, although the details escape me now
Mario Carneiro (Jul 26 2023 at 23:40):
I don't think they were performance issues, something broke
Mac Malone (Jul 26 2023 at 23:42):
I actually made this exact change and previous Lean 4 PR (lean4#581) and got it work without issues. The main thing is that is also required fixing the LawfulMonad
proofs.
Mario Carneiro (Jul 27 2023 at 00:00):
Here's the previous PR where I hit issues with the generalization: lean4#1447
Mario Carneiro (Jul 27 2023 at 00:04):
the specific commit with the error is https://github.com/leanprover/lean4/commit/8ca6fe43820a3476f716caabd80adc1a9d8bfdca although the log has since been deleted. I think it ran out of stack space in the linux debug run
Mac Malone (Jul 27 2023 at 00:16):
@Mario Carneiro Interesting, I didn't have a similar problem with my PR, but mine was much older so the bug (which it does seem like it was bug elsewhere that was just exposed by this change) may have been introduced since then. It is also worth noting that your change did not generalize the instances as well (e.g., [Pure m] : Pure (ReaderT p m)
) so that may have also been a factor (but you did mention trying that in your report and still getting the error).
Mac Malone (Jul 27 2023 at 00:19):
It could also be interplay between other transformers which I generalized by you do not.
Mario Carneiro (Jul 27 2023 at 00:49):
I did try to generalize the instances, see https://github.com/leanprover/lean4/commit/8ca6fe43820a3476f716caabd80adc1a9d8bfdca#diff-52ef0c67eea613acf6c0b6284063fd7112cdd40750de38365c214abaef246db4R2077 . The change that landed basically excludes all of the generalizations that didn't work, so if you take 8ca6fe and compare it to the final PR commit you will have all of the buggy generalizations
Last updated: Dec 20 2023 at 11:08 UTC