Zulip Chat Archive

Stream: batteries

Topic: Unifying `Batteries.HashMap` and `Std.HashMap`.


Kim Morrison (Oct 16 2024 at 02:10):

We're now in the situation of having both Batteries.HashMap and Std.HashMap (we also have Lean.HashMap, but that is already being dealt with). I'd like to consolidate these, but I need some help.

There are no longer any uses of Batteries.HashMap in repositories that I visit regularly (outside of deprecated files). (I'm happy to hear about other current uses.)

I think the steps going forward are:

  1. Identify currently existing Batteries.HashMap functionality/theorems that could not be implemented on top of Std.HashMap.
  2. Put these on @Henrik Böving's TODO list. :-)
  3. Verify that any remaining functionality/theorems in Batteries.HashMap could be implemented on top of Std.HashMap.
  4. Replace Batteries.HashMap with a synonym for Std.HashMap, implementing any remaining functionality/theorems on top, still in Batteries.
  5. (Optional/later): move that remaining functionality to Std.HashMap
  6. (Optional/later): deprecate Batteries.HashMap.

Does this seem reasonable? Pinging in particular @Mario Carneiro to confirm this matches what you had in mind. If not, I'm happy to discuss alternative plans to reduce duplication.

If this is the right approach, then the current blocker is step 1. @Mario Carneiro, is this something you would be willing to take on? I can then drive 2. and 3. and perhaps 4. when appropriate.

Kim Morrison (Nov 24 2024 at 01:48):

Kim Morrison said:

We're now in the situation of having both Batteries.HashMap and Std.HashMap (we also have Lean.HashMap, but that is already being dealt with). I'd like to consolidate these, but I need some help.

There are no longer any uses of Batteries.HashMap in repositories that I visit regularly (outside of deprecated files). (I'm happy to hear about other current uses.)

I think the steps going forward are:

  1. Identify currently existing Batteries.HashMap functionality/theorems that could not be implemented on top of Std.HashMap.
  2. Put these on Henrik Böving's TODO list. :-)
  3. Verify that any remaining functionality/theorems in Batteries.HashMap could be implemented on top of Std.HashMap.
  4. Replace Batteries.HashMap with a synonym for Std.HashMap, implementing any remaining functionality/theorems on top, still in Batteries.
  5. (Optional/later): move that remaining functionality to Std.HashMap
  6. (Optional/later): deprecate Batteries.HashMap.

Does this seem reasonable? Pinging in particular Mario Carneiro to confirm this matches what you had in mind. If not, I'm happy to discuss alternative plans to reduce duplication.

If this is the right approach, then the current blocker is step 1. Mario Carneiro, is this something you would be willing to take on? I can then drive 2. and 3. and perhaps 4. when appropriate.

Could I ping on this? I'm happy to do 1. as well, but would like there to be some plan in this direction.

Mario Carneiro (Nov 24 2024 at 02:02):

With my batteries maintainer hat on I don't see any reason to push this process along, but I agree those are the steps that the FRO needs to do before we can deprecate Batteries.HashMap.

Yury G. Kudryashov (Nov 24 2024 at 02:17):

About Lean.HashMap, could you please add deprecation tags to functions, not only types? I'm trying to port code written by someone else, and I have to search for the corresponding function every time.

Kim Morrison (Nov 24 2024 at 02:57):

Mario Carneiro said:

With my batteries maintainer hat on I don't see any reason to push this process along, but I agree those are the steps that the FRO needs to do before we can deprecate Batteries.HashMap.

It turns out there is nothing missing except for findEntry?, which has a slightly complicated spec: m.findEntry? a returns some (a', b), for some a' == a already in the HashMap, but this a' may not actually be a if BEq ignores some data.

If it's okay to just replace this with (a, ·) <$> self[a]?, then otherwise we're good to go, and steps 1-4 are all finished by batteries#1063.

Kim Morrison (Nov 24 2024 at 03:08):

Yury G. Kudryashov said:

About Lean.HashMap, could you please add deprecation tags to functions, not only types? I'm trying to port code written by someone else, and I have to search for the corresponding function every time.

lean#6192

Mario Carneiro (Nov 24 2024 at 15:21):

No, findEntry? exists exactly because you might want the element in the map and not the one you have already

Mario Carneiro (Nov 24 2024 at 15:22):

in particular, it is used as a building block in other functions on sets IIRC (at least, in the RBMap version which has a more complete API)

Kim Morrison (Nov 24 2024 at 20:35):

Can we find any usages of Batteries.HashMap.findEntry? or Lean.HashMap.findEntry?? In what I've seen so far, I think it was just copied over from the Lean.HashMap API when it was forked, but isn't actually used.

Kim Morrison (Nov 24 2024 at 20:36):

(I agree that Batteries.RBMap.findEntry? is a different story.)

Mario Carneiro (Nov 24 2024 at 20:36):

No, it's copied from the RBMap API

Mario Carneiro (Nov 24 2024 at 20:36):

the other thing I want to know is how far along the lemmas are wrt the RBMap API

Kim Morrison (Nov 24 2024 at 20:36):

You mean for Std.HashMap compared to Batteries.RBMap? Or Markus's new work on ordered maps compared to Batteries.RBMap?

Mario Carneiro (Nov 24 2024 at 20:37):

Is RBMap on the chopping block too? :face_with_diagonal_mouth:

Mario Carneiro (Nov 24 2024 at 20:38):

If these are supposed to be replacements, then they should have the lemmas too. That was the whole reason we were doing this, I thought

Mario Carneiro (Nov 24 2024 at 20:38):

How far along is that work?

Mario Carneiro (Nov 24 2024 at 20:39):

I got so much flak for not having all the theorems about hashmaps when this was under discussion

Kim Morrison (Nov 24 2024 at 20:39):

I think the new implementation Markus is working on it not actually an RBMap, it is something else, but I don't recall the details right now. Certainly the intention is that it will have at least as thorough verification API.

Kim Morrison (Nov 24 2024 at 20:40):

@Markus Himmel, just pinging you on this when you're available.

Mario Carneiro (Nov 24 2024 at 20:40):

But right now I'm talking about hashmaps

Mario Carneiro (Nov 24 2024 at 20:40):

we can deal with the other stuff later

Kim Morrison (Nov 24 2024 at 20:40):

Oh, then there is quite a bit more verification than there was previously.

Mario Carneiro (Nov 24 2024 at 20:41):

I'm not comparing to what currently exists in batteries master (which is almost nothing), I'm comparing to the RBMap API and also the work that wojciech did on a PR which was never merged

Kim Morrison (Nov 24 2024 at 20:41):

(Std.HashMap vs Batteries.HashMap)

Kim Morrison (Nov 24 2024 at 20:41):

Sorry, I thought you said we were talking about HashMaps.

Mario Carneiro (Nov 24 2024 at 20:41):

I am

Mario Carneiro (Nov 24 2024 at 20:42):

there is a bunch of proof work on batteries hashmaps which was deliberately blocked because of the work on Std.HashMap

Kim Morrison (Nov 24 2024 at 20:42):

I'm happy to go make a comparison, but I think the only code that I can actually compare against is Std.HashMap vs Batteries.RBMap?

Kim Morrison (Nov 24 2024 at 20:42):

Where is this?

Mario Carneiro (Nov 24 2024 at 20:42):

Which part? The batteries RBMap API is pretty close to what I would like to see from a complete map verification API

Mario Carneiro (Nov 24 2024 at 20:43):

Plus there's a copy and paste to hashsets

Mario Carneiro (Nov 24 2024 at 20:43):

That's all to scope out what lemmas we should have, not what lemmas we do have

Kim Morrison (Nov 24 2024 at 20:43):

Are you talking about batteries#279?

Mario Carneiro (Nov 24 2024 at 20:44):

Yes, that's what we do have

Kim Morrison (Nov 24 2024 at 20:44):

Okay, but there's no point doing a lemma-by-lemma comparison there, because that work never actually reached the point of the user-facing API, right?

Mario Carneiro (Nov 24 2024 at 20:45):

HashMap has never been completed, I want to know if it's complete now

Kim Morrison (Nov 24 2024 at 20:45):

The toList function is has no verification.

Mario Carneiro (Nov 24 2024 at 20:45):

and if not I want to put pressure until it is

Kim Morrison (Nov 24 2024 at 20:45):

Otherwise there is a fair bit.

Kim Morrison (Nov 24 2024 at 20:45):

And someone (again, this will have to wait until Markus arrives) is working on the toList part of the story.

Mario Carneiro (Nov 24 2024 at 20:46):

it is frustrating that proof_wanted never really worked out here, and it seems likely that we're about to lose track entirely of what we even want from this API

Kim Morrison (Nov 24 2024 at 20:46):

https://github.com/leanprover/lean4/blob/master/src/Std/Data/HashMap/Lemmas.lean

Kim Morrison (Nov 24 2024 at 20:46):

is the current state of things

Kim Morrison (Nov 24 2024 at 20:47):

(although note this file comes in four versions, for the dependent and raw versions as well)

Kim Morrison (Nov 24 2024 at 20:48):

Sometime later today I'll try to go lemma-by-lemma through Batteries.RBMap and Std.HashMap and see how they compare, and create issues if I find something missing.

Mario Carneiro (Nov 24 2024 at 20:49):

is there anything about permutations in this file?

Mario Carneiro (Nov 24 2024 at 20:49):

maybe that's what you mean by toList having no verification

Mario Carneiro (Nov 24 2024 at 20:50):

How do the benchmarks look between the three hashmap implementations?

Kim Morrison (Nov 24 2024 at 20:55):

Again, I'll wait for Markus, but he benchmarked rather carefully, and from what I remember he was happy with the outcomes.

Kim Morrison (Nov 24 2024 at 20:56):

Okay --- https://gist.github.com/kim-em/73bc3e4516271244e1a9702a5490189c is just the theorem statements from Batteries.Data.RBMap.Lemmas. Am I missing anything else?

Kim Morrison (Nov 24 2024 at 20:56):

Later I'll go through these and make a comparison.

Mario Carneiro (Nov 24 2024 at 20:56):

IIRC there is a lot more in the previous section?

Mario Carneiro (Nov 24 2024 at 20:57):

most of the theorems are on RBNode

Mario Carneiro (Nov 24 2024 at 20:58):

(Note the use of findEntry? in one of those lemmas. That would be incorrect with the new findEntry?)

Kim Morrison (Nov 24 2024 at 21:00):

Sorry, theorems about RBNode are implementation details. I don't want to compare something other than the surface API.

Mario Carneiro (Nov 24 2024 at 21:00):

There are also two notions of membership, namely "find would succeed" and "the entry is in the map". I think these are called Mem and EMem respectively in RBMap

Mario Carneiro (Nov 24 2024 at 21:00):

No, RBNode is the equivalent of HashSet

Mario Carneiro (Nov 24 2024 at 21:01):

but also, the low level API is also important, especially depending on how transparent you expect the internals to be

Kim Morrison (Nov 24 2024 at 21:01):

Can you point me to specific lemmas you're interested in, that are relevant for a comparison to a hashmap?

Kim Morrison (Nov 24 2024 at 21:02):

I'm not seeing anything in the RBNode section that seems relevant to the discussion here, sorry.

Mario Carneiro (Nov 24 2024 at 21:03):

I don't see an equivalent for

theorem mem_toList_insert [@TransCmp α cmp] {t : RBSet α cmp} :
    v'  toList (t.insert v)  (v'  toList t  t.find? v  some v')  v' = v := by

Kim Morrison (Nov 24 2024 at 21:04):

Std.HashMap is just so far ahead of Lean.HashMap and Batteries.HashMap (hand-checked IR, benchmarks, simp lemmas for pairwise combinations of lemmas, thorough framework for verifying further operations, including custom tactics to reduce proof obligations to list-based models, etc., etc.). I'm sort of at a loss as to what the obstacle to switching out Batteries.HashMap is.

Kim Morrison (Nov 24 2024 at 21:04):

Yes --- the HashMap.toList function doesn't have any verification API yet.

Mario Carneiro (Nov 24 2024 at 21:05):

This lemma isn't really about toList, a \in toList map is how this API says "exact membership"

Mario Carneiro (Nov 24 2024 at 21:06):

I guess you could do the same for hashmap but of course that means you will need a lot of lemmas about toList

Kim Morrison (Nov 24 2024 at 21:07):

"exact membership" is an implementation detail that is not meant to be exposed. My understanding is that the verification of toList will be under appropriate Lawful hypotheses. (Again, I'm not involved in verifying toList, I may be wrong here.)

Mario Carneiro (Nov 24 2024 at 21:07):

?? It's externally visible

Mario Carneiro (Nov 24 2024 at 21:07):

It's not at all an implementation detail, it's what the user put into the map

Mario Carneiro (Nov 24 2024 at 21:08):

the precise ordering of toList is an implementation detail, but up to permutation toList on a hashmap is an external API

Mario Carneiro (Nov 24 2024 at 21:09):

hence a \in toList map matters for hashmaps too

Kim Morrison (Nov 24 2024 at 21:09):

Yes, and a ∈ map.toList ↔ a ∈ map with appropriate Lawful hypotheses.

Mario Carneiro (Nov 24 2024 at 21:09):

no, that is false

Mario Carneiro (Nov 24 2024 at 21:10):

unless by appropriate Lawful hypotheses you mean that beq is equivalent to eq

Mario Carneiro (Nov 24 2024 at 21:10):

which is too strong of an assumption

Kim Morrison (Nov 24 2024 at 21:11):

In principle there's no reason why this can't be done. My understanding was that it wasn't useful to do it, but I'm not involved in verifying toList. :woman_shrugging:

I think we're getting a bit off track --- this is not something that Batteries.HashMap allows doing in any case.

Mario Carneiro (Nov 24 2024 at 21:11):

I'm not sure what you mean, of course you can do it

Mario Carneiro (Nov 24 2024 at 21:12):

and the lemmas Wojciech wrote let you verify that the right things have been replaced or not

Kim Morrison (Nov 24 2024 at 21:12):

But this is non-existent work, Mario.

Kim Morrison (Nov 24 2024 at 21:12):

The lemma just isn't there.

Kim Morrison (Nov 24 2024 at 21:13):

I agreee that one might one it

Kim Morrison (Nov 24 2024 at 21:13):

so if there's a use case for it, lets make it happen as part of Std.HashMap.toList verification.

Mario Carneiro (Nov 24 2024 at 21:14):

Is it possible to prove such a lemma as a user without breaking the API boundary?

Mario Carneiro (Nov 24 2024 at 21:14):

it's not great to provide a partial API and also get annoyed when people work around it

Mario Carneiro (Nov 24 2024 at 21:15):

if you say the internals are open, then it's possible for people to just do this stuff on their own dime

Mario Carneiro (Nov 24 2024 at 21:16):

but if you seal the internals then you need enough lemmas at the low level so that people won't want to break the boundary

Kim Morrison (Nov 24 2024 at 21:16):

Isn't the relevant lemma map.toList.contains a ↔ a ∈ map anyway? I'm a bit confused why you want to mix up BEq and Eq here.

Mario Carneiro (Nov 24 2024 at 21:17):

The question is essentially "what happens to the keys (up to Eq) when you insert/find/... in a map, where the BEq relation is a quotient of the type?"

Mario Carneiro (Nov 24 2024 at 21:17):

Mario Carneiro said:

I don't see an equivalent for

theorem mem_toList_insert [@TransCmp α cmp] {t : RBSet α cmp} :
    v'  toList (t.insert v)  (v'  toList t  t.find? v  some v')  v' = v := by

Lemmas like this are how you answer that question

Kim Morrison (Nov 24 2024 at 21:19):

Mario Carneiro said:

Is it possible to prove such a lemma as a user without breaking the API boundary?

I'm pretty sure yes: there's nothing forbidding anyone from following the usual mechanism to provide more lemmas.

Mario Carneiro (Nov 24 2024 at 21:19):

the usual mechanism meaning what exactly?

Kim Morrison (Nov 24 2024 at 21:19):

There's a long discussion of this at https://github.com/leanprover/lean4/blob/master/src/Std/Data/DHashMap/Internal/Defs.lean

Kim Morrison (Nov 24 2024 at 21:20):

Sorry, I'd assumed you'd seen that.

Mario Carneiro (Nov 24 2024 at 21:21):

to be clear, you mean PRing to lean

Kim Morrison (Nov 24 2024 at 21:21):

I don't think so?

Kim Morrison (Nov 24 2024 at 21:21):

I mean, of course welcome to.

Mario Carneiro (Nov 24 2024 at 21:21):

I think the summary of steps to add a new operation is being written from that perspective?

Kim Morrison (Nov 24 2024 at 21:22):

Oh, in the sense that it tells you where to put content? Yes. But I don't know of anything reason it can't be done in separate files.

Kim Morrison (Nov 24 2024 at 21:22):

Lean doesn't complain. :-)

Mario Carneiro (Nov 24 2024 at 21:22):

but FRO members might

Mario Carneiro (Nov 24 2024 at 21:23):

I feel like I've been blamed for using the array API in formalizations of HashMap and UnionFind in the past

Kim Morrison (Nov 24 2024 at 21:23):

I mean, if it's good work we'd love to have it PR'd to the repository? It would be sad if good work on operations or lemmas for HashMap had to be implemented multiple times.

Mario Carneiro (Nov 24 2024 at 21:24):

I think the next stage for batteries.hashmap should be making it an overlay API for std.hashmap and encouraging lemma contributions there

Kim Morrison (Nov 24 2024 at 21:25):

Concretely what does that look like relative to batteries#1063?

Mario Carneiro (Nov 24 2024 at 21:26):

Nothing, that's the first part, and the second part is adding a bunch of proof_wanted for all the things that have come out in this discussion

Kim Morrison (Nov 24 2024 at 21:26):

I agree that the tutorial on extending that I linked to above would be even better with a "worked example" outside of the Lean repo.

Kim Morrison (Nov 24 2024 at 21:26):

So one can see by example what exactly is needed at each layer.

Henrik Böving (Nov 24 2024 at 21:28):

Kim Morrison said:

I agree that the tutorial on extending that I linked to above would be even better with a "worked example" outside of the Lean repo.

I think linking to a PR that just implements the full procedure for one function might be fine? (Shameless plug: https://github.com/leanprover/lean4/pull/5244/ though that does it for 4 operations, a more minimal example would truly be just one)

Kim Morrison (Nov 24 2024 at 21:28):

How about we deprecate Batteries.HashMap.findEntry?, to make sure that if there is actually a user of it, they see the warning about the changed behaviour, and then add some proof_wanteds or module docs about the desire to replace it?

Mario Carneiro (Nov 24 2024 at 21:29):

Why not just test the process you are advocating by implementing it instead?

Mario Carneiro (Nov 24 2024 at 21:29):

As I mentioned earlier, it plays a role in some theorem statements too

Mario Carneiro (Nov 24 2024 at 21:29):

the proposed implementation is just wrong

Kim Morrison (Nov 24 2024 at 21:30):

But it plays a role in theorem statements that haven't to date been written. :-)

Mario Carneiro (Nov 24 2024 at 21:31):

I have wanted from the beginning to have a robust HashMap library in lean. We're closer now, but having one person do all the work is not a scalable solution

Kim Morrison (Nov 24 2024 at 21:31):

Well, it's not one person: note that PR Henrik linked above is person 2. :-) And there's WIP about toList, but I forget who is working on it.

Henrik Böving (Nov 24 2024 at 21:32):

People from TU Dresden.

Mario Carneiro (Nov 24 2024 at 21:32):

As soon as this stops being a FRO priority I expect the lemmas to stop there

Mario Carneiro (Nov 24 2024 at 21:33):

and I don't know that there is a good pathway for external contributors to influence that

Mario Carneiro (Nov 24 2024 at 21:33):

hence why I want Batteries.HashMap to be the receiver for that energy instead

Mario Carneiro (Nov 24 2024 at 21:34):

we actively discouraged people from getting involved, I would like that to end once we do this

Kim Morrison (Nov 24 2024 at 21:35):

Mario Carneiro said:

we actively discouraged people from getting involved, I would like that to end once we do this

Sorry, wasn't that just pointing out that the lemma existed already? It's not discouraging people to tell them that the work has already been done.

Mario Carneiro (Nov 24 2024 at 21:36):

that and the complete lack of other proof_wanteds in this file I'm quite sure has been effective in keeping people away from working on hashmaps

Mario Carneiro (Nov 24 2024 at 21:36):

I know it has been effective on me and Wojciech at least

Kim Morrison (Nov 24 2024 at 21:37):

but, but... "the complete lack of other proof_wanteds in this file" is on you, isn't it? Who else would have written them? I'm confused.

Mario Carneiro (Nov 24 2024 at 21:38):

The messaging at me to stop messing with hashmaps has been coming loud and clear for the past few months

Kim Morrison (Nov 24 2024 at 21:38):

re: Wojciech, I started reviewing their PR, and it then immediately went silent?

Kim Morrison (Nov 24 2024 at 21:38):

Mario Carneiro said:

The messaging at me to stop messing with hashmaps has been coming loud and clear for the past few months

Sure, of course, as soon as we planned to write a replacement we said so.

Mario Carneiro (Nov 24 2024 at 21:39):

right, so now that this work is settled down I want hashmap to be back in business for me and others

Kim Morrison (Nov 24 2024 at 21:39):

Great! Yes.

Kim Morrison (Nov 24 2024 at 21:40):

I think it will be great if there are people (including you) who would like to contribute more operations and verification lemmas for Std.HashMap.

Kim Morrison (Nov 24 2024 at 21:40):

and speaking for myself at least, I don't see any problem with that incubating at least in Batteries.

Mario Carneiro (Nov 24 2024 at 21:40):

I'm not going to write a PR to lean though, I'm going to write PRs for batteries and you can take them to lean

Mario Carneiro (Nov 24 2024 at 21:40):

this is a much less stressful approach for everyone

Kim Morrison (Nov 24 2024 at 21:41):

and hopefully if there is good work on Std.HashMap in Batteries, then authors will be pleased when/if we ask if we can incorporate it into the Lean repo.

Kim Morrison (Nov 24 2024 at 21:41):

Great! It sounds like furious agreement all round, then. :-)

Mario Carneiro (Nov 24 2024 at 21:42):

but this does mean that batteries.hashmap isn't going to be deprecated any time soon

Mario Carneiro (Nov 24 2024 at 21:42):

and in fact it might become the default for anyone downstream of batteries

Kim Morrison (Nov 24 2024 at 21:42):

Why not change the namespace over to Std, though? Batteries (the library) can add code to Std (the namespace).

Mario Carneiro (Nov 24 2024 at 21:43):

That is a possibility as well

Mario Carneiro (Nov 24 2024 at 21:44):

am I getting your permission to not make a stink about batteries having theorems about std internals?

Kim Morrison (Nov 24 2024 at 21:56):

Let's pause until Markus is available, but in principle yes, I think the sounds great:

  • We merge batteries#1063 (either as is, or probably better with findEntry? deprecated for now)
  • Preferably we subsequently switch over the namespace too.
  • You or others are welcome to add operations and lemmas about Std.HashMap in Batteries
  • Said authors understand that the optimal outcome for good/stable code added in this way to Batteries is that it is upstreamed subsequently to the Lean repo, but there's no expectation or obligation that they do it themselves.
  • Given that there will also be people contributing HashMap code directly to the Lean repo, I attempt to provide communication in both directions about things I know people are working on.

Eric Wieser (Nov 24 2024 at 22:11):

I think if no implementation of findEntry? is available in the new shim, it would be better to replace it with a panic! "not implemented" than replace it with an implementation that does something different to what it did before

Kim Morrison (Nov 24 2024 at 22:45):

I've replaced findEntry? with a panic in batteries#1063.

Jannis Limperg (Nov 24 2024 at 23:46):

I feel like panic! is strictly worse than removing findEntry? (temporarily). In both cases I need to fix my code, but if it's removed I at least get a compiler error, rather than just a warning. Having a function in the library, even a deprecated one, that just straight up panics would be extremely surprising.

Mario Carneiro (Nov 24 2024 at 23:46):

I also don't think there is any need to remove the function at all, it's not much more than a copy paste job on find?

Eric Wieser (Nov 24 2024 at 23:49):

Having a copy-paste job on one function still seems like an improvement over the status quo where the

Jannis Limperg said:

I feel like panic! is strictly worse than removing findEntry? (temporarily). In both cases I need to fix my code, but if it's removed I at least get a compiler error, rather than just a warning. Having a function in the library, even a deprecated one, that just straight up panics would be extremely surprising.

I guess ideally you want want a compiler error saying "this has been removed, see discussion" rather than "this name is missing, good luck working out whether you're missing an import or something else is going on"

Mario Carneiro (Nov 24 2024 at 23:51):

it would be nice to have another deprecation-like attribute for post-removal assistance

Kim Morrison (Nov 25 2024 at 00:07):

My preference was just to deprecate it and change the behaviour.

Mario Carneiro (Nov 25 2024 at 00:07):

but why though? The new behavior is clearly worse

Kim Morrison (Nov 25 2024 at 00:08):

If someone would like to include an new implementation of findEntry? in batteries#1063, that would be great: please let me know! If someone would like to write a new implementation after batteries#1063, that would also be great.

Mario Carneiro (Nov 25 2024 at 00:08):

I don't see any justification for it other than it's less work given that the correct implementation doesn't exist yet

Kim Morrison (Nov 25 2024 at 00:08):

Particularly if it's just a copy paste job?

Kim Morrison (Nov 25 2024 at 00:11):

I don't want to drag this out longer for a function that is currently unused. We've got a path, as discussed above, for anyone who needs it to add it, either PRing to the Lean repo or the Batteries repo.

Kim Morrison (Nov 25 2024 at 00:27):

Jannis Limperg said:

In both cases I need to fix my code,

@Jannis Limperg, are you using findEntry?? I thought Aesop wasn't using Batteries.HashMap at all anymore.

Jannis Limperg (Nov 25 2024 at 00:28):

No, I meant "I" as in "a hypothetical downstream user". Maybe not my clearest formulation. :sweat_smile:

Johannes Tantow (Nov 25 2024 at 06:53):

Kim Morrison schrieb:

  1. :-) And there's WIP about toList, but I forget who is working on it.

Henrik Böving schrieb:

People from TU Dresden.

Not sure if other people are working on it, but @Lukas Gerlach and I are working on ofList not toList and with that on insertMany for Lists.

Markus Himmel (Nov 25 2024 at 07:20):

Regarding findEntry?: a possible compromise might be to use

def findEntry? [BEq α] [Hashable α] (m : Std.HashMap α β) (k : α) : Option (α × β) :=
  if h : k  m then some (m.getKey k h, m.get k h) else none

in the shim, which isn't particularly efficient but should have the same behavior as the current implementation in batteries.

Kim Morrison (Nov 25 2024 at 10:51):

Thanks Markus, great to know that is possible. I will put that in, with a note.

Kim Morrison (Nov 25 2024 at 10:54):

Okay, I've now restored findEntry? with the original behaviour.

Kim Morrison (Nov 25 2024 at 10:54):

I think batteries#1063 is ready for review?

Kim Morrison (Nov 25 2024 at 11:13):

(Thanks for the review comments, @Markus Himmel, I've dealt with these now.)

𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Nov 25 2024 at 16:44):

Kim Morrison said:

re: Wojciech, I started reviewing their PR, and it then immediately went silent?

Thank you for looking at that (batteries#279), and sorry for not getting back to you. The reason I abandoned it is that at that point I was feeling burnt out on specifically the "building a hashmap API for other people to use" front as a previous PR to that end existed for about a year, code-rotted, and never got merged, and I had also heard some rumours about the FRO starting new work on hashmaps. I should have communicated this better. I am personally happy with any solution that provides the necessary API (see below).

Kim Morrison said:

Okay, but there's no point doing a lemma-by-lemma comparison there, because that work never actually reached the point of the user-facing API, right?

It has reached the user-facing API, I just never made an honest attempt to upstream it for reasons outlined above. I am using the user-facing lemmas in this verification project. So, for me the ideal new Std.HashMap would provide replacements for everything I use there, at least in the sense that if some fact I use is not directly available in Std.HashMap, it is still trivial to prove given what is there in Std.HashMap. Note that that project assumes LawfulBEq, so subtleties of Eq vs BEq mostly don't apply.

Johannes Tantow (Nov 25 2024 at 20:28):

I have only looked briefly in this and didn't find everything yet in Std, but this feels wrong to me: theorem ext_findEntry? [LawfulBEq α] (m₁ m₂ : HashMap α β) : (∀ a, m₁.findEntry? a = m₂.findEntry? a) → m₁ = m₂. The order of elements in the bucket shouldn't matter for findEntry?, but in order for both maps to be equal the order in every buckets must be the same.

Mario Carneiro (Nov 25 2024 at 20:29):

yeah I think it should say m₁ ~ m₂

Mario Carneiro (Nov 25 2024 at 20:30):

this being an equivalence relation exposed by the API with lemmas about how all the operations respect it

Kim Morrison (Nov 25 2024 at 22:18):

(Just to be clear for context, this problem with ext_findEntry? is about the project Wojciech linked just above, not about the PR discussed here batteries#1063.)

𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Nov 25 2024 at 22:50):

Oops, that's what I get for leaving a sorry. Thanks for spotting that! It is used to prove insert_comm which is wrong for the same reason.

𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Nov 26 2024 at 02:59):

Okay fixed it here. As far as I can see, a big thing that I need that is not there for Std.HashMap atm are lemmas about fold.

Kim Morrison (Nov 26 2024 at 03:41):

@Wojciech Nawrocki, just double checking --- lemmas about fold, which are indeed not written yet for Std.HashMap (they will be presumably just say that fold agrees with toList.fold; i.e. they are part of the the verification lemmas for toList), also don't exist anywhere for Batteries.HashMap, right?

Mario Carneiro (Nov 26 2024 at 10:47):

they exist in Wojciech's project

Kim Morrison (Nov 29 2024 at 00:29):

Are we ready to merge batteries#1063?

Kim Morrison (Mar 20 2025 at 04:39):

Pinging on this again, @𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒. Std.HashMap now has fairly complete lemmas about the toList function and fold.

Kim Morrison (Mar 27 2025 at 01:30):

Okay, I'm going to interpret that emoji response as "yes, seen, and acknowledged", and I'll merge tomorrow. Just let me know if there's further reason to wait!

𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Mar 27 2025 at 02:38):

Sorry, I just haven't had the chance to see if the lemmas that are now in Std suffice to reproduce my proofs. At any rate I am not opposed to the unification.


Last updated: May 02 2025 at 03:31 UTC