Zulip Chat Archive

Stream: lean4

Topic: HashMap moved to std


Jannis Limperg (Oct 05 2022 at 17:11):

I see std now has its own implementation of HashMap, Std.HashMap, distinct from Lean.HashMap. What's the advantage of this, compared to changing the implementation in core (and adding functions/lemmas in std as necessary)? I feel like having two types which fulfil the same purposes and have almost the same implementation is not ideal.

James Gallicchio (Oct 05 2022 at 18:10):

I think the reasoning is that Std.HashMap is intended to be more stable than Lean.HashMap. Having people rely on the Lean one makes it more difficult if the compiler devs need to make changes. Also there's some issues in the bootstrap that affect its API, which Std.HashMap can avoid since it's not in the bootstrap

Jannis Limperg (Oct 06 2022 at 09:45):

I'm sure you all have given this issue more consideration than I have, but I already see myself wanting to use some library (which uses Std.HashMap) for metaprogramming (which uses Lean.HashMap). At that point, my options are

  • conversions back and forth, probably linear-time;
  • really unsafe casts if the representations are sufficiently similar;
  • duplicating the relevant part of the library's API for Lean.HashMap.

I would take occasional breakage over this.

Out of curiosity, what are the bootstrapping concerns?

James Gallicchio (Oct 06 2022 at 15:19):

I'm not super familiar with the metaprogramming API, does it explicitly expose hashmaps somewhere?

Jannis Limperg (Oct 06 2022 at 16:11):

Yes. For example, simp now collects used simp lemmas and these are stored in a HashMap. More importantly, PersistentHashMaps, which I guess should also move to std by the same reasoning, are used to implement all the core data structures.

In general, the metaprogramming API is now the entire non-private compiler API. So whatever appears in a type signature there is exposed. (Technically, defeq also makes all implementations part of the API, but let's not go there.)

Mario Carneiro (Oct 06 2022 at 17:19):

I was planning to add a conversion between the two types to Std. I think it has to be the linear time conversion because the invariants aren't the same. I think it is an acceptable cost for conversion (the zero-cost conversion is a little sketchy and will again tie the representations together, so I don't want to promise its continued availability), but in applications I would generally want to #xy why you actually need the conversion: it's rare to be literally giving a whole hashmap from one module to another. In most cases you should just use the data structure required by the API you are interacting with, or default to Std.HashMap if you have a free choice.

Mario Carneiro (Oct 06 2022 at 17:23):

Out of curiosity, what are the bootstrapping concerns?

Std wants to put invariants on all the data structures such that they can be verified to have the correct behavior under reasonable assumptions, and this sometimes front-loads the proving effort. Lean core is not currently in a position to be able to do those proofs however unless we move a decent chunk of Std to core.

Jannis Limperg (Oct 07 2022 at 09:52):

The potential issue I'm describing hasn't occurred yet, so I can't xy. I agree that the issue won't be terribly common in practice, but at the same time, passing around a map doesn't sound like an outlandish thing either.

One concrete issue I already have is that I have some utility methods for Lean.HashMap, Lean.PersistentHashMap, etc. lying around. I was planning to move them to std, but will now have to duplicate them for the Std versions as well. These are things like insertWith, update, merge, ForIn, BEq. In the long run, I think we will want much larger APIs for these data structures along the lines of Haskell's Data.Map. Duplicating these APIs will be annoying.

Could the bootstrapping issue be addressed by having the unverified data structure in core and verifying it post-hoc (and perhaps packaging up the verified version as a subtype, as usual)?

Mario Carneiro (Oct 07 2022 at 09:57):

I would eagerly add those APIs to std, and backport them to lean core as they become necessary

Mario Carneiro (Oct 07 2022 at 09:58):

I think it is feasible to have the unverified version in core, but leo doesn't want to deal with the maintenance overhead of that at the moment. I don't think this is a permanent state of affairs (but it could be a while before this is changed)

Jannis Limperg (Oct 07 2022 at 10:15):

All right, I'll deal with it for now and we'll see whether the duplication becomes an issue in practice.

Siddhartha Gadgil (Oct 07 2022 at 10:57):

Just in case it was not noticed, in the master of mathlib3port there is exactly this collision in Mathlib.Tactic.Sat.FromLRAT:

error: stdout:
./lean_packages/mathlib/././Mathlib/Tactic/Sat/FromLRAT.lean:251:30: error: ambiguous, possible interpretations
  Std.HashMap  Clause : Type

Mario Carneiro (Oct 07 2022 at 11:14):

that's only a name collision, it can be fixed by using open Lean hiding HashMap or the other way around

Mario Carneiro (Oct 07 2022 at 11:17):

you shouldn't be getting that error if the version of std4 and mathlib4 match. Note that this was fixed in the std4 bump for 10-03

Siddhartha Gadgil (Oct 07 2022 at 11:52):

Mario Carneiro said:

you shouldn't be getting that error if the version of std4 and mathlib4 match. Note that this was fixed in the std4 bump for 10-03

That commit is for mathlib4. I am getting the error for mathlib3port.

Mario Carneiro (Oct 07 2022 at 11:58):

you probably have the wrong lean-toolchain

Mario Carneiro (Oct 07 2022 at 11:58):

it needs to match whatever version of std4 / mathlib4 is being used

Siddhartha Gadgil (Oct 07 2022 at 14:05):

Mario Carneiro said:

you probably have the wrong lean-toolchain

Thanks. I will try to get things in sync.


Last updated: Dec 20 2023 at 11:08 UTC