Zulip Chat Archive
Stream: batteries
Topic: naming inconsistencies
Julien Michel (Mar 23 2024 at 09:13):
Is there a good reason why Array.size
is not named Array.length
, while for example, Array.get
(O(1)) is named like List.get
(O(n)) ?
Same question applies to many other pairs like Array.push
and List.concat
, Array.pop
and List.dropLast
, ...
Mario Carneiro (Mar 24 2024 at 07:02):
(Note, most of these definitions are not in std, and I had no hand in the naming)
Mario Carneiro (Mar 24 2024 at 07:02):
Once upon a time List.get
was also not named the same, it was List.nth
Mario Carneiro (Mar 24 2024 at 07:04):
I think the fact that they have different asymptotic behaviors is actually relevant to the naming (at least descriptively). More specifically, push
is what you do to a dynamic array to add an element at the end, but to add an element to the end of a linked list you concat
a new node at the end
Mario Carneiro (Mar 24 2024 at 07:05):
which is to say, these names come from the picture of the concrete data structure moreso than from the abstract specification of both of these types as sequences
Mario Carneiro (Mar 24 2024 at 07:07):
Having a more awkward name is an old trick for nudging people in the direction of "better" operations (for various notions of better), which to me seems to be part of what is going on in Array.pop
vs List.dropLast
Julien Michel (Mar 24 2024 at 15:28):
I initially thought that, as reasoning is only done on the Lean code, the name should be consistent with only the Lean code, so that for example Array
and List
lemmas have exactly the same names.
I think you are saying that a def's name should be consistent with the underlying implementation be it C++ or Lean.
A possible issue I see, is that an implementation might be changed later with a csimp attribute (or extern / implemented_by)
Comparing List
and Array
, I see functions that are named the same when they have different implementations (get
, set
, append
, map
, ...)
I also see functions that are named differently suggesting a different external implementation, but that have essentially the same Lean code (size
vs length
, push
vs concat
, pop
vs dropLast
, ...)
This lack of consistency feels a bit disconcerting. Is the naming now set in stone or will there be changes in the future?
Kim Morrison (Mar 24 2024 at 18:51):
It seems pretty unlikely to me that we will change such names.
Timo Carlin-Burns (Mar 24 2024 at 18:58):
Really? What do you mean by "such names"? I've noticed a lot of questionable and inconsistent naming in core/std and I always assumed that when Lean was ready to take on more stability promises, we would do a big survey and switch to a consistent set of names that we are happy to stabilize like the big pre-1.0 sweep Rust did
Kim Morrison (Mar 24 2024 at 19:10):
Personally, I have trouble caring about names as much as others do, and don't see anything to complain about in any of the examples mentioned so far in this thread. :-)
More generally in the FRO renaming things is often considered as rearranging deck chairs, and a time sink because of the inevitable bike shedding.
That said, if there are particular acute naming problems I'm game to start an internal discussion about what a renaming process would look like.
But "fix the gaping API holes, using the existing names", seems like a much higher priority. :-)
Kyle Miller (Mar 24 2024 at 19:30):
I'm on the fence about whether the inconsistencies for List
vs Array
is a problem or not. On one hand, I'm frequently annoyed when I switch between List
and Array
and then need to go through and swap length
for size
, etc., but on the other it's also often useful that they have different names because it forces thinking about the changes to performance. It's not clear that the language should be forcing this discipline however.
Timo Carlin-Burns (Mar 24 2024 at 20:27):
To give an example, I recently made a PR touching docs#Array.feraseIdx. I can't imagine that this is a permanent name, and I considered doing what I thought would be an improvement as a drive
-by change (mirroring get
and getD
with feraseIdx
-> eraseIdx
and eraseIdx
-> eraseIdxD
), but I decided that it wasn't worth it because eraseIdx
would probably not be the final name anyway. (I figured we would probably copy Rust's remove
in a naming sweep)
Julien Michel (Mar 25 2024 at 00:32):
@Scott Morrison I think if you want to attract more users and external contributors, it's becoming more and more important that the standard library and core look good and consistent. So far, as an outsider, I've been thinking "names feel weird and unaligned, I don't really wanna touch anything right now because everything will probably change soon". If other outsiders think as I do, I think you might miss opportunities to grow. If everything looked very consistent and appealing, don't you think you'd have more people willing to join in and fill the holes faster ?
BTW What API holes are you trying to fill in priority ? Is there documentation / discussions somewhere?
Kim Morrison (Mar 25 2024 at 01:36):
I appreciate what you're saying, but please make some allowance for "consistent and appealling" often being in the eye of the beholder, and for the desire to avoid bikeshedding. :-)
Re: plans for the API. One starting point is https://github.com/leanprover/std4/pull/561/files. It's lacking detail still, of course.
We started, but only barely, to give indications of desired API gaps using proof_wanted
in Std. Unfortunately there appear to only be two in main
at present, and std4#690 is working on resolving one.
HashMap.insert_find?
is still open, though!
More generally, for BitVec
/ HashMap
(and to a lesser extend the other data structures) we are still missing lots of lemmas about "observations about operations" (I guess insert_find?
is a typical example). (There's also the "simplifying pairs of operations into single operations, or commuting them" class of lemmas as well.)
My experience is that you try to use any of these data structures are there are no shortage of painful gaps.
BitVec
in particular, it would be great if every operation (of which there are many) had good lemmas for getLsb
, getMsb
, toNat
, and, where relevant toFin
, msb
. We also need some "casesOn" type lemmas for pulling of high or low bits, and chopping into pieces that can be appended. There are many missing pairwise operations lemmas in BitVec
.
Kim Morrison (Mar 25 2024 at 01:37):
Having a more detailed roadmap, and many more proof_wanted
statements, is definitely on the agenda, and I'm happy to have the reminder that there is appetite for helping fulfill these!
Mario Carneiro (Mar 25 2024 at 02:21):
Scott Morrison said:
HashMap.insert_find?
is still open, though!
I recently (re)started working on this, trying to get @Wojciech Nawrocki 's PR std4#279 over the line and close the API gap of basic theorems for HashMap (which are needed for at least lean4lean and a recent ITP formalization we worked on together, plus some incarnations of LeanSAT)
Last updated: May 02 2025 at 03:31 UTC