Zulip Chat Archive

Stream: lean4

Topic: List.concat and List.append


Wrenna Robson (Dec 04 2023 at 12:17):

I'm not sure where the best place to raise this is, and I am sure this conversation has been had before, but I was reminded of it again today and I really felt I should mention it.

List.concat does not perform concatenation. In essentially every language I am aware of, and e.g. on Wikipedia etc. concatenation is defined as the joining of two strings: that is, what List.append does. And indeed, I would normally say that append is the operation of adding an element on to the end of a list - what List.concat does.

Indeed, the description for List.concat is:
/-- l.concat a appends a at the *end* of l, that is, l ++ [a]. -/

And we then have things like this elsewhere:

/-- The homogeneous version of `HAppend`: `a ++ b : α` where `a b : α`. -/
class Append (α : Type u) where
  /-- `a ++ b` is the result of concatenation of `a` and `b`. See `HAppend`. -/
  append : α  α  α

How on earth did we get in a situation where we frequently confuse concatenation and appending, in ways that look,I'm sorry, kind of absurd? When you are saying all the time "this is append, which concatenates" and "this is concat, which appends", which we do, I can find lots of examples on a cursory check, that is to me the sign that something is deeply, weirdly wrong. I mean, from a formal perspective, fine! But why do we have to wear this confusion? Where did it come from? To me it just makes no sense at all.

Wrenna Robson (Dec 04 2023 at 12:18):

I want to stress that I understand this can't be a mistake, it's happened so much. And the people who wrote the core libraries are very clever people - cleverer than me. I just am baffled as to why we make this choice: if we had called "-" "Add" and "+" "Subtract" it would make as much sense to me.

Wrenna Robson (Dec 04 2023 at 12:19):

I'm sorry - I don't mean to make any trouble. This has just been really getting to me.

Marc Huisinga (Dec 04 2023 at 12:32):

image.png
:grinning_face_with_smiling_eyes:

Eric Wieser (Dec 04 2023 at 12:32):

The old thread is here

Wrenna Robson (Dec 04 2023 at 12:33):

Marc Huisinga said:

image.png
:grinning_face_with_smiling_eyes:

To be fair I don't mind the idea that append and concat are synonyms. It isn't totally unambiguous! IIRC Haskell uses concat for [[a]] -> [a].

Wrenna Robson (Dec 04 2023 at 12:34):

But all our documentation describing it upside down is just ???

Wrenna Robson (Dec 04 2023 at 12:35):

@Eric Wieser it looks like you reached a good solution there, but I assume there was never time for it to happen.

Wrenna Robson (Dec 04 2023 at 12:37):

Like I am not sure that "just switch append and concat" is right at all. I do note that we often seem to do that in documentation! For me concat is the greater ??? - append is one of those tricksy things that does tend to mean more stuff.

Wrenna Robson (Dec 04 2023 at 12:41):

But I think it is not true that every language makes totally different decisions - I think Lean might be unique (?) in having concat add an element to the end of a list (an operation which I struggle to describe as concatenation).

For a recent example of somewhere where the confusion is quite overtly exposed:

SMT-Lib name: `concat`.
-/
def append (msbs : BitVec n) (lsbs : BitVec m) : BitVec (n+m) :=
  shiftLeftZeroExtend msbs m ||| zeroExtend' (Nat.le_add_left m n) lsbs
/-- Append a single bit to the end of a bitvector, using big endian order (see `append`).
    That is, the new bit is the least significant bit. -/
def concat {n} (msbs : BitVec n) (lsb : Bool) : BitVec (n+1) := msbs ++ (ofBool lsb)

Even if we simply must preserve this distinction - then we should not use confusing language in documentation. And I would like to understand from where the confusion arose. I think Lisp might use "append" for putting lists together?

Eric Wieser (Dec 04 2023 at 12:46):

Wrenna Robson said:

Eric Wieser it looks like you reached a good solution there, but I assume there was never time for it to happen.

I assume you are referring to:

Eric Wieser said:

Personally I think the way forward here is:

  • Put together a document (gist) indicating the terminology used by as many languages as possible, both for and against the change
  • Make an offer to deal with making the change in Lean 4
  • Put forth the above two items to Leo to be the final arbiter
  • If the change is OK'd for lean4, backport to Lean 3

Wrenna Robson (Dec 04 2023 at 12:47):

I would be happy to help with putting together a summary document. Essentially for each language you'd want to write down
a) the name of the operation for adding an element to the start of a list
b) the name of the operation of adding an element to the end of a list
c) the name of the operation for taking two lists and outputting one list which contains the elements of both, first one then the other.

(I think in some languages you need to be clearer about the difference between creating a new object which is the concatenation of both lists, versus mutating one list by adding on the elements of the other. I guess in Lean we are mostly doing the former?)

Wrenna Robson (Dec 04 2023 at 12:47):

Eric Wieser said:

Wrenna Robson said:

Eric Wieser it looks like you reached a good solution there, but I assume there was never time for it to happen.

I assume you are referring to:

Eric Wieser said:

Personally I think the way forward here is:

  • Put together a document (gist) indicating the terminology used by as many languages as possible, both for and against the change
  • Make an offer to deal with making the change in Lean 4
  • Put forth the above two items to Leo to be the final arbiter
  • If the change is OK'd for lean4, backport to Lean 3

Yep - I think you put it well.

Wrenna Robson (Dec 04 2023 at 12:47):

Though we should skip the backporting to lean 3 bit ;)

Martin Dvořák (Dec 04 2023 at 12:50):

Note that these three operations differ a lot in their importance, namely:
a > c > b
I'm not claiming, however, that their importance should influence their names.

Wrenna Robson (Dec 04 2023 at 12:50):

Yeah, I agree with that. Generally speaking I find if you're using b, you shouldn't be ;)

Wrenna Robson (Dec 04 2023 at 12:52):

(I suppose you also have exotic generalisations like "insert an element at position i". For us this is List.insertNth which is the most sane name I can think of. Unfortunately List.insert does not have a compatible meaning but oh well.)

Wrenna Robson (Dec 04 2023 at 12:53):

(I mean, List.insert is a sensible definition given what Finset.insert etc. mean, don't get me wrong. It's just amusing that List.insertNth, which is also sensible, doesn't quite mean the same thing.)

Martin Dvořák (Dec 04 2023 at 12:53):

docs#List.insert

Wrenna Robson (Dec 04 2023 at 12:53):

I would not expect the "without duplication" part!

Wrenna Robson (Dec 04 2023 at 12:55):

Incidentally, for Fin n -> \alpha, a) and c) are Fin.cons and Fin.snoc, and we also have Fin.insertNth. b) is Fin.append.

Eric Wieser (Dec 04 2023 at 12:56):

I've made a start on the list at https://github.com/leanprover-community/mathlib4/wiki/Alternate-names-for-%60List.concat%60-and-%60List.append%60

Wrenna Robson (Dec 04 2023 at 12:57):

You may, for Lean at least, want to (as I just have there) note what we do for a few other datatypes - like Fin.snoc and List.concat are equivalent here, and we should probably note that.

Wrenna Robson (Dec 04 2023 at 12:57):

Can others edit a gist?

Wrenna Robson (Dec 04 2023 at 13:00):

Also apparantly in Python you can do mylist.append(["a", "b", "c"]) as well as mylist.append("a"). Absolutely cursed language.

Jakob von Raumer (Dec 04 2023 at 13:00):

Eric Wieser said:

I've made a start on the list at https://github.com/leanprover-community/mathlib4/wiki/Alternate-names-for-%60List.concat%60-and-%60List.append%60

Maybe you should also add whats List.join in Lean as a column since it also fights for the same name.

Wrenna Robson (Dec 04 2023 at 13:00):

Also you have .extend in Python which adds elements of an iterable to the end of a list.

Wrenna Robson (Dec 04 2023 at 13:05):

I've added PHP because we may as well look at the cursed languages first. Though functional ones probably a better shout.

Wrenna Robson (Dec 04 2023 at 13:25):

Scala has some very pleasing notation.

Wrenna Robson (Dec 04 2023 at 13:48):

(:+), (+:), and (++) for a), b) and c) respectively.

Wrenna Robson (Dec 04 2023 at 16:23):

I don't think we currently have join for tuples but it would be easy enough to do, as we have all the parts. I've added a note in about this.

Wrenna Robson (Dec 04 2023 at 16:23):

What this analysis is showing so far is that our use of append is not universal but not unknown. Our use of concat, however, I have seen nothing else like.

Eric Wieser (Dec 04 2023 at 16:26):

It does seem like renaming concat to snoc would be enough to avoid confusion

Eric Wieser (Dec 04 2023 at 16:27):

Wrenna Robson said:

I don't think we currently have join for tuples but it would be easy enough to do, as we have all the parts. I've added a note in about this.

Arguably that should work for mixed lengths of tuples

Wrenna Robson (Dec 04 2023 at 16:40):

Ah, where the length is dependent on the index?

Wrenna Robson (Dec 04 2023 at 16:40):

That's a more interesting challenge.

Wrenna Robson (Dec 04 2023 at 16:44):

Eric Wieser said:

It does seem like renaming concat to snoc would be enough to avoid confusion

I think there are two, maybe three things.

  • Rename concat to snoc. It is unequivocally the wrong name currently.
  • Decide whether we want to use concat at all. (I actually think the answer here is we don't, but other options are "use it where we use append" and "use it where we use join".)
  • Edit documentation so that we have no instances where we describe a function called "append" by saying that it concatenates. Either it should be called concat, or it should be described differently. Do similar for functions described as appending (though I think it's reasonably to say "snoc appends an element to the end of the list, so I'm not so hardline on this).

Wrenna Robson (Dec 04 2023 at 16:45):

Optionally, you could try and make a mutable/immutable distinction, but I think we probably won't.

Joe Hendrix (Dec 04 2023 at 17:04):

Array uses push which I find quite natural in that context. It's easier to pronounce that snoc and List.push would probably not confuse people.

Wrenna Robson (Dec 04 2023 at 17:26):

I don't disagree with that, but then in that case I would like to change all uses of snoc to the same thing...

Marc Huisinga (Dec 04 2023 at 17:26):

I would probably find O(n) push to be somewhat confusing

Wrenna Robson (Dec 04 2023 at 17:27):

Eric Wieser said:

Wrenna Robson said:

I don't think we currently have join for tuples but it would be easy enough to do, as we have all the parts. I've added a note in about this.

Arguably that should work for mixed lengths of tuples

def sigmaCurry (f : β  Type u) (α : Type v) : ((i : β)  (f i)  α)  ((Σ a : β, f a)  α) :=
{ toFun := fun g x, y => g x y
  invFun := fun h x y => h x, y
  left_inv := fun g => by ext ; rfl
  right_inv := fun h => by ext ; rfl }

open BigOperators
def finSigmaFinEquiv (f : Fin n  ) : (Σ a, Fin (f a))  Fin ( i, f i) := sorry


def finDepJoin (f : Fin n  ) : ((i : Fin n)  (Fin (f i)  α))  (Fin ( i, f i)  α) :=
  (sigmaCurry (fun i => Fin (f i)) α).trans (Equiv.arrowCongr (finSigmaFinEquiv f) (Equiv.refl α))

Something like this I think, but we need someone (@Yaël Dillies ???) to fill in that sorry I think. I was also surprised we didn't have sigmaCurry but we don't as far as I can see!

Wrenna Robson (Dec 04 2023 at 17:29):

Marc Huisinga said:

I would probably find O(n) push to be somewhat confusing

Yes, this is fair. And a good distinguisher! snoc for O(n), push otherwiseis a good rule.

Yaël Dillies (Dec 04 2023 at 17:29):

I would have expected it to be Equiv.sigmaCurry to match docs#Equiv.curry. Funnily enough, we have docs#DFinsupp.sigmaCurryEquiv

Yaël Dillies (Dec 04 2023 at 17:30):

That sorry doesn't seem very hard but I'm working on something else right now.

Wrenna Robson (Dec 04 2023 at 17:30):

Aha, it's Equiv.piCurry

Wrenna Robson (Dec 04 2023 at 17:31):

A slightly unexpected name.

Wrenna Robson (Dec 04 2023 at 17:31):

I don't think it is very hard but I wanted to see if you were going to tell me "oh yes it's a trivial consequence of such and such".

Wrenna Robson (Dec 04 2023 at 17:35):

This has finally made me intuitively understand why Sigma types are called that. So that's nice.

Joe Hendrix (Dec 04 2023 at 17:40):

I would find a O(1) List.push confusing, but I get the point that it may lead people to think it's more efficient than it is. Interestingly, List.concat is defined much earlier than List.append and it looks like that's done so one can give a semantics to Array.push. I think one could either rename the function or also move List.append up so List.snoc is definitionally equal to x ++ [y].

Wrenna Robson (Dec 04 2023 at 18:00):

It certainly looks as though you could do that. I would imagine touching the actual structure of Init.Prelude would be controversial...

Mario Carneiro (Dec 04 2023 at 19:52):

I believe mathlib (or possibly std?) has a simp lemma which unconditionally simplifies List.concat a b to a ++ [b]. It is definitely dispreferred in proofs

Mario Carneiro (Dec 04 2023 at 19:53):

I'm not a fan of snoc because it's a made up word / jargon, which languages use this name?

Yaël Dillies (Dec 04 2023 at 19:54):

I'm not sure why in fact. I sometimes wanted to dualise proofs by replacing cons with concat everywhere and adding the missing lemmas on the way, but mathlib worked against me by replacing everything with ++ [a].

Wrenna Robson (Dec 04 2023 at 19:57):

Haskell has a library for snoc. It implements Tsil

Wrenna Robson (Dec 04 2023 at 19:57):

Which is of course List backwards

Thomas Murrills (Dec 05 2023 at 01:14):

Would it be reasonable at this point to add a table which gives a rough "human" assessment of the frequency at which different proposed names are used for each respective operation (as well as simple reasons/details when appropriate)? (I feel using raw numbers for the frequencies is tricky since there are a bunch of languages we haven't surveyed—a gist might be more useful.)

append concat join flatten snoc push
++ often often :cross_mark: never :cross_mark: never :cross_mark: never :cross_mark: never
(· ++ [·]) often :cross_mark: never :cross_mark: never :cross_mark: never rarely, but in nearby languages<br>– pure jargon sometimes<br>+ consistent with Array.push<br>– unintuitive $O(n)$ behavior
⋯ ++ ⋯ ++ ⋯ :cross_mark: never often often often :cross_mark: never :cross_mark: never

But—do we agree with these assessments?

Thomas Murrills (Dec 05 2023 at 01:22):

Btw, thought I should note that we also confusingly have docs#Array.concatMap:

def concatMap (f : α  Array β) (as : Array α) : Array β :=
  as.foldl (init := empty) fun bs a => bs ++ f a

in core.

Thomas Murrills (Dec 05 2023 at 01:28):

This is despite having both docs#Array.flatten and docs#Array.join in std (which mean the same thing as each other and are implemented the same way, except Array.join is @[inline]) and Array.concat not existing—presumably the name ought to be Array.(flatten|join)Map, or we ought to have Array.concat

Wrenna Robson (Dec 05 2023 at 12:55):

concatMap is a fairly standard name for this. I don't think flattenMap or joinMap are as much so?

I seem to recall that join, map, and "concatMap" are all related monadically - specifically iirc concatMap is the bind operation for the List/Array monad.

Wrenna Robson (Dec 05 2023 at 12:58):

Hmm, apparently Java uses flatMap

Alex Keizer (Dec 05 2023 at 13:10):

Rust also uses flat_map

Wrenna Robson (Dec 05 2023 at 13:12):

Good argument for using it in some ways. But I think consistency is the key.

Alex Keizer (Dec 05 2023 at 13:25):

Also, re: snoc being jargon, I'd say cons is also very high on the jargon scale, but it's not that bad since we tend to use the :: notation anyway.
I personally like snoc because it's unambiguous, so what if we take a page from Scala's book and introduce Cons and Snoc typeclasses with (+:) and (:+) notations?

Wrenna Robson (Dec 05 2023 at 13:26):

Do we have a (++) typeclass currently?

Eric Wieser (Dec 05 2023 at 13:26):

docs#HAppend

Alex Keizer (Dec 05 2023 at 13:26):

Yes, HAppend

Eric Wieser (Dec 05 2023 at 13:26):

But a cons typeclass would be more annoying because then we end up with List.cons not HCons.cons when doing list induction

Eric Wieser (Dec 05 2023 at 13:27):

(We already have this problem for Nat, but until we solve it we probably don't want to make it worse)

Alex Keizer (Dec 05 2023 at 13:28):

Hmm, can we not define a trivial induction principle in terms of HCons.cons and mark that as the default eliminator?

Eric Wieser (Dec 05 2023 at 13:31):

Yes, but that also replaces the induction principle for cases, they can't currently be overridden separately.

Eric Wieser (Dec 05 2023 at 13:31):

Kyle is working on this, but it's not ready yet

Alex Keizer (Dec 05 2023 at 13:39):

Which would be annoying because then you'd also have the induction hypothesis show up if you're using cases, gotcha

Alex Keizer (Dec 05 2023 at 13:40):

Anyways, my suggestion was to use a new notation +: for the generic cons, so we could just keep on using :: as dedicated notation for List.cons.

Alex Keizer (Dec 05 2023 at 13:41):

But it would be nice to unify, so maybe we can just wait with this refactor until Kyle is done

Kyle Miller (Dec 05 2023 at 17:22):

I can't promise that custom induction principles except for the nice Nat one will be fully supported in the short term. There turns out to be more than just splitting @[eliminator] into two to get the experience to be nice, unfortunately. (Right now what I'm seeing is that when you unfold a recursive definition, you get Nat.add n 0 rather than just n. I'd imagine a generic cons could be liable to unfold too.)

Kyle Miller (Dec 05 2023 at 17:27):

There's desire for custom induction principles for more than Nat (for example lean4#2716 where extending pattern matching is considered), but in the short term the focus is just on Nat, where it would be a clear improvement making sure induction uses 0 and n + 1.

Wrenna Robson (Dec 05 2023 at 17:34):

It seems we shouldn't let that derail the discussion about concat etc.

Thomas Murrills (Dec 05 2023 at 21:27):

Wrenna Robson said:

concatMap is a fairly standard name for this. I don't think flattenMap or joinMap are as much so?

I seem to recall that join, map, and "concatMap" are all related monadically - specifically iirc concatMap is the bind operation for the List/Array monad.

Wouldn't the better, more consistent name be "whatever we use for ... ++ ... ++ ..., then Map"? So that e.g. joinMap f is the same as join ∘ map f, just more efficient. Otherwise we're both wrongly suggesting concat is involved (if that winds up existing somewhere and is not ... ++ ... ++ ...) and making people learn an extra name even though existing notions suffice.

I feel like what we prioritize when answering this also bears on snoc: how do we rank and balance the importance of

  1. being consistent with the relatively niche group of nearby functional languages and type theory literature
  2. being consistent with broader programming-language usage
  3. being internally consistent with our naming
  4. using names which reflect their intuitive, english-language meaning in their semantics

I'd personally deprioritize (1) the most due to Lean's relationship to math, and the need to make the naming accessible and discoverable to non-programming mathematicians and newcomers. As such, I'd rank these as: 4, 2, 3, 1. (For example, re: the above discussion about snoc, I'd vote for Prepend and Append typeclasses over Cons and Snoc. But I wouldn't push for having an "internally consistent" List.push/Array.push naming scheme if this were inconsistent with broader programming-language usage.) I'm curious to hear how other people would rank these (and if there are priorities that I'm missing)!

Wrenna Robson (Dec 05 2023 at 21:29):

I think I would put 4, 3, 2, 1.

Wrenna Robson (Dec 05 2023 at 21:31):

Though I might split 1 up - I think it is more important to be consistent with "other languages that somebody using Lean will have used before", and that probably includes Rust and Haskell but also Python and Java and C.

Wrenna Robson (Dec 05 2023 at 21:31):

As we've seen, it may also be useful to look at languages like Scala to see other solutions to the notational issue.

Wrenna Robson (Dec 05 2023 at 21:34):

In some sense I think the important thing to me is that it feels "ergonomic", which is a combination really of the concerns of 4 and 3.

Thomas Murrills (Dec 05 2023 at 21:46):

Yeah, that's what's important to me too!

I suppose part of the question is "ergonomic for who, in what context": as a bit of a programming outsider personally, I felt I didn't want to dismiss the concerns of people who felt that consistency across languages was familiar, standard, and therefore ergonomic, and was trying to balance non-programmer interests with programmer interests (since I'm guessing both will form a large share of the community).

But if it were just me, or we wanted to consider a "Lean-only" context (such as when strictly prioritizing newcomer experience), I'd move 3 above 2 as well. (Of course, if people (such as yourself?) who are not "programming outsiders" also like 4, 3, 2, 1, I'm happy to jump on board... :grinning_face_with_smiling_eyes:)

Wrenna Robson (Dec 05 2023 at 21:51):

I don't know what I am really. I suppose I have many hats. I've been a mathematician in the past but nowadays I have a bit more of a programmer view of things.

In a sense what we are thinking about is what they call "user stories", no? The different ways people come to Lean and the different things they'll bring.

I think if someone is using the List/Array parts of Lean, they are probably more likely to be someone with prior exposure to other programming languages. They're likely to be an English-language user but not necessarily someone with English as a first language, so it's important we aren't using English words in ways counter to either their dictionary or colloquial meanings. I think they'll be comfortable with the idea that different languages have different conventions and names for things, but they will bring their prior experiences to bear. And they're probably more likely than not to have experience with other functional programming languages.

Mac Malone (Dec 05 2023 at 22:12):

For my two cents, I think List.concat should be named List.snoc. Now it is most certainly weird jargon -- built on top of the already Lisp-inspired jargon name of cons for prepending. However, we do use cons already so the connection is a bit clear. Most important, though, the use of List.snoc is a code smell. Thus, the fact that is jargon alerts you that you may want to be doing something different (namely, using an Array and Array.push).

Wrenna Robson (Dec 05 2023 at 22:22):

Array.push gives you the last inserted element of the array, is that right? Whereas List.snoc gives you the first inserted element of the list? And for both data structures, they're LIFO, so Array.push is fast and List.snoc is slow?

Thomas Murrills (Dec 05 2023 at 22:30):

Hmm! So,

  1. name "bad" operations "badly" (to discourage use)

Mac Malone (Dec 05 2023 at 22:30):

@Wrenna Robson Sort of. List is a singly linked list -- LIFO at head. Thus adding and deleting is fast (O(1)) for the first element and slow for the last. Array is a growable block of contiguous memory (also known as a vector in other languages like C++ and Rust) that expands from the tail. Thus, adding and deleting elements is fast at the end and slow at the start.

Thomas Murrills (Dec 05 2023 at 22:31):

Hang on! We have unicode: List.ƨnoↄ :P

Wrenna Robson (Dec 05 2023 at 22:34):

Mac Malone said:

Wrenna Robson Sort of. List is a singly linked list -- LIFO at head. Thus adding and deleting is fast (O(1)) for the first element and slow for the last. Array is a growable block of contiguous memory (also known as a vector in other languages like C++ and Rust) that expands from the tail. Thus, adding and deleting elements is fast at the end and slow at the start.

But "end" is just a matter of where you index it, I suppose.

Mac Malone (Dec 05 2023 at 22:35):

Thomas Murrills said:

Hmm! So,

  1. name "bad" operations "badly" (to discourage use)

To a certain extent, yes. This is why, for example, Python has no built-in prepend operation for lists (which are Lean's Array), but rather forces you to use .insert(0, a). It helps highlight the fact that the action the user is performing is not quite what it appears (i.e., does not actually parallel append in performance).

Mac Malone (Dec 05 2023 at 22:36):

@Wrenna Robson Correct! "start" and "end" in this case mean lowest and highest index respectively (i.e., according to the order xs[n] would return).

Thomas Murrills (Dec 05 2023 at 22:40):

Right, deliberately tagging inefficient/likely “wrong” operations with code smell seems sensible for UX! The less times you need to know “oh, yeah, just don’t do that”, the better. I noticed that we have the same pattern for Arrays: xs.insertAt! 0 x seems to be the only way to do “Array.cons” currently.

Thomas Murrills (Dec 05 2023 at 22:45):

In a way, 5 is counterintuitively consistent with 4 and general notions of ergonomicness (ergonomicity? ergonomy?): by making only the “nice” operations ergonomic, you prevent newcomers from making mistakes by using “bad” operations freely. Then ergonomic code ↔︎ “ergonomic” algorithms (as perceived by the computer), and the broader experience of coding becomes more ergonomic since pitfalls aren’t.

Mario Carneiro (Dec 05 2023 at 23:20):

Thomas Murrills said:

Right, deliberately tagging inefficient/likely “wrong” operations with code smell seems sensible for UX! The less times you need to know “oh, yeah, just don’t do that”, the better. I noticed that we have the same pattern for Arrays: xs.insertAt! 0 x seems to be the only way to do “Array.cons” currently.

Note that this would motivate the option of just not having List.concat at all and using x ++ [y] instead

Mario Carneiro (Dec 05 2023 at 23:21):

which IMO is better all around

Wrenna Robson (Dec 05 2023 at 23:24):

Yes I was just thinking that. Is there really any reason not to do that?

Eric Wieser (Dec 05 2023 at 23:28):

It's useful to have a canonical name for the operation to use in lemma names, even if we spell it with ++

Mac Malone (Dec 06 2023 at 00:06):

Mario Carneiro said:

Note that this would motivate the option of just not having List.concat at all and using x ++ [y] instead

True. In fact, I suspect this being the canonical representation is what lead to List.concat being named concat in the first place.

Thomas Murrills (Dec 06 2023 at 00:20):

Maybe append1 (or <whateverwecallit>1) would be a sufficiently unsightly but informative option. “You’re just appending a singleton list here, nothing special.”

Thomas Murrills (Dec 06 2023 at 00:56):

Or we could simply say append_singleton or something in lemma names, without a dedicated definition.

Wrenna Robson (Dec 06 2023 at 01:29):

Eric Wieser said:

It's useful to have a canonical name for the operation to use in lemma names, even if we spell it with ++

Agreed - this could probably be snoc.

Wrenna Robson (Dec 06 2023 at 01:30):

Thomas Murrills said:

Or we could simply say append_singleton or something in lemma names, without a dedicated definition.

Or this, yes.

Yaël Dillies (Dec 06 2023 at 07:25):

That's quite annoying for proving purposes. You really want something which is pretty short so that you can write API about it matching cons.

Wrenna Robson (Dec 06 2023 at 07:26):

On that basis snoc seems perfect.

Alex Keizer (Dec 06 2023 at 08:01):

Note the context of this discussion: BitVec.concat (which is an efficient / "good" operation) was named thus to be consistent with List.concat. I would very much not want to call this an ugly name, so there is a tradeoff between priority 5 and internal consistency

Alex Keizer (Dec 06 2023 at 08:03):

However, if people are working with bitvectors, it's safe to assume they are at least somewhat knowledgeable programming-wise and snoc should make some sense, so I think this name strikes the correct balance

Wrenna Robson (Dec 06 2023 at 08:05):

It also matches Fin.snoc, and my gut feeling is that that is more important (Fin.snoc and Fin.append are what we use for tuples, which obviously BitVec isn't but it is imo the best model for them outside of the Fin(2^m) form, see discussions passim).

Alex Keizer (Dec 06 2023 at 08:10):

Also, it should be noted that not every use of concat is necessarily bad. For example, it might be easier to induct on a list in terms of [] and concat in a proof. I would thus strongly suggest we do keep a name for it!

Wrenna Robson (Dec 06 2023 at 08:11):

Hmm. But is it currently possible to do that i.e. do we have an induction principle for it?

Alex Keizer (Dec 06 2023 at 08:14):

We have one for Vector at least (which incidentally also uses snoc)

Yaël Dillies (Dec 06 2023 at 08:14):

docs#List.list_reverse_induction

Wrenna Robson (Dec 06 2023 at 08:16):

Presumably you could still have the principle even if we didn't have a name for xs ++ [x]

Wrenna Robson (Dec 06 2023 at 08:16):

Indeed I see that we don't use the name List.concat there!!

Alex Keizer (Dec 06 2023 at 08:18):

Sure, but having the name reinforces that this is an operation we do care about, and want to have API for. Plus, having the name makes finding that API easier

Wrenna Robson (Dec 06 2023 at 08:18):

I wonder how much that inductive principle is currently used (I'm on my phone, a little fiddly to check).

Yaël Dillies (Dec 06 2023 at 08:19):

I used it quite a lot in the past, but I think my uses never made it to mathlib.

Alex Keizer (Dec 06 2023 at 08:20):

Although I've not used the List one, I have used the Vector principle (which by virtue of being a subtype of List benefits from having this API as well)

Wrenna Robson (Dec 06 2023 at 08:22):

what were you proving?

Alex Keizer (Dec 06 2023 at 08:23):

Properties of Vector.mapAccumr (which is a foldr in disguise)

Wrenna Robson (Dec 06 2023 at 08:25):

Yeah, it makes sense that you'd use it when proving stuff about foldrs

Wrenna Robson (Dec 06 2023 at 08:28):

I think I'm increasingly of the opinion - I swear I am not a lobbyist for Big Tuple - that "cons" for "insert on left", "append" for "join two lists", and "snoc" for "insert on right" are good short forms for this. With probably "flatten" for the other thing that gets called concat sometimes. And then I'd just ban use of concat.

Wrenna Robson (Dec 06 2023 at 08:28):

But as discussed in some contexts other spellings are going to be appropriate, e.g. I like Array.push

Alex Keizer (Dec 06 2023 at 08:30):

Array is slightly different, because of it's different usage parrern, but I would like List, Vector, Fin-tuples and BitVec to all have consistent naming. The ones you propose, though, would fit for all of these IMO

Mario Carneiro (Dec 06 2023 at 09:35):

Alex Keizer said:

However, if people are working with bitvectors, it's safe to assume they are at least somewhat knowledgeable programming-wise and snoc should make some sense, so I think this name strikes the correct balance

I very much disagree with this reasoning. People working with bitvectors are probably quite knowledgeable programming-wise but maybe not the kind of programming that would use snoc, e.g. C, LLVM, Verilog.

Wrenna Robson (Dec 06 2023 at 09:36):

Indeed, a more pertinent question might be "what does SMT-Lib use" (for instance).

Mario Carneiro (Dec 06 2023 at 09:37):

I don't think they have this operation

Mario Carneiro (Dec 06 2023 at 09:38):

and it makes a bit of sense, they don't often treat bits via inductive principles because inductive reasoning is hard

Wrenna Robson (Dec 06 2023 at 09:38):

Yes - I know they have append/concat (because it's noted in BitVec.append that concat is the thing used in SMT-Lib) but I don't know that they have - well I mean effectively it's multiplication by 2 and add 1.

Mario Carneiro (Dec 06 2023 at 09:38):

"bitblasting" is literally just introducing 64 variables for 64 bits

Wrenna Robson (Dec 06 2023 at 09:38):

Mario Carneiro said:

and it makes a bit of sense, they don't often treat bits via inductive principles because inductive reasoning is hard

Tell me about it...

Mario Carneiro (Dec 06 2023 at 09:39):

they have shift left

Wrenna Robson (Dec 06 2023 at 09:39):

Indeed.

Mario Carneiro (Dec 06 2023 at 09:40):

aka they do it the way you would do it in C

Wrenna Robson (Dec 06 2023 at 09:40):

As I have been discussing ad naseum, I've been working a lot with "pull out the Nth bit" as an operation but I can't pretend it's a thing people do that often normally.

Alex Keizer (Dec 06 2023 at 09:40):

Hmm, good point, but cons would be just as unfamiliar / jargon-like for people with a C, LLVM, Verilog background. I guess my argument was more, people with such experience might be more comfortable with learning jargon (even if they don't know this particular one)

Mario Carneiro (Dec 06 2023 at 09:42):

I don't think they will be using cons either?

Alex Keizer (Dec 06 2023 at 09:52):

Mario Carneiro said:

and it makes a bit of sense, they don't often treat bits via inductive principles because inductive reasoning is hard

Presumably, if you're choosing to reason about bitvectors in Lean, you're doing that because you want to prove things about arbitrary widths, or other kinds of proofs for which bit-blasting is insufficient, right? Or because you're implementing an SMT-solver in Lean. Otherwise, would a regular SMT-solver not work better?

Wrenna Robson (Dec 06 2023 at 09:52):

Or you're coming at it from a different angle.

Alex Keizer (Dec 06 2023 at 09:53):

Fair enough, it might be that my perspective is too coloured by our particular use-case and motivation

Wrenna Robson (Dec 06 2023 at 09:54):

Well, same hat. But I suppose that's my point. Everyone will have different perspectives.

Alex Keizer (Dec 06 2023 at 09:57):

Anyway, we might want to refocus the discussion back to the original topic, what/if to name List.concat.

The latest proposal is to rename List.concat to List.snoc.
Myself, @Wrenna Robson, @Mac Malone, (and @Yaël Dillies?) have expressed support for this

@Thomas Murrills you've stated you don't like snoc because it's too much jargon. How do you feel about the response that, yes, it's jargon, but since it has some pitfalls it's a good thing that the name is not too accessible (paraphrased from @Mac Malone's)

@Mario Carneiro would either like to remove List.concat or keep it as-is.

@Eric Wieser, @Marc Huisinga, @Joe Hendrix you've participated in the discussion, but I couldn't see a clear opinion in favor or against, maybe you'd like to share your thoughts?

EDIT: to reflect Mario's response

Mario Carneiro (Dec 06 2023 at 09:57):

I would like to remove it

Mario Carneiro (Dec 06 2023 at 09:57):

or keep it as is

Alex Keizer (Dec 06 2023 at 10:01):

Would you support keeping a succinct canonical name for use in lemmas, even if we spell it xs ++ [x]?

Mario Carneiro (Dec 06 2023 at 10:03):

sure, Thomas suggested append1 or append_singleton

Yaël Dillies (Dec 06 2023 at 10:04):

append_singleton is not what I'd call "succinct".

Mario Carneiro (Dec 06 2023 at 10:05):

true. I'm fine with using domain-specific naming methods here

Mario Carneiro (Dec 06 2023 at 10:06):

which is to say, I would prefer to be asked the question "what to name this particular lemma" rather than "what to name lemmas containing x ++ [y] in general"

Alex Keizer (Dec 06 2023 at 10:06):

Since this is (I think) already the third thread calling out the confusing names, leaving it as-is seems undesirable

Yaël Dillies (Dec 06 2023 at 10:07):

Mario Carneiro said:

which is to say, I would prefer to be asked the question "what to name this particular lemma" rather than "what to name lemmas containing x ++ [y] in general"

That's quite antithetic to the naming convention.

Mario Carneiro (Dec 06 2023 at 10:08):

I mean, the usual naming convention would be to call it append_singleton but you don't like that so :shrug:

Mario Carneiro (Dec 06 2023 at 10:09):

I think in many cases I can think of it won't be necessary to call out this sequence of symbols in particular

Mario Carneiro (Dec 06 2023 at 10:09):

so it might help to talk about a particular lemma as an example at least

Wrenna Robson (Dec 06 2023 at 10:09):

I quite like append_singleton. If you really wanted you could shorten singleton to sngletn but if you want to do that we should do it everywhere (and it seems bad to me).

Mario Carneiro (Dec 06 2023 at 10:10):

I think it is abbreviated to single in some places, but most of the time it really is the full singleton

Mario Carneiro (Dec 06 2023 at 10:10):

it's a long name for a short operation

Wrenna Robson (Dec 06 2023 at 10:10):

I recently had a similar issue with a bunch of lemmas about commutators

Yaël Dillies (Dec 06 2023 at 10:11):

Following the naming convention, if we're not going to have a named definition for x ++ [y], we should at least consider it as such in lemma names by using a single atom to refer to it. Eg appendSingleton.

Mario Carneiro (Dec 06 2023 at 10:11):

no, because it's two operations

Yaël Dillies (Dec 06 2023 at 10:11):

Yes, but the fact is that you really do want to consider it a single operation for proving purposes!

Mario Carneiro (Dec 06 2023 at 10:11):

I disagree...

Wrenna Robson (Dec 06 2023 at 10:11):

Yeah, really what it is is append_cons_nil, right?

Yaël Dillies (Dec 06 2023 at 10:12):

Try splitting docs#List.list_reverse_induction into something about append and something about singleton.

Mario Carneiro (Dec 06 2023 at 10:12):

That's why I suggested using a one word abbreviation for the combo but Yael says it's "antithetic to the naming convention"

Mario Carneiro (Dec 06 2023 at 10:13):

even though they just gave an example which is clearly using that style of naming

Wrenna Robson (Dec 06 2023 at 10:13):

You know when I questioned the colour of this bike shed I hadn't anticipated arguments about whether or not it should have a roof.

Wrenna Robson (Dec 06 2023 at 10:13):

But I see now I was a fool.

Alex Keizer (Dec 06 2023 at 10:13):

which is to say, I would prefer to be asked the question "what to name this particular lemma" rather than "what to name lemmas containing x ++ [y] in general"

My objection to this statement was that it suggested naming the sequence x ++ [y] differently in different lemmas

Yaël Dillies (Dec 06 2023 at 10:13):

Mario Carneiro said:

which is to say, I would prefer to be asked the question "what to name this particular lemma" rather than "what to name lemmas containing x ++ [y] in general"

I read this as saying that we might want to call it snoc in one context and append_singleton in another, and append1 in yet another one, depending on the current mood.

Mario Carneiro (Dec 06 2023 at 10:14):

Alex Keizer said:

My objection to this statement was that it suggested naming the sequence x ++ [y] differently in different lemmas

More like, in some situations you will be naming things other than this operation in the lemma

Mario Carneiro (Dec 06 2023 at 10:14):

like List.reverse_induction

Mario Carneiro (Dec 06 2023 at 10:15):

(actually, it's List.list_reverse_induction? what's up with the duplication)

Alex Keizer (Dec 06 2023 at 10:15):

Sure, but a lot of the relevant API is how this operation commutes with other things, like List.foldr_concat or List.map_concat

Mario Carneiro (Dec 06 2023 at 10:16):

and that's great, because such lemmas are unnecessary, being a simple composition of other things

Yaël Dillies (Dec 06 2023 at 10:17):

Not foldr_concat, no.

Mario Carneiro (Dec 06 2023 at 10:17):

docs#List.foldr_concat

Mario Carneiro (Dec 06 2023 at 10:17):

are you sure? It looks like I could just hit it with simp

Alex Keizer (Dec 06 2023 at 10:17):

I would very much like to call that induction principle List.foo_induction, where foo is snoc / concat / append_singleton,

Mario Carneiro (Dec 06 2023 at 10:18):

why?

Mario Carneiro (Dec 06 2023 at 10:18):

we don't call the other one List.cons_induction

Yaël Dillies (Dec 06 2023 at 10:18):

Mario Carneiro said:

are you sure? It looks like I could just hit it with simp

Pretty sure, yes. Can you hit docs#List.foldl_cons with simp?

Mario Carneiro (Dec 06 2023 at 10:18):

well yes, because it's a simp lemma

Yaël Dillies (Dec 06 2023 at 10:19):

Okay sure, but that's not my point. Can you remove the current List.foldl_cons and reprove it by simp?

Mario Carneiro (Dec 06 2023 at 10:19):

No, but lists are not defined symmetrically so I'm not sure what that proves

Yaël Dillies (Dec 06 2023 at 10:20):

I think the answer is "Yes, because that's the definition of foldl", but that answer specifically doesn't apply to foldr, which doesn't use structural recursion.

Mario Carneiro (Dec 06 2023 at 10:20):

I can prove List.foldr_concat by simp because it's a composition of simpler operations with simp lemmas

Mario Carneiro (Dec 06 2023 at 10:21):

I'm not really talking about how these lemmas themselves are proved though, I'm talking about whether in user code one could prove an instance of List.foldr_concat by simp

Mario Carneiro (Dec 06 2023 at 10:21):

I don't really care how the lemmas themselves are proved (well I do but that's another discussion)

Alex Keizer (Dec 06 2023 at 10:21):

Mario Carneiro said:

we don't call the other one List.cons_induction

We do have Fin.consInduction, Finset.cons_induction, Fin.snocInduction in this pattern

Mario Carneiro (Dec 06 2023 at 10:22):

eh, we have Fin.snoc? :sad:

Alex Keizer (Dec 06 2023 at 10:22):

We do

Wrenna Robson (Dec 06 2023 at 10:23):

Mario Carneiro said:

eh, we have Fin.snoc? :sad:

I think Fin.snoc makes more sense than List.snoc? I could be wrong though. But tuples are defined quite differently...

Alex Keizer (Dec 06 2023 at 10:23):

But we also have Fin.induction and Fin.reverseInduction, so possibly the constructorInduction are the ones that need changing. Let's put this side-track on hold, though

Mario Carneiro (Dec 06 2023 at 10:25):

I think we should be using names inspired from languages like Java, JS, Rust, Ruby. I really want to combat the common perception of functional languages as elitist "monads are just monoids in the category of endofunctors" stuff

Mario Carneiro (Dec 06 2023 at 10:25):

I am strongly opposed to the use of snoc in library functions, it's second-order jargon

Wrenna Robson (Dec 06 2023 at 10:25):

Mario Carneiro said:

I think we should be using names inspired from languages like Java, JS, Rust, Ruby. I really want to combat the common perception of functional languages as elitist "monads are just monoids in the category of endofunctors" stuff

I'm not quite sure that Rust is who we should be looking at for "don't be elitist" cues but I agree with the sentiment.

Mario Carneiro (Dec 06 2023 at 10:28):

I stand by my statement. Rust is a very good model for this kind of thing, they do think quite hard about how to connect to users coming from different backgrounds like python, JS or C

Alex Keizer (Dec 06 2023 at 10:28):

Although I agree with this sentiment, we should be careful about mirroring naming that people associate with O(1) mutating operations for functional constructors

Wrenna Robson (Dec 06 2023 at 10:28):

Yes, that's fair. It just amused me because I do know people who find Rust very intimidating!

Mario Carneiro (Dec 06 2023 at 10:29):

Oh sure, it's a hard language for irreducible reasons

Mario Carneiro (Dec 06 2023 at 10:29):

but I don't think the naming of functions is the problem

Mario Carneiro (Dec 06 2023 at 10:29):

Lean is also a hard language for irreducible reasons

Wrenna Robson (Dec 06 2023 at 10:30):

Yes, and I think it's uncommonly approachable for languages in its class, but I agree that every step towards making that even more true is good.

Alex Keizer (Dec 06 2023 at 10:31):

The closes analogue of List in Rust is the LinkedList, which uses push_front() and push_back() for adding a single element at either end, but these don't seem appropriate for functional lists

Wrenna Robson (Dec 06 2023 at 10:32):

Is that true? LinkedList is a doubly linked list (that's why they have both I think).

Alex Keizer (Dec 06 2023 at 10:44):

That is true, my objection was mostly about push feeling weird in a non-mutating setting, but then again, Rust's LinkedList is effectively non-mutating as well, so that might just be me being too entrenched in the FP style of naming things

Mario Carneiro (Dec 06 2023 at 10:44):

yeah I was thinking pushBack might not be so bad

Mario Carneiro (Dec 06 2023 at 10:45):

we use mutating terminology in lean all the time

Alex Keizer (Dec 06 2023 at 10:46):

Then I would say: for List.concat we drop the name (to match the fact that it's an inefficient operation, I am hesitant to call an O(n) operation push), but in other context (e.g, BitVec) where it does make sense, use pushBack

Wrenna Robson (Dec 06 2023 at 10:50):

Presumably we should also use that where we currently use Fin.snoc.

Wrenna Robson (Dec 06 2023 at 10:51):

(It is slightly irritating that that isn't about snoc on Fin but about tuples, i.e. pi-types from Fin - but it is what it is.)

Alex Keizer (Dec 06 2023 at 10:57):

(There has been previous discussion about, and support for, creating a separate name(space) for Fin-tuples, but this is another side-track that deserves it's own thread)

Wrenna Robson (Dec 06 2023 at 10:57):

(I'd support it but indeed, stay on target.)

Mac Malone (Dec 06 2023 at 19:33):

Alex Keizer said:

Note the context of this discussion: BitVec.concat (which is an efficient / "good" operation) was named thus to be consistent with List.concat. I would very much not want to call this an ugly name, so there is a tradeoff between priority 5 and internal consistency

In this case, it seems like it should be called BitVec.push (or your suggestion of pushBack). Alternatively, it could be called some kind of a left shift, which as Mario noted, is what operation would be in lower-level languages.

Mac Malone (Dec 06 2023 at 19:36):

Alex Keizer said:

The closes analogue of List in Rust is the LinkedList, which uses push_front() and push_back() for adding a single element at either end, but these don't seem appropriate for functional lists

Note that Rust's LinkedList is a doubly linked list so both those operations are "good" O(1) operations (like Array.push). To me, this further reinforces that a more obscure name like List.snoc is key to prevent people from thinking it is a "good" operation.

Mac Malone (Dec 06 2023 at 19:38):

Mario Carneiro said:

I am strongly opposed to the use of snoc in library functions, it's second-order jargon

It is not math jargon, though, it is very CS jargon from Lisp. (Also, it is not very hard to unpack as it is just the name of the front operation backwards.)

Mac Malone (Dec 06 2023 at 19:42):

Mario Carneiro said:

I think it is abbreviated to single in some places, but most of the time it really is the full singleton

Mario Carneiro said:

it's a long name for a short operation

On this bike shed, I have long wanted to suggest the much more succinct name solo for singleton. My inspiration being the name of Data.Tuple.Solo for a 1-tuple in Haskell.

Wrenna Robson (Dec 06 2023 at 19:43):

It's more of a bike warehouse at this point.

Thomas Murrills (Dec 06 2023 at 23:10):

On reflection, the fact that Lean is also used for theorem proving really muddies the waters around 5. It's only bad to use List.snoc in an actual algorithm—not in a proof about list behavior! We really do need another (hopefully last) priority:

  1. name things such that theorems about them are phrased nicely

(This splits into the type of the theorem itself being phrased nicely, and the theorem being named nicely.)

Thomas Murrills (Dec 06 2023 at 23:13):

Ultimately—after reading the 116 messages that appeared in this thread since I last looked at it :P—I think there are three paths here. In all of them, I think we should have a canonical, internally consistent name for each abstract operation (e.g. push for "adding to the end of something", something for "putting two things together in sequence", etc.) Then, the three paths are

  1. Only use the canonical name when that operation is "ok"—and then in individual, specific exceptions, deliberately introduce code smell when it's not a "good" operation (e.g. List.snoc instead of a 'canonical' List.push or whatever).
  2. Only use the canonical name when that operation is "ok", and leave individual, specific exceptions unnamed as definitions when it's not ok (e.g. just use (· ++ [·]). (This has two subvariants for lemma naming: (2.a) use the canonical name in lemma names about the operation E.g. foo_push_bar for a lemma involving (· ++ [·]); (2.b) describe the operation faithfully (e.g. append_singleton/append_solo etc.)
  3. Always use the canonical name, but signal loudly when they're "bad" in the docstring. (After all, new users, who as I understand it are the main group we'd be protecting via code smell, are probably looking at hovers to figure out what things are anyway.)

(There's also a possibly-over-engineered 4th path which I'm only half-seriously suggesting: introduce a @[discouraged] or @[formal] attribute on definitions like e.g. List.push which are ok in theorems but shouldn't be in runnable code, and lint against it when inside defs (but not in theorems). Code smell → code warning lights!)

We can, of course, choose any of these paths as a guideline, then deviate when there are special circumstances (for example, some operation is really special and deserves its own name for other reasons, like cons itself). But it might be helpful if we at least chose one. (Are there any concerns not addressed by at least one of these paths?)

Joe Hendrix (Dec 07 2023 at 01:51):

@Thomas Murrills That seems like a reasonable summary. I am in favor of consistent naming across datatypes (i.e., 3) when feasible.

I think that reduces the cognitive load for learning new APIs, and makes it easier to migrate code from one type to another for performance or other reasons. For example, maybe somebody starts with List because it is simpler and then migrates to Array for one of their types. It's a lot easier to do that if you don't have to rename all your function calls.

Joe Hendrix (Dec 07 2023 at 01:52):

I'd also note that Array.push is also O(n) if the array is shared, and that's pretty easy to make that mistake if one needs to, for example, append to an array stored inside a larger data structure.

Mario Carneiro (Dec 07 2023 at 01:53):

Yep, I have discovered that any sufficiently large lean program contains a fair number of sharing bugs


Last updated: Dec 20 2023 at 11:08 UTC