Zulip Chat Archive

Stream: general

Topic: name for `erase_dup`


Eric Rodriguez (Feb 14 2022 at 15:53):

I'm currently defining list.chain_dedup' in #11934, and there's been a lot of discussion about the name. There's almost two separate things:

Eric Rodriguez (Feb 14 2022 at 15:53):

/poll What should the function [1,1,1,2,3,3] -> [1,2,3] be called?
norep
chain_dedup
squeeze
uniq
erase_rep

Eric Rodriguez (Feb 14 2022 at 15:54):

But secondly, we currently have chain and chain', with chain a l talking about a :: l whilst chain' l talks about the list itself. I understand that chain a l may be the more sensible CS notion, but as a mathematician I see zero sense in it and would much rather talk about chain' l almost exclusively. I propose we swap these names; is there people against this?

Yaël Dillies (Feb 14 2022 at 15:55):

I find myself mostly using chain, actually, so I'm not sure the argument holds.

Eric Rodriguez (Feb 14 2022 at 15:57):

How come?

Yaël Dillies (Feb 14 2022 at 15:59):

I don't actually know. I think it's because the empty list is pathological where I use it, so using chain is a nice sweep-under-the-rug way to forget about it. When I have a chain', the first thing I do is to case on the list to get back to chain.

Kyle Miller (Feb 14 2022 at 16:04):

If you're able to modify the poll question, you might want to make it clear this function is only removing consecutive repeats, so [2, 2, 3, 3, 2] -> [2, 3, 2]

Kyle Miller (Feb 14 2022 at 16:14):

@Yaël Dillies Here's another argument: chain seems to be the auxiliary definition for chain'. It's asymmetric, giving you an argument for the pre-head (why not the post-tail?)

Joachim Breitner (Feb 14 2022 at 17:10):

A term for consecutive repeated elements of a sequence is “suttering”, I believe, e.g. in state machine machines. I hence added erase_stutter to the list of options

Yaël Dillies (Feb 14 2022 at 17:11):

Obscure, but I like it!

Joachim Breitner (Feb 14 2022 at 17:11):

There is precedence in the Software Foundations book that teaches ITP with Coq: https://github.com/bishboria/software-foundations/blob/master/MoreLogic.v#L161

Kyle Miller (Feb 14 2022 at 17:25):

Collecting rationales for some of the names:

  • norep came from a Haskell library
  • uniq came from a Haskell library and Unix
  • chain_dedup deduplicates elements in the chainward direction
  • erase_rep mirrors erase_dup, where it seems plausible that there could be a distinction between repetitions and duplications
  • erase_consecutive_dup is a clear modification of erase_dup
  • erase_stutter evokes localized repetition, which is a fairly clear metaphor
  • squeeze evokes physically compressing a list, which is less clear

Yakov Pechersky (Feb 14 2022 at 17:26):

uniq is also how unix uniq works.

Yakov Pechersky (Feb 14 2022 at 17:42):

If we're making an erase_rep, can that be the prod.fst map out of a "counter" function that we write? Otherwise, we'll want that counter later anyway.

Yaël Dillies (Feb 14 2022 at 17:45):

Oh, I read erase_rep as "erase representative".

Ruben Van de Velde (Feb 14 2022 at 17:46):

erase_rpt?

Yaël Dillies (Feb 14 2022 at 17:47):

erase_repeat?

Junyan Xu (Feb 15 2022 at 00:20):

Googling destutter seems to return a reasonable amount of results, so I added that. By the way,

erase_dup l removes duplicates from l (taking only the first occurrence). Defined as pw_filter ().
erase_dup [1, 0, 2, 2, 1] = [0, 2, 1]

docs#list.erase_dup
Why does this not return [1,0,2] ?

Mario Carneiro (Feb 15 2022 at 00:56):

The doc is lying, it works from right to left

Eric Rodriguez (Feb 15 2022 at 13:43):

I ended up going for destutter, now that I finished generalizing to an arbitrary relation. erase_stutter felt a bit too long. I also swapped the 's compared to chain', now that it doesn't mention it in the name. (And I think that this time, destutter really is the more correct notion 99% of the time - for example, you only have (l.destutter' R a).chain R a if R is reflexive, which is often not the case for these usecases).

Eric Wieser (Feb 15 2022 at 13:44):

It seems like we should rename erase_dup to dedup if we're going to call it destutter

Eric Rodriguez (Feb 15 2022 at 13:47):

Is dedup currently the property of being equal to your own erase_dup?

Eric Rodriguez (Feb 15 2022 at 13:48):

Also, what's the difference between pw_filter (≠) and erase_dup? Maybe that's the correct way, just use the more general

Yaël Dillies (Feb 15 2022 at 13:49):

No, that's nodup

Yaël Dillies (Feb 15 2022 at 13:50):

docs#list.erase_dup is defined as pw_filter (≠)

Eric Rodriguez (Feb 15 2022 at 13:53):

why do we keep it around then?

Yaël Dillies (Feb 15 2022 at 13:53):

because it's very common?

Eric Rodriguez (Feb 15 2022 at 15:03):

Eric Wieser said:

It seems like we should rename erase_dup to dedup if we're going to call it destutter

#12057


Last updated: Dec 20 2023 at 11:08 UTC