Zulip Chat Archive
Stream: general
Topic: name for `erase_dup`
Eric Rodriguez (Feb 14 2022 at 15:53):
I'm currently defining list.chain_dedup' in #11934, and there's been a lot of discussion about the name. There's almost two separate things:
Eric Rodriguez (Feb 14 2022 at 15:53):
/poll What should the function [1,1,1,2,3,3] -> [1,2,3] be called?
norep
chain_dedup
squeeze
uniq
erase_rep
Eric Rodriguez (Feb 14 2022 at 15:54):
But secondly, we currently have chain and chain', with chain a l talking about a :: l whilst chain' l talks about the list itself. I understand that chain a l may be the more sensible CS notion, but as a mathematician I see zero sense in it and would much rather talk about chain' l almost exclusively. I propose we swap these names; is there people against this?
Yaël Dillies (Feb 14 2022 at 15:55):
I find myself mostly using chain, actually, so I'm not sure the argument holds.
Eric Rodriguez (Feb 14 2022 at 15:57):
How come?
Yaël Dillies (Feb 14 2022 at 15:59):
I don't actually know. I think it's because the empty list is pathological where I use it, so using chain is a nice sweep-under-the-rug way to forget about it. When I have a chain', the first thing I do is to case on the list to get back to chain.
Kyle Miller (Feb 14 2022 at 16:04):
If you're able to modify the poll question, you might want to make it clear this function is only removing consecutive repeats, so [2, 2, 3, 3, 2] -> [2, 3, 2]
Kyle Miller (Feb 14 2022 at 16:14):
@Yaël Dillies Here's another argument: chain seems to be the auxiliary definition for chain'. It's asymmetric, giving you an argument for the pre-head (why not the post-tail?)
Joachim Breitner (Feb 14 2022 at 17:10):
A term for consecutive repeated elements of a sequence is “suttering”, I believe, e.g. in state machine machines. I hence added erase_stutter to the list of options
Yaël Dillies (Feb 14 2022 at 17:11):
Obscure, but I like it!
Joachim Breitner (Feb 14 2022 at 17:11):
There is precedence in the Software Foundations book that teaches ITP with Coq: https://github.com/bishboria/software-foundations/blob/master/MoreLogic.v#L161
Kyle Miller (Feb 14 2022 at 17:25):
Collecting rationales for some of the names:
norepcame from a Haskell libraryuniqcame from a Haskell library and Unixchain_dedupdeduplicates elements in the chainward directionerase_repmirrorserase_dup, where it seems plausible that there could be a distinction between repetitions and duplicationserase_consecutive_dupis a clear modification oferase_duperase_stutterevokes localized repetition, which is a fairly clear metaphorsqueezeevokes physically compressing a list, which is less clear
Yakov Pechersky (Feb 14 2022 at 17:26):
uniq is also how unix uniq works.
Yakov Pechersky (Feb 14 2022 at 17:42):
If we're making an erase_rep, can that be the prod.fst map out of a "counter" function that we write? Otherwise, we'll want that counter later anyway.
Yaël Dillies (Feb 14 2022 at 17:45):
Oh, I read erase_rep as "erase representative".
Ruben Van de Velde (Feb 14 2022 at 17:46):
erase_rpt?
Yaël Dillies (Feb 14 2022 at 17:47):
erase_repeat?
Junyan Xu (Feb 15 2022 at 00:20):
Googling destutter seems to return a reasonable amount of results, so I added that. By the way,
erase_dup l removes duplicates from l (taking only the first occurrence). Defined as pw_filter (≠).
erase_dup [1, 0, 2, 2, 1] = [0, 2, 1]
docs#list.erase_dup
Why does this not return [1,0,2] ?
Mario Carneiro (Feb 15 2022 at 00:56):
The doc is lying, it works from right to left
Eric Rodriguez (Feb 15 2022 at 13:43):
I ended up going for destutter, now that I finished generalizing to an arbitrary relation. erase_stutter felt a bit too long. I also swapped the 's compared to chain', now that it doesn't mention it in the name. (And I think that this time, destutter really is the more correct notion 99% of the time - for example, you only have (l.destutter' R a).chain R a if R is reflexive, which is often not the case for these usecases).
Eric Wieser (Feb 15 2022 at 13:44):
It seems like we should rename erase_dup to dedup if we're going to call it destutter
Eric Rodriguez (Feb 15 2022 at 13:47):
Is dedup currently the property of being equal to your own erase_dup?
Eric Rodriguez (Feb 15 2022 at 13:48):
Also, what's the difference between pw_filter (≠) and erase_dup? Maybe that's the correct way, just use the more general
Yaël Dillies (Feb 15 2022 at 13:49):
No, that's nodup
Yaël Dillies (Feb 15 2022 at 13:50):
docs#list.erase_dup is defined as pw_filter (≠)
Eric Rodriguez (Feb 15 2022 at 13:53):
why do we keep it around then?
Yaël Dillies (Feb 15 2022 at 13:53):
because it's very common?
Eric Rodriguez (Feb 15 2022 at 15:03):
Eric Wieser said:
It seems like we should rename
erase_duptodedupif we're going to call itdestutter
Last updated: May 02 2025 at 03:31 UTC