Zulip Chat Archive
Stream: general
Topic: name for `erase_dup`
Eric Rodriguez (Feb 14 2022 at 15:53):
I'm currently defining list.chain_dedup' in #11934, and there's been a lot of discussion about the name. There's almost two separate things:
Eric Rodriguez (Feb 14 2022 at 15:53):
/poll What should the function [1,1,1,2,3,3] -> [1,2,3]
be called?
norep
chain_dedup
squeeze
uniq
erase_rep
Eric Rodriguez (Feb 14 2022 at 15:54):
But secondly, we currently have chain
and chain'
, with chain a l
talking about a :: l
whilst chain' l
talks about the list itself. I understand that chain a l
may be the more sensible CS notion, but as a mathematician I see zero sense in it and would much rather talk about chain' l
almost exclusively. I propose we swap these names; is there people against this?
Yaël Dillies (Feb 14 2022 at 15:55):
I find myself mostly using chain
, actually, so I'm not sure the argument holds.
Eric Rodriguez (Feb 14 2022 at 15:57):
How come?
Yaël Dillies (Feb 14 2022 at 15:59):
I don't actually know. I think it's because the empty list is pathological where I use it, so using chain
is a nice sweep-under-the-rug way to forget about it. When I have a chain'
, the first thing I do is to case on the list to get back to chain
.
Kyle Miller (Feb 14 2022 at 16:04):
If you're able to modify the poll question, you might want to make it clear this function is only removing consecutive repeats, so [2, 2, 3, 3, 2] -> [2, 3, 2]
Kyle Miller (Feb 14 2022 at 16:14):
@Yaël Dillies Here's another argument: chain
seems to be the auxiliary definition for chain'
. It's asymmetric, giving you an argument for the pre-head (why not the post-tail?)
Joachim Breitner (Feb 14 2022 at 17:10):
A term for consecutive repeated elements of a sequence is “suttering”, I believe, e.g. in state machine machines. I hence added erase_stutter
to the list of options
Yaël Dillies (Feb 14 2022 at 17:11):
Obscure, but I like it!
Joachim Breitner (Feb 14 2022 at 17:11):
There is precedence in the Software Foundations book that teaches ITP with Coq: https://github.com/bishboria/software-foundations/blob/master/MoreLogic.v#L161
Kyle Miller (Feb 14 2022 at 17:25):
Collecting rationales for some of the names:
norep
came from a Haskell libraryuniq
came from a Haskell library and Unixchain_dedup
deduplicates elements in the chainward directionerase_rep
mirrorserase_dup
, where it seems plausible that there could be a distinction between repetitions and duplicationserase_consecutive_dup
is a clear modification oferase_dup
erase_stutter
evokes localized repetition, which is a fairly clear metaphorsqueeze
evokes physically compressing a list, which is less clear
Yakov Pechersky (Feb 14 2022 at 17:26):
uniq is also how unix uniq works.
Yakov Pechersky (Feb 14 2022 at 17:42):
If we're making an erase_rep, can that be the prod.fst map out of a "counter" function that we write? Otherwise, we'll want that counter later anyway.
Yaël Dillies (Feb 14 2022 at 17:45):
Oh, I read erase_rep
as "erase representative".
Ruben Van de Velde (Feb 14 2022 at 17:46):
erase_rpt
?
Yaël Dillies (Feb 14 2022 at 17:47):
erase_repeat
?
Junyan Xu (Feb 15 2022 at 00:20):
Googling destutter
seems to return a reasonable amount of results, so I added that. By the way,
erase_dup l removes duplicates from l (taking only the first occurrence). Defined as pw_filter (≠).
erase_dup [1, 0, 2, 2, 1] = [0, 2, 1]
docs#list.erase_dup
Why does this not return [1,0,2] ?
Mario Carneiro (Feb 15 2022 at 00:56):
The doc is lying, it works from right to left
Eric Rodriguez (Feb 15 2022 at 13:43):
I ended up going for destutter
, now that I finished generalizing to an arbitrary relation. erase_stutter
felt a bit too long. I also swapped the '
s compared to chain'
, now that it doesn't mention it in the name. (And I think that this time, destutter
really is the more correct notion 99% of the time - for example, you only have (l.destutter' R a).chain R a
if R
is reflexive, which is often not the case for these usecases).
Eric Wieser (Feb 15 2022 at 13:44):
It seems like we should rename erase_dup
to dedup
if we're going to call it destutter
Eric Rodriguez (Feb 15 2022 at 13:47):
Is dedup currently the property of being equal to your own erase_dup?
Eric Rodriguez (Feb 15 2022 at 13:48):
Also, what's the difference between pw_filter (≠) and erase_dup? Maybe that's the correct way, just use the more general
Yaël Dillies (Feb 15 2022 at 13:49):
No, that's nodup
Yaël Dillies (Feb 15 2022 at 13:50):
docs#list.erase_dup is defined as pw_filter (≠)
Eric Rodriguez (Feb 15 2022 at 13:53):
why do we keep it around then?
Yaël Dillies (Feb 15 2022 at 13:53):
because it's very common?
Eric Rodriguez (Feb 15 2022 at 15:03):
Eric Wieser said:
It seems like we should rename
erase_dup
todedup
if we're going to call itdestutter
Last updated: Dec 20 2023 at 11:08 UTC