Zulip Chat Archive

Stream: batteries

Topic: eraseDup keeps last, not first occurrence?


Malvin Gattinger (Nov 05 2024 at 13:42):

The documentation says that eraseDup l removes duplicates from l (taking only the first occurrence). (emphasis mine)

I think it should say the last there, if the example below is correct?

import Batteries.Data.List.Basic

#eval List.eraseDup [1, 0, 2, 2, 1] -- [0, 2, 1]

#print List.eraseDup -- List.pwFilter (· ≠ ·)

#eval List.pwFilter (·  ·) [1, 0, 2, 2, 1] -- [0, 2, 1]

#print List.pwFilter -- List.foldr (fun x IH => if ∀ (y : α), y ∈ IH → R x y then x :: IH else IH) [] l

-- Silly example to check what happens.
inductive MagicStuff : Type where
  | mk : Nat  Nat  MagicStuff
deriving Repr

instance : BEq MagicStuff := fun x,_⟩ y,_⟩ => x == y

def magicList : List MagicStuff := [ 1,1 , 0,1, 2,1, 2,2, 1,2 ]

#eval List.eraseDup magicList -- [MagicStuff.mk 0 1, MagicStuff.mk 2 2, MagicStuff.mk 1 2]
-- The `2`s here mean that eraseDup keeps the **last** occurrence, right?

Kim Morrison (Nov 05 2024 at 21:36):

Yes, you're right. A documentation fix would be welcome, but perhaps we can leave to @Mario Carneiro what they think should be done regarding possibly changing the definitions.

Mario Carneiro (Nov 05 2024 at 21:58):

I think a documentation fix would be the easiest thing, these definitions are used not insignificantly for setting up finsets

Mario Carneiro (Nov 05 2024 at 21:58):

I wrote the docstrings exactly because I would always get mixed up about these kind of details

Mario Carneiro (Nov 05 2024 at 21:59):

I guess it is the first, working from right-to-left :upside_down:

François G. Dorais (Nov 09 2024 at 03:45):

The thing is that foldr has better basic equations for find-like purposes.

Malvin Gattinger (Nov 11 2024 at 12:56):

I just realised that there is also docs#List.eraseDups (with s) in Init.Data.List.Basic which indeed does keep the first occurrence.

-- Silly example to check what happens.
inductive MagicStuff : Type where
  | mk : Nat  Nat  MagicStuff
deriving Repr
instance : BEq MagicStuff := fun x,_⟩ y,_⟩ => x == y
def magicList : List MagicStuff := [ 1,1 , 0,1, 2,1, 2,2, 1,2 ]

#eval List.eraseDups [1, 0, 2, 2, 1] -- [1, 0, 2]

#print List.eraseDups -- List.eraseDups.loop as []

#eval List.eraseDups magicList -- [MagicStuff.mk 1 1, MagicStuff.mk 0 1, MagicStuff.mk 2 1]

So maybe eraseDup should just be removed?

Malvin Gattinger (Nov 11 2024 at 12:58):

Mario Carneiro said:

I think a documentation fix would be the easiest thing, these definitions are used not insignificantly for setting up finsets

I tried to find uses of eraseDup and eraseDups elswhere in batteries and mathlib, but could not find many, and in particular not in the Finset files. And what I did find seems to use the function with s already anyway.

Kim Morrison (Nov 12 2024 at 02:49):

I'm not certain here. eraseDups (from Lean) is simpler, and more intuitively keeps the first occurrence. However eraseDup (from Batteries) is built on top of pwFilter, so in the long run there will be a better verification story (because it is connected to other things, and some lemmas can be shared).

Kim Morrison (Nov 12 2024 at 02:50):

Ideally we could sensibly change the definition of pwFilter, so eraseDup did in fact keep the first occurrence, and then eventually replace eraseDups with eraseDup.

Kim Morrison (Nov 12 2024 at 02:50):

Writing theory for eraseDups is not high on my priorities right now, but I'll eventually get there. If eraseDup can be fixed up in the meantime, that would be great.

Mario Carneiro (Nov 12 2024 at 06:43):

Also as @François G. Dorais said, eraseDup, being built on a right fold, has nicer equations for rewriting. Since the ordering is actually visible here it's not like we can prove right-fold lemmas for eraseDups either.

Mario Carneiro (Nov 12 2024 at 06:44):

However the names are obviously bad

Mario Carneiro (Nov 12 2024 at 06:48):

I think if you had a free choice of these two and didn't care whether the first or last duplicate was kept, and were in a proof context, you would prefer eraseDup and pwFilter. There are no similarly nice equations for a pwFilter' which takes the left-most occurrence other than either reversing the list first or having an accumulator.


Last updated: May 02 2025 at 03:31 UTC