Zulip Chat Archive

Stream: batteries

Topic: Where should List.mem_eraseDup and List.mem_eraseDups l...


Nicolas Rouquette (Dec 23 2025 at 01:14):

I've proven membership lemmas for eraseDup (Batteries) and eraseDups (Lean core), analogous to Mathlib's mem_dedup. The proofs currently depend on Mathlib (LawfulBEqpwFilter_cons_of_pos/neg).

See this gist: https://gist.github.com/NicolasRouquette/3f05e55649926b2ba47d30af3b36c316

It is unclear to me where such proofs belong.

Options:

  1. Batteries - since eraseDup is defined there
  2. Mathlib - since proofs use Mathlib lemmas
  3. Lean core - for mem_eraseDups at least, since eraseDups is in core

My take:

  1. Batteries could accept it directly - If the proof can be rewritten using only Batteries lemmas, it could go into Batteries without creating a Mathlib dependency
  2. Faster builds - Batteries-only code compiles much faster than Mathlib-dependent code
  3. Broader usability - Projects not using Mathlib could still use these lemmas

Good News: Most lemmas ARE in Batteries!

  • :check: pwFilter_cons_of_pos - In [\leanprover-community\batteries\tree\main\Batteries\Data\List\Pairwise.lean](vscode-file://vscode-app/c:/Users/nfr/AppData/Local/Programs/Microsoft%20VS%20Code/resources/app/out/vs/code/electron-browser/workbench/workbench.html)
  • :check: pwFilter_cons_of_neg - In [\leanprover-community\batteries\tree\main\Batteries\Data\List\Pairwise.lean](vscode-file://vscode-app/c:/Users/nfr/AppData/Local/Programs/Microsoft%20VS%20Code/resources/app/out/vs/code/electron-browser/workbench/workbench.html)
  • :check: pwFilter_subset - In [\leanprover-community\batteries\tree\main\Batteries\Data\List\Pairwise.lean](vscode-file://vscode-app/c:/Users/nfr/AppData/Local/Programs/Microsoft%20VS%20Code/resources/app/out/vs/code/electron-browser/workbench/workbench.html)
  • :check: LawfulBEq - In Lean core / Std

The Real Mathlib Dependencies:

  1. push_neg tactic - This is from Mathlib. Used to transform ¬∀ y ∈ xs.eraseDup, (a != y) = true into ∃ y ∈ xs.eraseDup, ¬(a != y) = true

  2. rcases / obtain - These are Mathlib tactics for pattern matching

  3. bne_iff_ne - (@Jakub Nowak pointed that this is in core)

Thoughts?

Jakub Nowak (Dec 23 2025 at 01:23):

bne_iff_ne is in core.

Jakub Nowak (Dec 23 2025 at 01:42):

rcases and obtain are also in core.

Vlad Tsyrklevich (Dec 23 2025 at 10:30):

(deleted) used a mathlib lemma

Vlad Tsyrklevich (Dec 23 2025 at 10:38):

The entire arm using push_neg can be replaced by simp_all [eraseDup, pwFilter_cons_of_neg hball]

Nicolas Rouquette (Dec 23 2025 at 16:53):

Vlad Tsyrklevich said:

The entire arm using push_neg can be replaced by simp_all [eraseDup, pwFilter_cons_of_neg hball]

Thanks Vlad!

I updated the gist so that it only depends on Batteries.
There's additional simplification possible that I'm looking into.

Nicolas Rouquette (Dec 23 2025 at 18:29):

I have simplified the proofs and separated potential contributions to Batteries and Std in the revised gist:

https://gist.github.com/NicolasRouquette/3f05e55649926b2ba47d30af3b36c316

Are these in a reasonable shape to submit issues & PRs?

Robin Arnez (Dec 23 2025 at 21:50):

I'll note that the first one in ToStd_EraseDups.lean can also be proven by:

theorem List.mem_eraseDupsBy_loop [BEq α] [LawfulBEq α] {a : α} {l acc : List α} :
    a  eraseDupsBy.loop (· == ·) l acc  a  l  a  acc := by
  fun_induction eraseDupsBy.loop with grind

grind is seriously powerful!

Nicolas Rouquette (Dec 23 2025 at 23:49):

Wow! Thanks @Robin Arnez ! The proofs are elegant now.

https://gist.github.com/NicolasRouquette/3f05e55649926b2ba47d30af3b36c316#file-tostd_erasedups-lean

Nicolas Rouquette (Dec 24 2025 at 00:34):

Thanks, everyone, for your helpful suggestions.

I wrote two RFCs from the much simplified gist.

https://github.com/leanprover/lean4/issues/11786
This is about: https://gist.github.com/NicolasRouquette/3f05e55649926b2ba47d30af3b36c316#file-tostd_erasedups-lean

https://github.com/leanprover-community/batteries/issues/1572
This is about: https://gist.github.com/NicolasRouquette/3f05e55649926b2ba47d30af3b36c316#file-tobatteries_erasedup-lean

Robin Arnez (Dec 24 2025 at 10:52):

Do you want to make the PRs?

Nicolas Rouquette (Dec 25 2025 at 02:02):

Ok I’ll prepare PRs

Yury G. Kudryashov (Dec 25 2025 at 05:34):

Do we have a reason to have both definitions, or one of them should be deprecated?

Robin Arnez (Dec 25 2025 at 19:43):

I guess if we were to deprecate one it would be eraseDup? That one has no lemmas about it. Also: as.eraseDup = as.reverse.eraseDups.reverse:

Proof

Nicolas Rouquette (Dec 27 2025 at 06:05):

FYI: for this issue: https://github.com/leanprover/lean4/issues/11786
I submitted a PR: https://github.com/leanprover/lean4/pull/11811

Unfortunately, I could not use @Robin Arnez 's concise proof because grind isn't available in the context of src/Init/Data but I managed to work around this (thanks to Claude Sonnet 4.5 & Github Copilot).

Nicolas Rouquette (Dec 27 2025 at 19:16):

Although List.eraseDup is not used anywhere in Batteries, cslib, or Mathlib currently, there are several legitimate arguments for keeping it thanks to help from Github Copilot + Claude Opus 4.5:

  1. Different semanticseraseDup and eraseDups produce different results — they preserve different occurrences:

    • eraseDups [1, 0, 2, 2, 1] = [1, 0, 2] (keeps first occurrence)
    • eraseDup [1, 0, 2, 2, 1] = [0, 2, 1] (keeps last occurrence)

    These are not interchangeable! Formalizations may specifically require "last occurrence" semantics.

  2. Proof architecture: Being defined as pwFilter (· != ·)eraseDup inherits all pwFilter lemmas for free. Proofs follow natural structural induction without needing loop invariants for accumulators. For example, mem_eraseDup uses pwFilter_cons_of_pos/neg directly, while mem_eraseDups requires reasoning about eraseDupsBy.loop's accumulator.

  3. ComposabilitypwFilter is a general-purpose "maximal pairwise sublist" combinator. Having eraseDup as a named specialization:

    • Makes proofs about deduplication more readable
    • Allows leveraging theorems like pairwise_pwFilterpwFilter_eq_selfpwFilter_idemforall_mem_pwFilter directly
    • Connects to Nodup via Pairwise (≠)
  4. @Robin Arnez's equivalence theorem demonstrates the relationship between these implementations, but they serve different purposes — efficiency vs. proof ergonomics.

The fact that it's unused today doesn't mean it won't be useful. Having both functions provides flexibility for formalizations where the choice of which duplicate to keep matters, or where the pwFilter-based proof structure is more natural.

Nicolas Rouquette (Dec 27 2025 at 20:05):

FYI: for this issue: https://github.com/leanprover-community/batteries/issues/1572
I submitted a PR: https://github.com/leanprover-community/batteries/pull/1580

It includes @Robin Arnez 's equivalence theorem and the one from my Gist.

Yury G. Kudryashov (Dec 27 2025 at 22:46):

Note that both docstrings claim to preserve the first element, so at least, we should fix the wrong one.

Yury G. Kudryashov (Dec 27 2025 at 22:47):

Also, if we're going to keep both, then we should find better names that tell the users what's the difference.

Nicolas Rouquette (Dec 28 2025 at 01:41):

Thanks, @Yury G. Kudryashov! I've applied your suggestion and realized my PR was missing @Robin Arnez's equivalence theorem. The PR is now updated.

Nicolas Rouquette (Dec 28 2025 at 01:49):

Regarding the names, I kept the naming conventions -- these are theorems about list membership & operations.
Instead, I added a descriptive docstring. Does this adequately address your naming concern @Yury G. Kudryashov ?

Yury G. Kudryashov (Dec 28 2025 at 05:32):

I'm talking about renaming the definitions, not theorems. I think that having two definitions that differ by s at the end is a footgun.

Yury G. Kudryashov (Dec 28 2025 at 05:33):

But I'm neither a Lean core maintainer, nor a Batteries maintainer. You should ask them.

Nicolas Rouquette (Dec 28 2025 at 22:46):

The naming collision between eraseDup (last occurrence) and eraseDups (first occurrence) is indeed problematic.

How about the following to prevent confusion with Lean's List.eraseDups?

/-- `eraseDupLast l` removes duplicates from `l` (keeping the last occurrence).
Defined as `pwFilter (≠)`.

    eraseDupLast [1, 0, 2, 2, 1] = [0, 2, 1] -/
@[inline] def eraseDupLast [BEq α] : List α  List α := pwFilter (· != ·)

@[deprecated eraseDupLast (since := "2025-12-28")]
def eraseDup [BEq α] : List α  List α := eraseDupLast

Yury G. Kudryashov (Dec 28 2025 at 22:48):

I'm moving this discussion from #mathlib4 to #batteries.

Notification Bot (Dec 28 2025 at 22:48):

This topic was moved here from #mathlib4 > Where should List.mem_eraseDup and List.mem_eraseDups l... by Yury G. Kudryashov.

Yury G. Kudryashov (Dec 28 2025 at 23:04):

This thread is about modifying Batteries and/or core Lean, not Mathlib, so I think this is a better place for the discussion.

Yaël Dillies (Dec 29 2025 at 06:32):

eraseLastDup seems more grammatical than eraseDupLast. Otherwise, I agree!

Yury G. Kudryashov (Dec 29 2025 at 07:19):

For me, eraseLastDup looks like "erase the last duplicate", not "erase all duplicates but the last one".

François G. Dorais (Dec 29 2025 at 16:41):

I'm fully in holiday mayhem, so I won't act on the batteries PR until January. However, my current leaning is to deprecate the Batteries version and just keep the core version. (Reversing lists is relatively cheap, so there is minimal gain in duplication. Note that reversing arrays is not at all as cheap, but that's not relevant.)

Although both the core PR and batteries PR were very verbose, they failed to mention each other. This is bad since it could lead to the worst outcome: both core and batteries deprecating the functions , each expecting the other to keep it.

Nicolas Rouquette (Dec 29 2025 at 18:09):

Thanks @François G. Dorais for taking the time to comment on this discussion.

I updated both PRs to refer to this relocated thread, link the PRs with each other, and comment on the deprecation, if any, of Batteries' List.eraseDup given its low usage compared to Lean's List.eraseDups.

Nicolas Rouquette (Dec 30 2025 at 23:15):

Thanks for the discussion! I'd like to summarize what I see as the two key decisions, and I'd appreciate guidance on how the community typically handles these:

1) Does it make sense to keep Batteries' eraseDup?

This function originates from this commit.

I shared my thoughts on keeping it here, but I'm curious what the usual process is for deciding whether to deprecate vs. keep such functions?

2) If we do keep it, how should we rename it to avoid the footgun?

Since the key difference is which duplicate to keep, perhaps names that make this explicit would help?

For example:

  • List.eraseDupsKeepLast (current Batteries List.eraseDup)
  • List.eraseDupsKeepFirst (current Lean List.eraseDups)

Just a suggestion — happy to hear other ideas!

Robin Arnez (Dec 31 2025 at 00:21):

Actually, it originates from this commit from 8 years ago at which point it also had a lot more API including a rather short proof of mem_erase_dup. Then, 6 years ago, core lean 4 independently got a definition of eraseDups in this commit. After that, the definition of erase_dup got ported to mathlib4 4 years ago in #59 but without porting any lemmas. In February of 2022, erase_dup then got renamed to dedup in mathlib3#12057 and finally in December of the same year, the definition of dedup with the corresponding lemmas was ported to mathlib4 in #803. The lemmas about eraseDups actually came a while later in April of this year in #8148.
In short, the fact that we got docs#List.eraseDup and docs#List.dedup is a miscommunication / porting mistake while eraseDups was added independently before any ports to mathlib4 happened.
So we should propably properly replace eraseDup with the definition of dedup from mathlib. I'm not exactly sure what to do for eraseDups vs dedup though.

Jakub Nowak (Jan 01 2026 at 16:21):

Is the difference in order between eraseDups and dedup that important to necessitate having both?

Yury G. Kudryashov (Jan 01 2026 at 20:03):

It looks like @François G. Dorais suggests we drop the Batteries version (probably, together with docs#List.dedup).

François G. Dorais (Jan 03 2026 at 19:56):

Yes, but the API should be merged with no loss on all three sides.

The ball is in core's court now.

Jakub Nowak (Jan 03 2026 at 20:10):

We can remove mathlib's definition of Lean.dedup but keep verification API in mathlib, but with dedup replaced with core's Lean.eraseDups (this would require remaking proofs).

Yury G. Kudryashov (Jan 03 2026 at 20:11):

We can deprecate the other two definitions and translate the relevant lemmas to use the core def. Then core team can upstream these lemmas on their own schedule. WDYT?

François G. Dorais (Jan 03 2026 at 20:39):

I can make a RFC in core to enshrine this as a MOU between all three parties to make sure that actions are properly coordinated and have a common goal.

AFAICT there are no uses of Batteries' eraseDup in Mathlib, so I will make a deprecation PR.

François G. Dorais (Jan 04 2026 at 02:36):

There are no visible uses of Batteries' List.eraseDup and there is no associated API, so I just deprecated it.


Last updated: Feb 28 2026 at 14:05 UTC