Zulip Chat Archive

Stream: mathlib4

Topic: Where to put these List operations?


Thomas Murrills (Nov 15 2025 at 21:22):

I'd like to contribute these two short list operations--assuming they don't exist already in some form, and that I haven't just missed them. :)

  1. Do these belong in mathlib at all?
  2. If so, where should they go?
  3. Should they be named something else?
  4. Is it good etiquette to contribute them along with lemmas, either proving expectations about these or relating them to other things (e.g. rdropWhile)? Or should that come later?
  5. Variations: there are of course similar versions of these one can come up with to flesh out the API, usually to do with reversal or negation (lengthWhile or lengthUntil on the reversed list, insertBeforeAny, etc.); do those belong somewhere too, or do we try to minimize things like "reversed variants"?

Thanks. :)

/--
Finds the length of the smallest prefix of `l` that contains all elements `a` of `l` for which
`p a` is `true`. The result satisfies `0 ≤ result ≤ l.length`.

Examples:
* `[0, 1, 2, 1, 3].prefixLengthContaingAll (· == 1)` returns `4`
* `[5].prefixLengthContainingAll (· == 1)` returns `0`
* `[7, 8, 1].prefixLengthContaingAll (· == 1)` returns `3`
-/
def List.prefixLengthContainingAll {α} (p : α  Bool) (l : List α) : Nat :=
  go 0 0 l
where
  go (current lengthSoFar : Nat) : List α  Nat
    | [] => lengthSoFar
    | a :: l => go (current + 1) (if p a then current + 1 else lengthSoFar) l

/--
Inserts `a` immediately after the last element `x` in `l` for which `p x` is `true`. If there is no
element `x` in `l` for which `p x` is `true`, inserts `a` at the beginning of the list.

Examples:
* `[0, 1, 2, 1, 3].insertAfterAll (· == 1) a` returns `[0, 1, 2, 1, a, 3]`
* `[5].insertAfterAll (· == 1) a` returns `[a, 5]`
* `[1, 5, 1].insertAfterAll (· == 1) a` returns `[1, 5, 1, a]`
-/
def List.insertAfterAll {α} (p : α  Bool) (l : List α) (a : α) :=
  l.insertIdx (l.prefixLengthContainingAll p) a

Ruben Van de Velde (Nov 15 2025 at 21:54):

  1. They seem pretty niche, and using Bool is somewhat unexpected in mathlib
  2. Maybe they'd fit better in batteries
  3. No better suggestions
  4. If they would go into mathlib, I'd strongly encourage adding basic API lemmas

Snir Broshi (Nov 15 2025 at 21:59):

Can you clarify why you need these? Because as list operations these don't seem very useful, at least not as general list operations.

Here's the first function implemented with existing list operations:

def List.prefixLengthContainingAll {α} (p : α  Bool) (l : List α) : Nat :=
  l.length - (l.map p).reverse.idxOf true

I also think reusing existing definitions is better than writing a separate recursive function.

Is it good etiquette to contribute them along with lemmas, either proving expectations about these or relating them to other things (e.g. rdropWhile)? Or should that come later?

AFAIK mathlib won't accept new definitions without a few basic API lemmas

Thomas Murrills (Nov 15 2025 at 22:05):

Snir Broshi said:

Here's the first function implemented with existing list operations:

def List.prefixLengthContainingAll {α} (p : α  Bool) (l : List α) : Nat :=
  l.length - (l.map p).reverse.idxOf true

I also think reusing existing definitions is better than writing a separate recursive function.

I appreciate the motivation, but isn't this quite inefficient, traversing the list ~2-3 times in all? Ime most list definitions are written recursively, and using a go is quite standard, but maybe I'm most familiar with the ones from core...

These came up while reviewing Category*, where we want to insert a new level name into a list of existing level names after all level names in that list which satisfy a certain property (in that case, being contained in some other array of collected level names).

Thomas Murrills (Nov 15 2025 at 22:08):

To be clear I am happy to see if this can be rewritten! :) But imo it should only involve one traversal of the list to count, and two to insert.

Robin Arnez (Nov 16 2025 at 15:31):

Is the first one not equivalent to List.findIdx (fun a => !p a) l?

Robin Arnez (Nov 16 2025 at 15:33):

Wait no I misunderstood

Robin Arnez (Nov 16 2025 at 15:36):

Actually it would be more something like (List.findRevIdx? p l).elim 0 (· + 1) (if we had findRevIdx?)

Kim Morrison (Nov 16 2025 at 22:03):

I'm pretty skeptical about putting operations like this on List. Just converting to Array early on and working there seems both more ergonomic, and possibly more efficient anyway (e.g., if you start from the back, you need to evaluate p less often).

Thomas Murrills (Nov 16 2025 at 22:11):

How cheap is converting to Array and back? It obviously makes sense if there is an “early on”, but what if you’re forced by circumstance to receive, manipulate, and send a List many times? Is converting to Array not as big a deal as my intuition makes it seem? :grinning_face_with_smiling_eyes:

(I guess these might be micro optimizations, but it still helps build some intuition for more general cases. I always feel on slightly shaky ground around performance…)

François G. Dorais (Nov 16 2025 at 22:23):

FWIW, I'm in a slow, long-term process to upstream, cleanup and upgrade selected parts Mathlib.Data to Batteries.Data. However, my selection process is focused on Batteries: "Would this be useful in Batteries?" is what I ask myself. That being said, I'm open to recommendations if anything seems it would be suitable for Batteries. Batteries will also consider upstreaming PRs but beware that these are sometimes tricky to get right.

Snir Broshi (Nov 17 2025 at 00:51):

I think adding reversed versions of idxOf and findIdx is useful, and they exist in other languages

Alfredo Moreira-Rosa (Nov 17 2025 at 21:16):

Maybe a litle out of topic question, but how do you decide to upstream to batteries and not core Std ? what should be the mental questions one should ask themselves to decide ? Is there a clear guideline ?

Kim Morrison (Nov 17 2025 at 23:18):

Snir Broshi said:

I think adding reversed versions of idxOf and findIdx is useful, and they exist in other languages

It would be helpful to have reference to these, specifically on singly linked lists.

Remember that we don't want to "complete the API" with operations that are not "natural" to types because of performance patterns. Just because some function exists on Array does not mean that it should exist on List.

Kim Morrison (Nov 17 2025 at 23:21):

Alfredo Moreira-Rosa said:

Maybe a litle out of topic question, but how do you decide to upstream to batteries and not core Std ? what should be the mental questions one should ask themselves to decide ? Is there a clear guideline ?

Mostly this is "Do Markus and Kim want it? If yes, then Std. Does François want it? If yes, then Batteries."

Std is also gradually converging towards "completeness", i.e. we only want operations with complete lemma APIs, whereas to my knowledge Batteries doesn't have the aspiration.

Robin Arnez (Nov 17 2025 at 23:22):

Alright, maybe we don't need something like findIdxRev? on lists but it seems like it's missing on arrays too

Kim Morrison (Nov 17 2025 at 23:24):

findIdxRev? on arrays would definitely be appreciated!

Kim Morrison (Nov 17 2025 at 23:24):

Note actually that the find family of functions still doesn't have as complete API as it should: contributions welcome there as well.

Wrenna Robson (Nov 18 2025 at 12:34):

Yeah I have been doing a bunch of work there recently.

Wrenna Robson (Nov 18 2025 at 12:36):

I note that List.idxsOf sort of gives you what you'd want from a findIdxRev? in the sense that (l.idxsOf a).getLast? is precisely findIdxRev?, and this has the advantage that idxsOf is natural on List (because you traverse left-to-right when finding them - if you were defining findIdxRev? you'd also have to traverse the whole list because List does not naturally go right-to-left.

Wrenna Robson (Nov 18 2025 at 12:39):

(I have a currently open PR to give idxsOf better API which you can find here: batteries#1500)

Wrenna Robson (Nov 18 2025 at 12:40):

And indeed I intend to get to defining it on Arrays also.

Wrenna Robson (Nov 18 2025 at 12:40):

What is the cost of List-to-Array? Is it O(l.length) (i.e. you traverse the list and load in all the elements once)?

Wrenna Robson (Nov 18 2025 at 12:41):

My understanding is that after that, computationally speaking Array doesn't have a preference in terms of how it is accessed? But adding elements is expensive on the left and cheap on the right?

Wrenna Robson (Nov 18 2025 at 12:41):

(Wheras for List it's cheap on the left and expensive on the right?)

Bhavik Mehta (Nov 18 2025 at 12:58):

Wrenna Robson said:

What is the cost of List-to-Array? Is it O(l.length) (i.e. you traverse the list and load in all the elements once)?

The docstring here https://leanprover-community.github.io/mathlib4_docs/Init/Prelude.html#List.toArray answers: yes, it's linear in the length

Wrenna Robson (Nov 18 2025 at 14:17):

My further question is "why isn't List.toArrayImpl xs implemented as xs.foldl Array.push Array.empty"?

Henrik Böving (Nov 18 2025 at 14:40):

Wrenna Robson said:

My further question is "why isn't List.toArrayImpl xs implemented as xs.foldl Array.push Array.empty"?

The original code for List.toArrayImpl stems from Init.Prelude where it used to be in a position where List.foldl wasn't defined yet. Though not much point to change it now.

Wrenna Robson (Nov 18 2025 at 14:51):

Aye - well, I would argue for it on aesthetic grounds.

Wrenna Robson (Nov 18 2025 at 14:53):

Is there a csimp theorem that links toArrayImpl to toArray or is it assumed?

Henrik Böving (Nov 18 2025 at 15:34):

https://github.com/leanprover/lean4/blob/master/src/Init/Data/List/ToArrayImpl.lean#L32-L33 the documentation here explains how the linking works

Henrik Böving (Nov 18 2025 at 15:35):

toArrayImpl never occurs in user written code

Wrenna Robson (Nov 19 2025 at 10:43):

Right I see.

Wrenna Robson (Nov 19 2025 at 10:45):

It is a bit odd that we have List.size_toArrayAux and Array.toArrayAux_eq then.


Last updated: Dec 20 2025 at 21:32 UTC