Zulip Chat Archive

I wanted to fill a proof_wanted for erase_data in std (equivalence between Array.erase and List.erase) but I feel that some code in the Init library needs to be cleaned up.
To do things cleanly, I wanted to prove lemmas about the functions buried in it.
I proved that findIdx?, and indexOf? are equivalent for Array and List, after making some modifications for convenience.

1) One issue is that in Array.findIdx?'s code uses index tricks to avoid wf recursion (old code maybe ?), but I find that it's less readable. For my proof, I simplified it to use termination_by (which I believe shouldn't affect performance).

2) Another issue is that Array.indexOf? returns a Fin whereas List.indexOf?, and findIdx?'s return Nat's, and the order of parameters is not the same.
I want to simplify Array.indexOf? to use Array.findIdx? but their signature is different. Should they all return Fin ?

3) Are contributions to the Init library welcome at this time ?
Regarding std, is there a list of tasks that need to be done and where help would be appreciated currently ?

Thanks for your thoughts

François G. Dorais (Mar 10 2024 at 11:14):

These functions are widely used in core so you want to do rigorous performance testing.

Julien Michel (Mar 10 2024 at 13:26):

What I'd like to do first is something like that:

import Std.Data.List
-- I would suggest simplifying Array.findIdx? with below code
@[inline]
def Array.findIdx?' {α : Type u} (as : Array α) (p : α → Bool) : Option Nat := -- wouldn't it be better to return Option (Fin as.size) ?
  let rec loop (j : Nat) :=
    if h : j < as.size then
      if p as[j] then some j else loop (j + 1)
    else none
    termination_by as.size - j
  loop 0

-- so that it's more convenient to write this theorem
theorem Array.findIdx?_eq {α} (as : Array α) (p : α → Bool) : as.findIdx?' p = as.1.findIdx? p :=
  let rec loop (as) (j) : findIdx?'.loop as p j = (as.1.drop j).findIdx? p j := by
    let ⟨l⟩ := as
    unfold findIdx?'.loop
    by_cases h1 : j < size ⟨l⟩
    . rw [dif_pos h1, l.drop_eq_get_cons h1]
      show _ = ite ..
      by_cases h2 : p l[j] = true
      . erw [if_pos h2, if_pos h2]
      . erw [if_neg h2, if_neg h2]; apply loop
    . rw [dif_neg h1, List.drop_eq_nil_of_le (Nat.le_of_not_lt h1)]
      cases j <;> rfl
    termination_by as.size - j
  loop as 0

(Let me know if you can improve this proof). Note that one can then prove easily that indexOf? match for Array and List. Later, if nobody does it, I'll try to finish the proof that erase functions match. But I think there is still some cleaning up to do.

4) Do you have any suggestions on how to do performance testing rigorously? Are there performance tests automatically done during the Core building pipeline? I will have a look

François G. Dorais (Mar 10 2024 at 22:29):

A lot of the code in Lean core is old. The modern way to approach this kind of dilemmas is to have two defs: one which is inefficient but easy to use in proofs, another which is efficient but hard to use in proofs, and a @[csimp] lemma proving that they compute the same function.

Markus Schmaus (Mar 12 2024 at 12:47):

Julien Michel said:

2) Another issue is that Array.indexOf? returns a Fin whereas List.indexOf?, and findIdx?'s return Nat's, and the order of parameters is not the same.
I want to simplify Array.indexOf? to use Array.findIdx? but their signature is different. Should they all return Fin ?

The size of Array is cached and can be accessed in O(1), the length of a list is not. So making List.indexOf? return a Fin would require additional computation. On the other hand if I want to use Array.indexOf? in Array.get, I need a Fin.

Julien Michel (Mar 16 2024 at 11:41):

I disagree that there would be an additional computation to return a Fin instead of a Nat, if you just prove what you need.

import Std.Data.List.Basic

theorem List.lt_length_of_mem_findIdx? (p : α → Bool) (l : List α) (s := 0)
    : ∀ i ∈ l.findIdx? p s, i < l.length + s := by
  induction l generalizing s with
  | nil => intro _ _; trivial
  | cons a l ih =>
    intro i (h : ite .. = _)
    erw [Nat.add_assoc, Nat.add_comm 1]
    split at h
    . simp_all_arith
    . simp_all

def List.findIdxFin? (p : α → Bool) (l : List α) : Option $ Fin l.length := -- this length won't be called at runtime
  (l.findIdx? p).pmap Fin.mk (l.lt_length_of_mem_findIdx? p)

Markus Schmaus (Mar 16 2024 at 13:28):

You are right, sorry I was wrong.

Julien Michel (Mar 16 2024 at 13:36):

I actually dislike the idea of returning Fin or any data coupled with properties. I prefer returning pure data like Nat and separately prove things about it.

Eric Wieser (Mar 17 2024 at 11:04):

I don't think the line between "data" and "data coupled with properties" is as clear as you make it out to be. docs#Fin and docs#Fin2 both represent numbers less than n, but the latter doesn't carry any explicit proof fields.

Eric Wieser (Mar 17 2024 at 11:05):

And docs#Rat contains proofs fields, but is presumably something you consider as "pure data"

Mario Carneiro (Mar 17 2024 at 11:11):

the usual place I would draw the line between "data" and "data with properties" is whether the function explicitly uses Subtype in inputs or outputs, or has proof arguments. Once you have defined a custom type with the properties as embedded fields, it's been rebranded as "data"

Mario Carneiro (Mar 17 2024 at 11:14):

There is another useful place to draw the line though, which is whether the type is a dependent function, as type dependencies can make rewriting more difficult. That litmus test would differentiate between the cases of Fin n vs Rat

Last updated: May 02 2025 at 03:31 UTC

leanprover-community / mathlib