Zulip Chat Archive

Stream: batteries

Topic: findIdx and indexOf

Julien Michel (Mar 10 2024 at 14:03):

In std, I see a bunch of a functions like that :

/-- Returns the index of the first element satisfying `p`, or the length of the list otherwise. -/
@[inline] def findIdx (p : α → Bool) (l : List α) : Nat

/-- Returns the index of the first element equal to `a`, or the length of the list otherwise. -/
def indexOf [BEq α] (a : α) : List α → Nat

/-- Removes the `n`th element of `l`, or the original list if `n` is out of bounds. -/
@[simp] def removeNth : List α → Nat → List α

At first sight, I kind of dislike the "or" part , which looks like it's reinventing NULL. Is there a good reason for having them?
Wouldn't it be better to just have a panic version (!) along with the Option version (?), and maybe a regular version that takes a proof that the function will be successful, thereby mirroring Array.get conventions?

James Gallicchio (Mar 11 2024 at 17:31):

Potentially; another apt comparison, though, is Nat.div, for which we return a dummy value in the undefined case (Nat.div i 0 = 0). I think there's reasons for doing either in Lean, but I'm not sure I have intuition for when to use one over the other.

James Gallicchio (Mar 11 2024 at 17:36):

I personally take the route of avoiding the nth and idx based APIs to List, cuz I find them very hard to write code with and prove theorems about

Chris Wong (Mar 12 2024 at 01:24):

Another way of describing findIdx is that it returns the least index n such that, for every k < n, p l[k] is false

Chris Wong (Mar 12 2024 at 01:25):

Binary search algorithms work the same way. It means that you can use the same function for finding an element and inserting a new one.

Chris Wong (Mar 12 2024 at 01:36):

James Gallicchio said:

I'm not sure I have intuition for when to use one over the other.

I think the rule of thumb is to restrict theorems, not definitions. Because if you put the hypothesis on the definition, then you need to re-prove it every time you mention the definition, even if the hypothesis is irrelevant to the result.

Markus Schmaus (Mar 12 2024 at 12:36):

Chris Wong said:

I think the rule of thumb is to restrict theorems, not definitions. Because if you put the hypothesis on the definition, then you need to re-prove it every time you mention the definition, even if the hypothesis is irrelevant to the result.

Does this really make a difference? I still need to prove the same hypothesis in order to use the theorem. Or if I instead unfold the definition, I need to prove the hypothesis in order to exclude the corresponding branch of the function.

Kevin Buzzard (Mar 12 2024 at 19:43):

Yes it makes a difference, because in a proof like $\sqrt{6}=\sqrt{2}\sqrt{3}$ you have to prove three obvious things if you demand nonnegativity in the definition, but you need less if you just want to apply the theorem. These extra things soon start adding up.

Or think about it this way: you're defining a function but you're ignoring one of the inputs. Where is the logic in that?

Julien Michel (Mar 12 2024 at 22:13):

Thanks Kevin that's a nice example.
My original concern is that programmers who write software code, will mindlessly pick findIdx, and forget that the returned index might not be valid.
For proving this is not an issue because if you forget things, your proof just won't type check and that's it.
Perhaps these functions should be named with a more explicit ending.

Kevin Buzzard (Mar 12 2024 at 23:16):

(at a computer now): yeah docs#Real.sqrt_mul only asks that one of the inputs is non-negative (because the junk value 0 is returned for square roots of negative numbers)

Markus Schmaus (Mar 12 2024 at 23:44):

To me the proof looks exactly the same in both cases.

import Mathlib

example : Real.sqrt 6 = (Real.sqrt 3) * (Real.sqrt 2) := by
  have : (6 : Real) = 3 * 2 := by linarith
  simp_all only [Nat.ofNat_nonneg, zero_le, Real.sqrt_mul]

noncomputable def sqrt' (x : ℝ) (h : 0 ≤ x := by linarith) := Real.sqrt x

@[simp]
theorem sqrt'_mul (x y : ℝ) {ha : _} {hb : _} {hc : _} : sqrt' (x * y) ha = (sqrt' x hb) * (sqrt' y hc) := by
  simp_all only [sqrt', Real.sqrt_mul]

example : sqrt' 6 = (sqrt' 3) * (sqrt' 2) := by
  have : (6 : Real) = 3 * 2 := by linarith
  simp_all only [Nat.ofNat_nonneg, sqrt'_mul]

Kevin Buzzard (Mar 13 2024 at 00:05):

Sure, but you've put a default discharger on the input which has been chosen to work on the example I gave (and which had to run three times instead of once on this example)

Markus Schmaus (Mar 13 2024 at 11:54):

Can you give me an example for which this discharger doesn't work, but also doesn't require additional proofs for Real.sqrt?

When I look the code of Real.sqrt and the functions it invokes, I find ⟨max r 0, le_max_right _ _⟩, so here Lean has figure out that 0 ≤ 6, otherwise it can't know anything about the result. I can avoid this check for sqrt' by using NNReal.sqrt directly. So in both cases there are three checks comparing 6, 3, and 2 with 0, either by the linarith or when calling max.

import Mathlib

noncomputable def sqrt' (x : ℝ) (h : 0 ≤ x := by linarith) : Real := NNReal.sqrt ⟨x, h⟩

@[simp]
theorem sqrt'_mul (x y : ℝ) {ha : _} {hb : _} {hc : _} : sqrt' (x * y) ha = (sqrt' x hb) * (sqrt' y hc) := by
  simp only [sqrt', ← NNReal.coe_mul, ← NNReal.sqrt_mul, NNReal.coe_inj]
  congr

example : sqrt' 6 = (sqrt' 3) * (sqrt' 2) := by
  have : (6 : Real) = 3 * 2 := by linarith
  simp_all only [Nat.ofNat_nonneg, sqrt'_mul]

While the differences look minute to me, they are still there and I can understand if you prefer using max for theorem proving. And for the non-computable Real.sqrt that's the end of it.

But if we would be talking about a computable function like Nat.div, there is also runtime to consider. With a Nat.div' the non-zero check could be discharged entirely at compile time. While Nat.div checks for 0 every time it's called.

Julien Michel (Mar 13 2024 at 14:19):

If a def should not have assumptions, should a def ever have a Fin parameter ?

Kim Morrison (Mar 13 2024 at 21:16):

I'm not sure where "a def should not have assumptions" comes from. A def should not have any unnecessary assumptions for the definition.

Julien Michel (Mar 13 2024 at 22:52):

I think Kevin was arguing in favor of that tendency towards absence of assumptions in many mathlib defs, and returning junk values on meaningless inputs (in traditional math).
Continuing along this path, I was wondering why we should ever use Fin in a def, when we can replace a Fin with a Nat, and return a junk value if it's out of bounds.
But it's true that many mathlib defs clearly have typeclass assumptions.

Kevin Buzzard (Mar 14 2024 at 00:32):

@Markus Schmaus the point I'm trying to make is that Real.sqrt_mul only asks for a proof that 0 <= x whereas putting non-negativity as an assumption into the def of Real.sqrt means that you get asked to prove a non-negativity condition three times when invoking Real.sqrt_mul. Thus putting the assumptions only in the theorems makes the user's life easier. That is all I'm saying. I don't see the relevance of any of your questions to this point right now, but if you can explain why they're relevant to the point I'm making then I will certainly try to answer them.

Johan Commelin (Mar 14 2024 at 03:52):

@Julien Michel Suppose you want to prove something about finite types, eg involving their cardinality. Then Fin is quite useful

Last updated: May 02 2025 at 03:31 UTC