Zulip Chat Archive

Stream: mathlib4

Topic: List insertion slowdown


Bhavik Mehta (Sep 04 2023 at 10:00):

cc @Scott Morrison, to move the discussion of parts_property out of this thread.
Looking at it again, this example isn't too slow with the priority of List.decidableBall increased, and in Lean 3 the decidable instance is inferred to be this one while it isn't in Lean 4. I'll try to make an example where this isn't the issue, but in the meantime is it reasonable to increase the priority of this instance (and similar ones for Finset and Multiset and Bex)?

Bhavik Mehta (Sep 04 2023 at 10:06):

Here was my original (only slightly minimised) example:

import Mathlib.Data.Finite.Card

def parts : Finset (Fin 4  ZMod 2) :=
  {![1, 1, 0, 0], ![0, 0, 1, 1], ![1, 0, 0, 1], ![1, 1, 1, 0], ![1, 0, 0, 0]}

theorem parts_property :
     (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide

which is not fast but compiles in lean 3 - it's checking 125 cases. But in Lean 4 it times out, even if I say attribute [instance 10000] Finset.decidableDforallFinset first. I'll try to minimise here, but I'll include this version in case anyone does it faster / different / better than me

Arthur Adjedj (Sep 04 2023 at 10:30):

Having tried to debug this issue, the slowdown doesn't seem to be related to the type-checking of the value in itself. Working on the following (faster) example:

import Mathlib.Data.Finite.Card

def parts : Finset (Fin 3  ZMod 2) :=
  {![1, 1, 0], ![0, 0, 1], ![1, 0, 0], ![1, 1, 1], ![1, 0, 0]}

set_option trace.profiler true

theorem parts_property :
     (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide

On my PC, less than a 100th of the execution time of the tactic is spent doing meaningful computations(~15ms out of 15s of execution). After making a custom version of lean with more precise traces in evalDecide, I still couldn't find any slowdown in the execution of the tactic itself. Could this be a linters issue ? If so, how can one profile those ?

Bhavik Mehta (Sep 04 2023 at 12:01):

One thing which Eric suggested - changing the definition of parts to

def parts : Finset (Fin 4  ZMod 2) :=
  Finset.cons ![1, 1, 0, 0]
    (Finset.cons ![0, 0, 1, 1]
      (Finset.cons ![1, 0, 0, 1]
        (Finset.cons ![1, 1, 1, 0]
          {![1, 0, 0, 0]}
          sorry)
        sorry)
      sorry)
    sorry

(and keeping the "fixed" instance priority) makes it 5x faster, which seems to suggest something in Finset.insert (or List.insert) is making it slow...

Eric Rodriguez (Sep 18 2023 at 06:50):

I wonder if #7232 will make this faster. It's worth checking after CI

Eric Rodriguez (Sep 18 2023 at 11:35):

It doesn't, even if all instances that use if are replaced (even in Std). native_decide is instant.

Bhavik Mehta (Sep 18 2023 at 21:14):

import Mathlib.Data.Finite.Card

attribute [instance 10000] List.decidableBall

def parts : List (Fin 4  ZMod 2) :=
  {![1, 1, 0, 0], ![0, 0, 1, 1], ![1, 0, 0, 1], ![1, 1, 1, 0], ![1, 0, 0, 0]}

def parts2 : List (Fin 4  ZMod 2) :=
  [![1, 1, 0, 0], ![0, 0, 1, 1], ![1, 0, 0, 1], ![1, 1, 1, 0], ![1, 0, 0, 0]]

count_heartbeats in -- 14K (quick)
example : parts = parts2 := by decide

count_heartbeats in -- 15K (quick)
theorem parts2_property :
     (x) (_ : x  parts2) (y) (_ : y  parts2), x + y  parts2 := by decide

count_heartbeats in -- 412K (slow)
theorem parts_property :
     (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide

is a further partial minimisation thanks to @Kevin Buzzard

Kevin Buzzard (Sep 18 2023 at 21:54):

I've got it down to this:

import Mathlib.Data.Fintype.Basic -- `Fintype.decidablePiFintype : DecidableEq (Fin 4 → ℕ)`
import Std.Data.Fin.Lemmas
import Std.Data.List.Basic

-- `!` vector notation
namespace Matrix

universe u

variable {α : Type u}

def vecEmpty : Fin 0  α :=
  Fin.elim0

def vecCons {n : Nat} (h : α) (t : Fin n  α) : Fin n.succ  α :=
  fun j  Fin.cases h t j

syntax (name := vecNotation) "![" term,* "]" : term

macro_rules
  | `(![$term:term, $terms:term,*]) => `(vecCons $term ![$terms,*])
  | `(![$term:term]) => `(vecCons $term ![])
  | `(![]) => `(vecEmpty)

end Matrix

def parts : List (Fin 4  Nat) := List.insert (![1, 1, 0, 0]) $ List.insert (![0, 0, 1, 1]) $
  List.insert (![1, 0, 0, 1]) $ List.insert (![1, 1, 1, 0]) $ List.insert (![1, 0, 0, 0]) []

def parts2 : List (Fin 4  Nat) :=
  [![(1 : Nat), 1, 0, 0], ![0, 0, 1, 1], ![1, 0, 0, 1], ![1, 1, 1, 0], ![1, 0, 0, 0]]

count_heartbeats in -- 6K (quick)
example : parts = parts2 := by decide

count_heartbeats in -- 13K (quick)
theorem parts2_property :
     (x) (_ : x  parts2) (y) (_ : y  parts2), x + y  parts2 := by decide

count_heartbeats in -- 374K (slow)
theorem parts_property :
     (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide

but I still need Fintype for DecidableEq (Fin 4 → ℕ)

Kevin Buzzard (Sep 18 2023 at 22:37):

[Elab.command] [18.116291s] set_option trace.profiler true in
    set_option maxHeartbeats 0 in
    theorem parts_property :  (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide 
  [] [18.116215s] set_option maxHeartbeats 0 in
      theorem parts_property :  (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide 
    [] [18.116142s] theorem parts_property :  (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide 
      [step] [18.042215s] decide 
        [] [18.042199s] decide 
          [] [18.042183s] decide 
            [Meta.appBuilder] [0.193585s]  f: Decidable.decide, xs: [ (x : Fin 4  ), x  parts   (y : Fin 4  ), y  parts  ¬x + y  parts,
                 <not-available>] 
      [Kernel] [0.066214s] typechecking declaration

Wait where did the 18 seconds go?

Kevin Buzzard (Sep 18 2023 at 23:14):

Oh I know where they went; if you trace isDefEq then the output is 1.8 million lines long.

OK so with Eric R and Bhavik on the Discord we have a mathlib-free repro:

import Std.Data.Fin.Lemmas
import Std.Data.List.Basic

-- boilerplate
instance instAdd.{u,v₁} {I : Type u} {f : I  Type v₁} [ i, Add <| f i] : Add ( i : I, f i) :=
  fun f g i => f i + g i

instance (priority := high): DecidableEq (Fin 4  Nat) :=
fun f g =>
match (inferInstance : Decidable (f 0 = g 0)) with
| isFalse h => isFalse sorry
| isTrue h =>
  match (inferInstance : Decidable (f 1 = g 1)) with
  | isFalse h => isFalse sorry
  | isTrue h =>
    match (inferInstance : Decidable (f 2 = g 2)) with
    | isFalse h => isFalse sorry
    | isTrue h =>
      match (inferInstance : Decidable (f 3 = g 3)) with
        | isFalse h => isFalse sorry
        | isTrue h => isTrue sorry

namespace Matrix

universe u

variable {α : Type u}

def vecEmpty : Fin 0  α :=
  Fin.elim0

def vecCons {n : Nat} (h : α) (t : Fin n  α) : Fin n.succ  α :=
  fun j  Fin.cases h t j

syntax (name := vecNotation) "![" term,* "]" : term

macro_rules
  | `(![$term:term, $terms:term,*]) => `(vecCons $term ![$terms,*])
  | `(![$term:term]) => `(vecCons $term ![])
  | `(![]) => `(vecEmpty)

end Matrix

-- actual example starts here

def parts : List (Fin 4  Nat) := List.insert (![1, 1, 0, 0]) $ List.insert (![0, 0, 1, 1]) $
  List.insert (![1, 0, 0, 1]) $ List.insert (![1, 1, 1, 0]) $ List.insert (![1, 0, 0, 0]) []

def parts2 : List (Fin 4  Nat) :=
  [![(1 : Nat), 1, 0, 0], ![0, 0, 1, 1], ![1, 0, 0, 1], ![1, 1, 1, 0], ![1, 0, 0, 0]]

set_option trace.profiler true in -- quick
example : parts = parts2 := rfl

set_option trace.profiler true in -- quick
theorem parts2_property :
     (x) (_ : x  parts2) (y) (_ : y  parts2), x + y  parts2 := by decide

set_option trace.profiler true in -- slooooow
-- set_option trace.Meta.isDefEq true in
set_option maxHeartbeats 400000 in
theorem parts_property :
     (x) (_ : x  parts) (y) (_ : y  parts), x + y  parts := by decide

If you chase the 11 second trace as it stands then it just magically disappears like in the example in the previous post. However if you uncomment trace.Meta.isDefEq then you crash your VS Code. Running the code above on the command line and piping the output to a file gives you 113K lines of code which you can view here if you like. This is with Eric R's simplified DecidableEq instance on Fin 4 -> Nat. With mathlib's it's 1.8 million lines of trace and takes about twice as long.

Scott Morrison (Sep 19 2023 at 01:38):

Can we get it down to a Std free minimisation, and then open an issue?

Eric Rodriguez (Sep 19 2023 at 08:38):

Here's a very much not minimal std-free example:

code

This can be much improved - for example, replacing vec_notation with some custom enum and turning the ![a,b,c,d] syntax into a match of that enum -> Nat.

Eric Rodriguez (Sep 19 2023 at 08:39):

However, I don't have the time to tidy this for a couple days. Also note that docs#Fin.induction needs a performance review, supposedly, an this is copy pasted into this code so maybe (but really doubtfully!) this is the cause of the slowdown.

Kevin Buzzard (Sep 19 2023 at 08:54):

Changing Fin 4 -> Nat to a simpler type typically makes things quicker, but I've countered this by making the lists two elements longer. Below is a short lean-only repro and now I'm wondering whether the issue is the if in docs#List.insert ? I have not opened an issue yet: I find these things difficult because I have no idea what decide and native_decide do so I find it hard to explain them properly. These sorts of proofs are a long way from how I use Lean, I just like the puzzle of minimising examples of things and I like to support people like Bhavik who get frustrated with Lean 4 because of issues like this, that's the only reason I'm involved.

section decidability_instances

variable {α : Type} {p : α  Prop} [DecidablePred p]

instance decidableBex :  (l : List α), Decidable ( x, x  l  p x)
  | []    => isFalse sorry
  | x::xs =>
    match DecidablePred p x with
    | isTrue h₁ => isTrue sorry
    | isFalse h₁ => match decidableBex xs with
      | isTrue h₂  => isTrue <| by sorry
      | isFalse h₂ => isFalse <| by sorry

instance decidableBall (l : List α) : Decidable ( x, x  l  p x) :=
  match (inferInstance : Decidable <|  x, x  l  ¬ p x) with
  | isFalse h => isTrue $ fun x hx => match DecidablePred p x with
    | isTrue h' => h'
    | isFalse h' => False.elim $ h sorry
  | isTrue h => isFalse <| sorry

end decidability_instances

@[inline] protected def List.insert {α : Type} [DecidableEq α] (a : α) (l : List α) : List α :=
  if a  l then l else a :: l

def parts : List (List Nat) := List.insert ([1, 1, 0, 0]) <| List.insert ([0, 0, 1, 1]) <|
  List.insert ([1, 0, 0, 1]) <| List.insert ([1, 1, 1, 0]) <| List.insert ([1, 0, 0, 0]) <|
  List.insert [1, 2, 3, 4] <| List.insert [5, 6, 7, 8] []

def parts2 : List (List Nat) :=
  [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1], [1, 1, 1, 0], [1, 0, 0, 0], [1, 2, 3, 4], [5, 6, 7, 8]]

-- quick
example : parts = parts2 := rfl

-- quick
example :  (x) (_ : x  parts2) (y) (_ : y  parts2), x ++ y  parts2 := by decide

-- quick
example :  (x) (_ : x  parts) (y) (_ : y  parts), x ++ y  parts := by native_decide

-- slow
set_option maxHeartbeats 600000 in -- needed
example :  (x) (_ : x  parts) (y) (_ : y  parts), x ++ y  parts := by decide

Kevin Buzzard (Sep 19 2023 at 20:01):

Scott Morrison said:

Can we get it down to a Std free minimisation, and then open an issue?

lean4#2564 .

Sebastian Ullrich (Sep 19 2023 at 20:43):

Thanks for filing. If you want to try profiling again with lean4#2559, you should get a more meaningful trace

Sebastian Ullrich (Sep 19 2023 at 20:44):

Also, as your trace above already showed, the kernel computation is fast! Which in itself is interesting, what is the fundamental difference between elaborator and kernel here?


Last updated: Dec 20 2023 at 11:08 UTC