Zulip Chat Archive

Stream: lean4

Topic: Surprising sharing bug


Mario Carneiro (Oct 27 2023 at 10:10):

I just wanted to share this great example of a bug I found in lean4lean. It was very puzzling: it was going through all the same motions as the C++ code but it was in rare cases performing 20x worse. On these examples there wasn't much in the profiler report, but I did notice that a lot of time was being spent in lean_array_push in the union-find data structure, which seemed a bit weird since there were also hashmap pushes and other things that should presumably also have shown up if it was just working hard. Turns out it was a sharing bug, or actually two.

Example 1:

def push (self : UnionFind) : UnionFind where
  arr := self.arr.push self.arr.size, 0
  model := let _, hm := self.model'; _, _, hm.push _ rfl

fixed:

def push (self : UnionFind) : UnionFind where
  arr := let n := self.arr.size; self.arr.push n, 0
  model := let _, hm := self.model'; _, _, hm.push _ rfl

This one is understandable, although it's annoying that lean makes you do it.

Example 2:

def quickIsDefEq (t s : Expr) (useHash := false) : RecM LBool := do
  if  modifyGet fun st =>
    let (b, m) := st.eqvManager.isEquiv useHash t s
    (b, { st with eqvManager := m })
  then return .true

fixed:

 def quickIsDefEq (t s : Expr) (useHash := false) : RecM LBool := do
  if  modifyGet fun (.mk a1 a2 a3 a4 a5 a6 (eqvManager := m)) =>
    let (b, m) := m.isEquiv useHash t s
    (b, .mk a1 a2 a3 a4 a5 a6 (eqvManager := m))
  then return .true

This fix makes me unhappy, and is almost enough to make me declare "modifyGet considered harmful". If I had just used plain structure update this would have preserved sharing, but lean apparently does not hoist the projections of everything other than the eqvManager field past the call to isEquiv here, meaning that a reference to st is live while isEquiv is running and hence st.eqvManager is shared during the call. The fix is to fully destructure it and put it back together afterward.

Mario Carneiro (Oct 27 2023 at 10:28):

Oh no, it's even worse than I thought: the bug fix above doesn't work. Here's an MWE:

@[noinline] def bla : StateM (Array Nat) Bool := fun s => (true, s)

set_option compiler.extract_closed false in
set_option trace.compiler.ir.result true in
def test : StateT (Array Nat × Array Nat) Id Bool :=
  fun st =>
    let one, two := dbgTraceIfShared "st" st
    let (b, two') := bla (dbgTraceIfShared "two" two)
    pure (b, one, two'⟩)

set_option compiler.extract_closed false in
#eval test #[], #[]⟩

This prints shared RC two even though it manifestly uses every value linearly

Mario Carneiro (Oct 27 2023 at 10:32):

Here's the start of the extracted function test:

def test (x_1 : obj) : obj :=
  let x_2 : obj := "st";
  let x_3 : obj := dbgTraceIfShared  x_2 x_1;
  let x_4 : u8 := isShared x_3;
  case x_4 : u8 of
  Bool.false 
    let x_5 : obj := proj[0] x_3;
    let x_6 : obj := proj[1] x_3;
    let x_7 : obj := "two";
    let x_8 : obj := dbgTraceIfShared  x_7 x_6;
    let x_9 : obj := bla x_8;
    let x_10 : u8 := isShared x_9;
    case x_10 : u8 of
    Bool.false 
      let x_11 : obj := proj[0] x_9;
      set x_9[0] := x_5;
      set x_3[1] := x_9;
      set x_3[0] := x_11;
      ret x_3
    Bool.true 
      let x_12 : obj := proj[0] x_9;
      let x_13 : obj := proj[1] x_9;
      inc x_13;
      inc x_12;
      dec x_9;
      let x_14 : obj := ctor_0[Prod.mk] x_5 x_13;
      set x_3[1] := x_14;
      set x_3[0] := x_12;
      ret x_3

I think it might be that the FBIP optimization, which has triggered to reuse the memory of x_3, is also keeping it alive and causing a shared result? This seems very bizarre

Mario Carneiro (Oct 27 2023 at 10:52):

Oops, false alarm, the test is bad because lean made two copies of the empty array via CSE. Using test ⟨#[0], #[1]⟩ results in no shared data. The bug fix does work, it just shows sharing at the very beginning because the empty state is shared.

Joachim Breitner (Oct 27 2023 at 11:10):

I'm actually already surprised by the first example. I could understand that it introduces non-unqueness in a lazy language, but in a strict language the argument needs to be evaluated before the function call, so why does binding n make a difference here?

Mario Carneiro (Oct 27 2023 at 11:16):

because the arguments are evaluated left to right

Mario Carneiro (Oct 27 2023 at 11:17):

actually no nvm, I agree that one is dumb

Sebastian Ullrich (Oct 27 2023 at 11:18):

What's the IR for that one?

Mario Carneiro (Oct 27 2023 at 11:19):

I have an example 3 because it seems this array refuses to be unshared:

def next (m : EquivManager) : NodeRef := m.uf.size

def push (m : EquivManager) : EquivManager := { m with uf := m.uf.push }

def toNode (e : Expr) : StateM EquivManager NodeRef := fun m => do
  if let some r := m.toNodeMap.find? e then
    return (r, m)
  let r := m.next
  let m := m.push
  (r, { m with toNodeMap := m.toNodeMap.insert e r })

It appears that m is being shared across the call to push, even if I inline the two functions:

def Lean.EquivManager.toNode._lambda_1 (x_1 : obj) (x_2 : obj) (x_3 : @& obj) : obj :=
  let x_4 : obj := proj[0] x_1;
  inc x_4;
  let x_5 : obj := Array.size  x_4;
  let x_6 : obj := Lean4Lean.UnionFind.push x_4;
  let x_7 : obj := proj[1] x_1;
  inc x_7;
  dec x_1;
  inc x_5;
  let x_8 : obj := Lean.HashMap.insert._at.Lean.EquivManager.toNode._spec_1 x_7 x_2 x_5;
  let x_9 : obj := ctor_0[Lean.EquivManager.mk] x_6 x_8;
  let x_10 : obj := ctor_0[Prod.mk] x_5 x_9;
  ret x_10

(Notice that x_4 is inc'd just before the push call)

Mario Carneiro (Oct 27 2023 at 11:19):

Sebastian Ullrich said:

What's the IR for that one?

Example 1:

def Lean4Lean.UnionFind.push (x_1 : obj) : obj :=
  let x_2 : obj := Array.size  x_1;
  let x_3 : obj := 0;
  let x_4 : obj := ctor_0[Lean4Lean.UFNode.mk] x_2 x_3;
  let x_5 : obj := Array.push  x_1 x_4;
  ret x_5

Mario Carneiro (Oct 27 2023 at 11:20):

oh wait that's the fixed version

Mario Carneiro (Oct 27 2023 at 11:20):

the unfixed version is exactly the same...

Mario Carneiro (Oct 27 2023 at 11:20):

I must have misdiagnosed because of example 3

Mario Carneiro (Oct 27 2023 at 11:21):

(I have a shared array and a bunch of suspicious patterns I'm trying to fix to eliminate it)

Mario Carneiro (Oct 27 2023 at 11:26):

Here's an MWE for example 3:

@[noinline] def bla : StateM (Array Nat) Bool := fun s => (true, s)

def next (m : Array Nat × Array Nat) : Nat := m.1.size

def push (m : Array Nat × Array Nat) : Array Nat × Array Nat :=
  { m with 1 := m.1.push 0 }

set_option compiler.extract_closed false in
set_option trace.compiler.ir.result true in
def test : StateT (Array Nat × Array Nat) Id Nat := fun m => do
  let r := next m
  let m := push m
  (r, { m with 2 := m.2.push r })
def test (x_1 : obj) : obj :=
  let x_2 : obj := next x_1;
  inc x_1;
  let x_3 : obj := push x_1;
  let x_4 : u8 := isShared x_3;
  case x_4 : u8 of
  Bool.false 
    let x_5 : obj := proj[1] x_3;
    dec x_5;
    let x_6 : obj := proj[1] x_1;
    inc x_6;
    dec x_1;
    inc x_2;
    let x_7 : obj := Array.push  x_6 x_2;
    set x_3[1] := x_7;
    let x_8 : obj := ctor_0[Prod.mk] x_2 x_3;
    ret x_8

Mario Carneiro (Oct 27 2023 at 11:27):

the inc before push is the problem, and it happens because of the proj[1] x_1 further down

Sebastian Ullrich (Oct 27 2023 at 11:28):

Wow, I've never seen with 1 before

Mario Carneiro (Oct 27 2023 at 11:29):

I think this one can be fixed by just inlining everything

Mario Carneiro (Oct 27 2023 at 11:30):

but inline annotations are not sufficient

Mario Carneiro (Oct 27 2023 at 11:32):

even

def test : StateT (Array Nat × Array Nat) Id Nat := fun m => do
  let r := m.1.size
  (r, m.1.push 0, m.2.push r)

doesn't work

Mario Carneiro (Oct 27 2023 at 11:32):

def test : StateT (Array Nat × Array Nat) Id Nat := fun (a, b) => do
  let r := a.size
  (r, a.push 0, b.push r)

does though

Mario Carneiro (Oct 27 2023 at 11:37):

woohoo, that finally killed the sharing bug and lean4lean is acceptably fast

Reid Barton (Oct 28 2023 at 02:17):

This one is at least understandable though, right?
IIRC, there is something vaguely related in GHC where if the GC encounters a thunk that is a projection (say m.2) out of another heap object m that has already been evaluated to a constructor, then it will just evaluate the thunk rather than copy it, which might eliminate the last reference to m. Not sure if there is any way to diagnose this Lean situation at runtime though.

Schrodinger ZHU Yifan (Oct 28 2023 at 03:29):

wow, just bumped into this. I am actually debugging gccjit backend all day long and surprisingly seeing a similar test case here.

Schrodinger ZHU Yifan (Oct 28 2023 at 03:30):

https://github.com/SchrodingerZhu/LeanGccBackend/commit/555fdb80c8b850f8df6ccca752c59c0f6301e47d

Schrodinger ZHU Yifan (Oct 28 2023 at 03:32):

I guess I have messed up something in the backend, such that the array is not updated in place somehow. I am looking into the low level it line by line and I am still not sure why the RC keeps ramping up before push.

Schrodinger ZHU Yifan (Oct 28 2023 at 04:33):

Yeah, it was just me setting wrong header.

Mario Carneiro (Oct 30 2023 at 05:14):

Today's example of a 40x slowdown in lean4lean wrt the C++ is, again, traced to a sharing bug. Example 4:

def whnfCore' (e : Expr) (cheapRec := false) (cheapProj := false) : RecM Expr := do
  let save r := do
    if !cheapRec && !cheapProj then
      modify fun s => { s with whnfCoreCache := (dbgTraceIfShared "!!" s.whnfCoreCache).insert e r }
    return r
  match e with
  | .app .. =>
    e.withAppRev fun f0 rargs => do
    let f  whnfCore f0 cheapRec cheapProj
    if let .lam _ _ body _ := f then
      let rec loop m (f : Expr) : RecM Expr :=
        let cont2 := do
          let r := f.instantiateRange (rargs.size - m) rargs.size rargs
          let r := r.mkAppRevRange 0 (rargs.size - m) rargs
          save <|← whnfCore r cheapRec cheapProj
        if let .lam _ _ body _ := f then
          if m < rargs.size then loop (m + 1) body
          else cont2
        else cont2
      loop 1 body
    else if f == f0 then
      ...
    else
      let r := f.mkAppRevRange 0 rargs.size rargs
      save <|← whnfCore r cheapRec cheapProj
  | _ => ...

Mario Carneiro (Oct 30 2023 at 05:15):

The shared value is the s.whnfCoreCache value at the start. As far as I can tell, there is no way to predict this by looking at the code. It is fixed by writing let rec save instead of let save for the join point; this causes lean to hoist it out as a separate definition instead of inlining it, and somehow this gets the order of operations right

Mario Carneiro (Oct 30 2023 at 05:43):

MWE:

structure Foo where
  arr : Array Nat
  bla : Nat
  deriving Repr

inductive Maybe where
  | some (_ : Nat)
  | none
  with @[computed_field] bar : Maybe  Unit | _ => ()
  deriving Repr

set_option trace.compiler.ir.result true in
def foo (e : Maybe) : StateT Foo Id Unit := do
  let save := do -- change this to `let rec save` to eliminate the sharing below
    modify fun s => { s with arr := (dbgTraceIfShared "!!" s.arr).push 0 }
  if let .some _ := e then
    save

set_option compiler.extract_closed false in
#eval Id.run <| foo (.some 1) #[1], 3

Mario Carneiro (Oct 30 2023 at 05:43):

it appears to have something to do with computed fields


Last updated: Dec 20 2023 at 11:08 UTC