Zulip Chat Archive
Stream: lean4
Topic: Surprising sharing bug
Mario Carneiro (Oct 27 2023 at 10:10):
I just wanted to share this great example of a bug I found in lean4lean. It was very puzzling: it was going through all the same motions as the C++ code but it was in rare cases performing 20x worse. On these examples there wasn't much in the profiler report, but I did notice that a lot of time was being spent in lean_array_push
in the union-find data structure, which seemed a bit weird since there were also hashmap pushes and other things that should presumably also have shown up if it was just working hard. Turns out it was a sharing bug, or actually two.
Example 1:
def push (self : UnionFind) : UnionFind where
arr := self.arr.push ⟨self.arr.size, 0⟩
model := let ⟨_, hm⟩ := self.model'; ⟨_, _, hm.push _ rfl⟩
fixed:
def push (self : UnionFind) : UnionFind where
arr := let n := self.arr.size; self.arr.push ⟨n, 0⟩
model := let ⟨_, hm⟩ := self.model'; ⟨_, _, hm.push _ rfl⟩
This one is understandable, although it's annoying that lean makes you do it.
Example 2:
def quickIsDefEq (t s : Expr) (useHash := false) : RecM LBool := do
if ← modifyGet fun st =>
let (b, m) := st.eqvManager.isEquiv useHash t s
(b, { st with eqvManager := m })
then return .true
fixed:
def quickIsDefEq (t s : Expr) (useHash := false) : RecM LBool := do
if ← modifyGet fun (.mk a1 a2 a3 a4 a5 a6 (eqvManager := m)) =>
let (b, m) := m.isEquiv useHash t s
(b, .mk a1 a2 a3 a4 a5 a6 (eqvManager := m))
then return .true
This fix makes me unhappy, and is almost enough to make me declare "modifyGet
considered harmful". If I had just used plain structure update this would have preserved sharing, but lean apparently does not hoist the projections of everything other than the eqvManager
field past the call to isEquiv
here, meaning that a reference to st
is live while isEquiv
is running and hence st.eqvManager
is shared during the call. The fix is to fully destructure it and put it back together afterward.
Mario Carneiro (Oct 27 2023 at 10:28):
Oh no, it's even worse than I thought: the bug fix above doesn't work. Here's an MWE:
@[noinline] def bla : StateM (Array Nat) Bool := fun s => (true, s)
set_option compiler.extract_closed false in
set_option trace.compiler.ir.result true in
def test : StateT (Array Nat × Array Nat) Id Bool :=
fun st =>
let ⟨one, two⟩ := dbgTraceIfShared "st" st
let (b, two') := bla (dbgTraceIfShared "two" two)
pure (b, ⟨one, two'⟩)
set_option compiler.extract_closed false in
#eval test ⟨#[], #[]⟩
This prints shared RC two
even though it manifestly uses every value linearly
Mario Carneiro (Oct 27 2023 at 10:32):
Here's the start of the extracted function test
:
def test (x_1 : obj) : obj :=
let x_2 : obj := "st";
let x_3 : obj := dbgTraceIfShared ◾ x_2 x_1;
let x_4 : u8 := isShared x_3;
case x_4 : u8 of
Bool.false →
let x_5 : obj := proj[0] x_3;
let x_6 : obj := proj[1] x_3;
let x_7 : obj := "two";
let x_8 : obj := dbgTraceIfShared ◾ x_7 x_6;
let x_9 : obj := bla x_8;
let x_10 : u8 := isShared x_9;
case x_10 : u8 of
Bool.false →
let x_11 : obj := proj[0] x_9;
set x_9[0] := x_5;
set x_3[1] := x_9;
set x_3[0] := x_11;
ret x_3
Bool.true →
let x_12 : obj := proj[0] x_9;
let x_13 : obj := proj[1] x_9;
inc x_13;
inc x_12;
dec x_9;
let x_14 : obj := ctor_0[Prod.mk] x_5 x_13;
set x_3[1] := x_14;
set x_3[0] := x_12;
ret x_3
I think it might be that the FBIP optimization, which has triggered to reuse the memory of x_3
, is also keeping it alive and causing a shared result? This seems very bizarre
Mario Carneiro (Oct 27 2023 at 10:52):
Oops, false alarm, the test is bad because lean made two copies of the empty array via CSE. Using test ⟨#[0], #[1]⟩
results in no shared data. The bug fix does work, it just shows sharing at the very beginning because the empty state is shared.
Joachim Breitner (Oct 27 2023 at 11:10):
I'm actually already surprised by the first example. I could understand that it introduces non-unqueness in a lazy language, but in a strict language the argument needs to be evaluated before the function call, so why does binding n
make a difference here?
Mario Carneiro (Oct 27 2023 at 11:16):
because the arguments are evaluated left to right
Mario Carneiro (Oct 27 2023 at 11:17):
actually no nvm, I agree that one is dumb
Sebastian Ullrich (Oct 27 2023 at 11:18):
What's the IR for that one?
Mario Carneiro (Oct 27 2023 at 11:19):
I have an example 3 because it seems this array refuses to be unshared:
def next (m : EquivManager) : NodeRef := m.uf.size
def push (m : EquivManager) : EquivManager := { m with uf := m.uf.push }
def toNode (e : Expr) : StateM EquivManager NodeRef := fun m => do
if let some r := m.toNodeMap.find? e then
return (r, m)
let r := m.next
let m := m.push
(r, { m with toNodeMap := m.toNodeMap.insert e r })
It appears that m
is being shared across the call to push
, even if I inline the two functions:
def Lean.EquivManager.toNode._lambda_1 (x_1 : obj) (x_2 : obj) (x_3 : @& obj) : obj :=
let x_4 : obj := proj[0] x_1;
inc x_4;
let x_5 : obj := Array.size ◾ x_4;
let x_6 : obj := Lean4Lean.UnionFind.push x_4;
let x_7 : obj := proj[1] x_1;
inc x_7;
dec x_1;
inc x_5;
let x_8 : obj := Lean.HashMap.insert._at.Lean.EquivManager.toNode._spec_1 x_7 x_2 x_5;
let x_9 : obj := ctor_0[Lean.EquivManager.mk] x_6 x_8;
let x_10 : obj := ctor_0[Prod.mk] x_5 x_9;
ret x_10
(Notice that x_4
is inc
'd just before the push
call)
Mario Carneiro (Oct 27 2023 at 11:19):
Sebastian Ullrich said:
What's the IR for that one?
Example 1:
def Lean4Lean.UnionFind.push (x_1 : obj) : obj :=
let x_2 : obj := Array.size ◾ x_1;
let x_3 : obj := 0;
let x_4 : obj := ctor_0[Lean4Lean.UFNode.mk] x_2 x_3;
let x_5 : obj := Array.push ◾ x_1 x_4;
ret x_5
Mario Carneiro (Oct 27 2023 at 11:20):
oh wait that's the fixed version
Mario Carneiro (Oct 27 2023 at 11:20):
the unfixed version is exactly the same...
Mario Carneiro (Oct 27 2023 at 11:20):
I must have misdiagnosed because of example 3
Mario Carneiro (Oct 27 2023 at 11:21):
(I have a shared array and a bunch of suspicious patterns I'm trying to fix to eliminate it)
Mario Carneiro (Oct 27 2023 at 11:26):
Here's an MWE for example 3:
@[noinline] def bla : StateM (Array Nat) Bool := fun s => (true, s)
def next (m : Array Nat × Array Nat) : Nat := m.1.size
def push (m : Array Nat × Array Nat) : Array Nat × Array Nat :=
{ m with 1 := m.1.push 0 }
set_option compiler.extract_closed false in
set_option trace.compiler.ir.result true in
def test : StateT (Array Nat × Array Nat) Id Nat := fun m => do
let r := next m
let m := push m
(r, { m with 2 := m.2.push r })
def test (x_1 : obj) : obj :=
let x_2 : obj := next x_1;
inc x_1;
let x_3 : obj := push x_1;
let x_4 : u8 := isShared x_3;
case x_4 : u8 of
Bool.false →
let x_5 : obj := proj[1] x_3;
dec x_5;
let x_6 : obj := proj[1] x_1;
inc x_6;
dec x_1;
inc x_2;
let x_7 : obj := Array.push ◾ x_6 x_2;
set x_3[1] := x_7;
let x_8 : obj := ctor_0[Prod.mk] x_2 x_3;
ret x_8
Mario Carneiro (Oct 27 2023 at 11:27):
the inc
before push
is the problem, and it happens because of the proj[1] x_1
further down
Sebastian Ullrich (Oct 27 2023 at 11:28):
Wow, I've never seen with 1
before
Mario Carneiro (Oct 27 2023 at 11:29):
I think this one can be fixed by just inlining everything
Mario Carneiro (Oct 27 2023 at 11:30):
but inline annotations are not sufficient
Mario Carneiro (Oct 27 2023 at 11:32):
even
def test : StateT (Array Nat × Array Nat) Id Nat := fun m => do
let r := m.1.size
(r, m.1.push 0, m.2.push r)
doesn't work
Mario Carneiro (Oct 27 2023 at 11:32):
def test : StateT (Array Nat × Array Nat) Id Nat := fun (a, b) => do
let r := a.size
(r, a.push 0, b.push r)
does though
Mario Carneiro (Oct 27 2023 at 11:37):
woohoo, that finally killed the sharing bug and lean4lean is acceptably fast
Reid Barton (Oct 28 2023 at 02:17):
This one is at least understandable though, right?
IIRC, there is something vaguely related in GHC where if the GC encounters a thunk that is a projection (say m.2
) out of another heap object m
that has already been evaluated to a constructor, then it will just evaluate the thunk rather than copy it, which might eliminate the last reference to m
. Not sure if there is any way to diagnose this Lean situation at runtime though.
Schrodinger ZHU Yifan (Oct 28 2023 at 03:29):
wow, just bumped into this. I am actually debugging gccjit backend all day long and surprisingly seeing a similar test case here.
Schrodinger ZHU Yifan (Oct 28 2023 at 03:30):
https://github.com/SchrodingerZhu/LeanGccBackend/commit/555fdb80c8b850f8df6ccca752c59c0f6301e47d
Schrodinger ZHU Yifan (Oct 28 2023 at 03:32):
I guess I have messed up something in the backend, such that the array is not updated in place somehow. I am looking into the low level it line by line and I am still not sure why the RC keeps ramping up before push.
Schrodinger ZHU Yifan (Oct 28 2023 at 04:33):
Yeah, it was just me setting wrong header.
Mario Carneiro (Oct 30 2023 at 05:14):
Today's example of a 40x slowdown in lean4lean wrt the C++ is, again, traced to a sharing bug. Example 4:
def whnfCore' (e : Expr) (cheapRec := false) (cheapProj := false) : RecM Expr := do
let save r := do
if !cheapRec && !cheapProj then
modify fun s => { s with whnfCoreCache := (dbgTraceIfShared "!!" s.whnfCoreCache).insert e r }
return r
match e with
| .app .. =>
e.withAppRev fun f0 rargs => do
let f ← whnfCore f0 cheapRec cheapProj
if let .lam _ _ body _ := f then
let rec loop m (f : Expr) : RecM Expr :=
let cont2 := do
let r := f.instantiateRange (rargs.size - m) rargs.size rargs
let r := r.mkAppRevRange 0 (rargs.size - m) rargs
save <|← whnfCore r cheapRec cheapProj
if let .lam _ _ body _ := f then
if m < rargs.size then loop (m + 1) body
else cont2
else cont2
loop 1 body
else if f == f0 then
...
else
let r := f.mkAppRevRange 0 rargs.size rargs
save <|← whnfCore r cheapRec cheapProj
| _ => ...
Mario Carneiro (Oct 30 2023 at 05:15):
The shared value is the s.whnfCoreCache
value at the start. As far as I can tell, there is no way to predict this by looking at the code. It is fixed by writing let rec save
instead of let save
for the join point; this causes lean to hoist it out as a separate definition instead of inlining it, and somehow this gets the order of operations right
Mario Carneiro (Oct 30 2023 at 05:43):
MWE:
structure Foo where
arr : Array Nat
bla : Nat
deriving Repr
inductive Maybe where
| some (_ : Nat)
| none
with @[computed_field] bar : Maybe → Unit | _ => ()
deriving Repr
set_option trace.compiler.ir.result true in
def foo (e : Maybe) : StateT Foo Id Unit := do
let save := do -- change this to `let rec save` to eliminate the sharing below
modify fun s => { s with arr := (dbgTraceIfShared "!!" s.arr).push 0 }
if let .some _ := e then
save
set_option compiler.extract_closed false in
#eval Id.run <| foo (.some 1) ⟨#[1], 3⟩
Mario Carneiro (Oct 30 2023 at 05:43):
it appears to have something to do with computed fields
Last updated: Dec 20 2023 at 11:08 UTC