Zulip Chat Archive
Stream: lean4
Topic: Using dbgTraceIfShared correctly
Thomas Murrills (Mar 16 2024 at 02:01):
My current understanding is that you can wrap dbgTraceIfShared
around an expression, and a message will be logged if the expression is shared. (And that for arrays, sharing implies copying.) However:
def x (d : Array Nat) := (dbgTraceIfShared "💥" d)
#eval x #[2] -- shared RC 💥
I noticed that some messages on zulip using dbgTraceIfShared
employ the following incantation, which makes the sharing message disappear:
set_option compiler.extract_closed false in
#eval x #[2]
What does this mean? What are the implications re: whether you can expect the relevant code to induce sharing when run in other contexts?
Mario Carneiro (Mar 16 2024 at 02:07):
What's happening is that #[2]
is a closed expression, so lean hoists it out to a global variable and initializes it at startup. Then at #eval
time it passes this global variable to the x
function, meaning that d
is not the only reference - the global is also holding on to a copy, and if you modify it then it will copy the global to give you a local copy. This is mostly fine, but it can be inefficient if you made a large array with mkArray
and lean has decided to hoist it since now you have replaced a large allocation with a large copy
Mario Carneiro (Mar 16 2024 at 02:08):
and if the array is really large you also have the overhead of needing to have two of them in your program memory
Thomas Murrills (Mar 16 2024 at 02:10):
Oh, ok—what counts as a closed expression in this context?
Mario Carneiro (Mar 16 2024 at 02:11):
any subexpression with no free variables
Mario Carneiro (Mar 16 2024 at 02:12):
in fact in this example even x #[2]
is subject to hoisting
Thomas Murrills (Mar 16 2024 at 02:16):
Ah, ok. So can hoisting happen from within any expression? Even, say, def x := fun _ : Nat => mkArray 5 0
? (I wouldn't expect to make an array in this context before x
was called on something.)
Mario Carneiro (Mar 16 2024 at 02:17):
yes
Mario Carneiro (Mar 16 2024 at 02:19):
In fact, it will even hoist the #[][0]!
subexpression in
def foo (b : Bool) (_: b = true) := if b then 0 else #[][0]!
which leads to a panic at initialization time
Thomas Murrills (Mar 16 2024 at 02:21):
Whoa. So just to be clear: does this mean that in the def x := fun _ : Nat => mkArray 5 0
case, there's actually a size-5 array in a global variable after this? If so, is a set_option
really the only way to prevent that?
Mario Carneiro (Mar 16 2024 at 02:22):
yes
Thomas Murrills (Mar 16 2024 at 02:24):
Huh. That definitely changes how I imagine my code working in certain cases!
Mario Carneiro (Mar 16 2024 at 02:24):
yes, it's a pretty severe issue with the current compiler, but things aren't likely to improve until the new compiler is finished
Mario Carneiro (Mar 16 2024 at 02:26):
it does make me wonder what the perf effect of just setting set_option compiler.extract_closed false
globally (in lean + lake) is
Thomas Murrills (Mar 16 2024 at 04:01):
Hmm, here's another question about interpreting the result of dbgTraceIfShared
: consider
macro "s?" n:num t:term : term => `(term|dbgTraceIfShared $(quote s!"{n.getNat}") $t:term)
set_option trace.compiler.ir.result true in
def r (d : Array Nat) := if (s? 1 d).size == 3 then #[0] else (s? 2 d)
set_option compiler.extract_closed false in -- shared RC 1
#eval r #[2,3]
But if you remove the trace, the IR looks a little different (besides the obvious). E.g. in the false case we see inc x_1
.
Does checking an array's size count as a copy-inducing use here? Does RC > 1 always imply copying for an array?
This is a smaller version of the real-life example:
def add {α} [BEq α] (d : Array α) (e : α) :=
if (s? 1 d).contains e then (s? 2 d) else (s? 3 d).push e
which yields shared RC 1
. Intuitively, you'd think that merely inspecting elements wouldn't induce copying—but I guess a reference is a reference.
Mac Malone (Mar 16 2024 at 04:09):
In addition to the option there is also the @[never_extract]
attribute (def).
Mario Carneiro (Mar 16 2024 at 05:13):
Does checking an array's size count as a copy-inducing use here? Does RC > 1 always imply copying for an array?
What you are seeing is that RC > 1 in the first call, and this correctly reflects that the array is still live while running contains
. This is not a copy-inducing use though, because you didn't call a mutating operation like push
while it was copied. Note that in your test the s? 3 d
call did not trigger, meaning that when you called the only potentially copy-inducing function here, it was not shared and therefore no copy happened.
Thomas Murrills (Mar 16 2024 at 15:47):
Ah, ok, so only operations which mutate the array induce copies—all is right with the world.
Btw, is there a way to tell if operations (other than the obvious ones) mutate the array? What if, say, contains
were implemented in a way that mutated the array during evaluation despite its functionality?
Thomas Murrills (Mar 16 2024 at 15:52):
(With the goal of being able to say confidently that d
is never actually copied during x
, ofc.)
Thomas Murrills (Mar 16 2024 at 18:45):
(And just to put it explicitly in this thread for posterity, what I'm gathering is that the condition for copying an array a
is exactly "a
is shared (RC(a
) > 1) while a mutating operation is called on a
".)
Mario Carneiro (Mar 17 2024 at 02:23):
Thomas Murrills said:
Btw, is there a way to tell if operations (other than the obvious ones) mutate the array? What if, say,
contains
were implemented in a way that mutated the array during evaluation despite its functionality?
Not really, you just have to know and/or check the docs. That said, in most cases it's pretty obvious, functions that mutate usually return a mutated copy. But there is nothing from stopping you from defining some contains'
which calls (arr.push a).contains a'
or something and that would mutate the array even though it doesn't have anything in its type signature saying it does.
Jannis Limperg (Mar 18 2024 at 11:00):
Feature request: some way to indicate to the RTS that an object should never be shared, and then an option for the RTS to check this dynamically at runtime. E.g. if I have an array in some monadic state, I want to ensure that the array is always accessed linearly (not shared).
Last updated: May 02 2025 at 03:31 UTC