Zulip Chat Archive
Stream: lean4
Topic: inlining and specialization questions; stream fusion example
Scott Kovach (Mar 08 2024 at 01:30):
hello! I've recently spent some time working on a prototype library that implements a sort of stream fusion for key-value data structures. The efficiency of the code depends quite a lot on inlining and specialization behavior, so I've been digging into generated code and made a few confusing observations. I intend to figure out as much as possible by exploring the compiler, but of course I would greatly appreciate any help navigating.
[See code below; I'm happy to provide more background on what this program is meant to do, but this message is already rather long, and I suppose explaining the compiler behavior might not require digging into the program too much.]
This example compares two stream fold functions which are the same, except for where they construct a subtype value (see lines marked "HERE").
An example using the first (t1
) is full inlined (all stream functions inlined to fold) whereas the second isn't.
The mental model I've heard before is that subtypes don't exist at runtime, so this difference is surprising.
Looking at the trace from trace.compiler.inline
shows an extra instance of inlining Function.comp
in the latter which does not exist in the former.
Indeed, replacing the call to ∘
with a lambda eliminates the issue (both results are the same).
My guess is that the Subtype might be a red herring, and actually lambda expressions just sometimes defeat inlining? I have another example where this seems to be the case. It is surprising nevertheless that Function.comp
leads to such an issue.
In this and more complex examples I have trouble tracing through what the compiler sees; any suggestions for other trace options to turn on for this investigation?
other questions:
1. Is it expected that replacing inline
with always_inline
reduces the amount of inlining that occurs? (try doing this anywhere in the code below). I took a quick look at inlineCandidate?
and can't see why this is the case; maybe it has to do with how attributes are assigned to internally generated code?
2. Even after specialization, the fold functions have more than two parameters (the rest of which are not referenced). For instance, in the examples here a stream is constructed and passed to fold.go
even though go
is parametrized to not even textually reference the stream! This leads to unnecessary allocations at run time. Is there a way to understand this and ensure that it doesn't happen?
namespace mwe
notation "ℕ" => Nat
-- This represents a streaming representation of an associative container
structure Stream' (ι : Type) (α : Type u) where
σ : Type
valid : σ → Bool
index : {x // valid x} → ι
seek : {x // valid x} → ι × Bool → σ
ready : {x // valid x} → Bool
value : {x // ready x} → α
-- stream plus a state
structure Stream (ι : Type) (α : Type u) extends Stream' ι α :=
q : σ
infixr:25 " →ₛ " => Stream
@[inline] partial def Stream.fold (f : β → ι → α → β) (s : ι →ₛ α) : β → β :=
let rec @[specialize] go {σ} f
(valid : σ → Bool)
(index : {x // valid x} → ι)
(seek : {x // valid x} → ι → Bool → σ)
(ready : {x // valid x} → Bool)
(value : (q : {x // valid x}) → ready q → α) -- HERE
(q : σ) (acc : β) :=
if hv : valid q then
let q := ⟨q, hv⟩
let i := index q
let hr := ready q
let acc' := if hr : hr
then f acc i (value q hr) -- HERE
else acc
let q' := seek q i hr
go f valid index seek ready value q' acc'
else acc
go f s.valid s.index (fun q i r => s.seek q (i,r)) s.ready (fun q r => s.value ⟨q, r⟩)
s.q
@[inline] partial def Stream.fold' (f : β → ι → α → β) (s : ι →ₛ α) : β → β :=
let rec @[specialize] go {σ} f
(valid : σ → Bool)
(index : {x // valid x} → ι)
(seek : {x // valid x} → ι → Bool → σ)
(ready : {x // valid x} → Bool)
(value : {x // ready x} → α) -- HERE
(q : σ) (acc : β) : β :=
if hv : valid q then
let q := ⟨q, hv⟩
let i := index q
let hr := ready q
let acc' := if hr : hr
then f acc i (value ⟨q, hr⟩) -- HERE
else acc
let q' := seek q i hr
go f valid index seek ready value q' acc'
else acc
go f s.valid s.index (fun q i r => s.seek q (i,r)) s.ready s.value s.q
@[inline] def Stream.map (f : α → β) (s : ι →ₛ α) : ι →ₛ β := { s with value := f ∘ s.value }
-- @[inline] def Stream.map (f : α → β) (s : ι →ₛ α) : ι →ₛ β := { s with value := fun x => f (s.value x) } -- fixes demonstrated issue
-- example stream; traverse values from `lo` to `hi`
@[inline] def range (lo hi : ℕ) : ℕ →ₛ Bool where
σ := ℕ
q := lo
valid q := q < hi
ready _ := true
index q := q
value _ := true
seek q := fun ⟨j, r⟩ =>
if r then if q ≤ j then q+1 else q
else if q < j then q+1 else q
@[inline] def t1 : ℕ := (range 0 10 |>.map fun _ => 1).fold (fun b _ x => b + x) 0
@[inline] def t2 : ℕ := (range 0 10 |>.map fun _ => 1).fold' (fun b _ x => b + x) 0
section
set_option trace.compiler.stage2 true
set_option trace.compiler.inline true
-- This line demonstrates that all stream component functions have been inlined into fold:
example := t1
-- However, calls to _x_2 and _x_7 are not inlined here; somehow Function.comp is the culprit
example := t2
end
-- lower-level view of resulting code
set_option trace.compiler.ir.result true
example := t1
example := t2
end mwe
Patrick Massot (Mar 08 2024 at 01:39):
I don’t know anything about the compiler but I can already tell you that somewhat strange things are expected. There is a new compiler planned (and already started) but it is currently not a priority. So bugs or counter-intuitive behavior in the old compiler won’t be worked on, but the proper fix won’t come soon, sorry.
Last updated: May 02 2025 at 03:31 UTC