Zulip Chat Archive

Stream: new members

Topic: Realtime audio and mutability


Daniel Skinner (Jan 29 2026 at 02:46):

Hi all, new here; I have a question about practicality in application

From what I can tell, Lean 4 presents itself as able to function as a general-purpose language (to some degree). At the same time, it appears to be purely functional, with sugar that might suggest otherwise.

Let's say I was interested in writing a DSP from scratch and needed to reuse memory (so, mutability) to extend the amount of computation that can be performed in realtime, versus a system designed around purity that might become more limited depending on exactly how things are resolved from Lean to C to assembly.

My questions, I suppose, are multiple:

  • Do I misunderstand Lean? I'd also like to explore sound ideas in a space that is not necessarily listenable, but I'd still want things to work efficiently when rendering realtime audio
  • Do I misunderstand mutability in Lean? It appears to be sugar: the language is purely functional, anything otherwise under the covers is happenstance, and there's no practical way to reuse something like an Array through multiple function calls
  • Do I misunderstand design? That is, is there some design where, through a system of mutability, I could reason about something that functions in realtime, but through lack of experience I don't know how to build the comparable thing in a system that works otherwise (like a purely functional one)?

Anyway, I appreciate feedback and insight. I'd also like to say that Lean 4 has been one of the most interesting languages I've learned recently, with excellent tooling for developers, and I appreciate that.

Notification Bot (Jan 29 2026 at 07:38):

A message was moved here from #new members > Absolute minimum vs. interesting minimum mixins. by Julian Berman.

Robin Arnez (Jan 29 2026 at 17:51):

If your concern is that Lean wouldn't allow directly mutating something like an array in place, then no, Lean does allow modifying an array in place. To be precise, whenever something like an Array or a ByteArray has a reference count of 1, the mutation happens in place and nothing new is allocated. In particular, that means that as long as you don't use the original array anymore after the mutation, it happens in place and no copying is performed. This pattern of not using the original after mutation is often referred to as "linear use" and can be compared to ownership in Rust. For example, this code uses the array linearly:

def fillArray (n : Nat) : Array Nat := Id.run do
  let mut a := #[]
  let mut sizeSum := 0
  for i in *...n do
    sizeSum := sizeSum + a.size
    a := a.push i
  return a.push sizeSum

As you can see, computations like #eval (fillArray 100_000).size happen very quickly and even #eval (fillArray 10_000_000).size doesn't take too long (and in compiled code it would be even faster).
If we compare this to a nonlinear implementation:

def fillArrayNonlinear (n : Nat) : Array Nat := Id.run do
  let mut a := #[]
  let mut sizeSum := 0
  for i in *...n do
    let orig := a
    a := a.push i
    sizeSum := sizeSum + orig.size -- orig is used after `push`
  return a.push sizeSum

Even #eval (fillArrayNonlinear 100_000).size takes a long time here. That's because the original array is still in use, so the push can't mutate in place and copies every time, resulting in quadratic runtime.
Another way to introduce mutation is through IO:

def fillArrayIO (n : Nat) : BaseIO (Array Nat) := do
  let ref ← IO.mkRef #[]
  let mut sizeSum := 0
  for i in *...n do
    sizeSum ← ref.modifyGet fun x => (sizeSum + x.size, x.push i)
  return (← ref.get).push sizeSum

Here we use an actual mutable reference cell and thus need IO (or at least BaseIO, which is IO without exceptions) to preserve purity. The runtime of this one is comparable to the first one, maybe a tiny bit slower.
So in a sense, yes, Lean doesn't have mutation, since it's purely functional; but really, by being reference counted, Lean can turn things that look like mutation into actual mutation.
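
For example, dbgTraceIfShared can flag when a value is shared at runtime (a sketch; the trace is only meaningful in compiled code, and pushTraced / linearUse / nonlinearUse are just illustrative names):

-- `push` on an exclusively-owned array mutates in place;
-- `dbgTraceIfShared` prints a trace when its argument has RC > 1.
def pushTraced (a : Array Nat) : Array Nat :=
  (dbgTraceIfShared "array" a).push 0

def linearUse (a : Array Nat) : Nat :=
  (pushTraced a).size          -- `a` not reused afterwards: no copy expected

def nonlinearUse (a : Array Nat) : Nat :=
  (pushTraced a).size + a.size -- `a` reused afterwards: the push must copy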

Jakub Nowak (Jan 29 2026 at 23:01):

Although linearization doesn't work in a multi-threaded context. #lean4 > [cache] attribute @ 💬

Philip Zucker (Jan 29 2026 at 23:40):

I was trying something similar, drawing an image

abbrev Pixel := UInt8 × UInt8 × UInt8
abbrev Image : Type := Array (Array Pixel)

partial def doit (N : Nat) := Id.run do
  let mut img : Image := Array.replicate N (Array.replicate N (0, 0, 0))
  for i in [0:N] do
    for j in [0:N] do
      if ((i : Int) - N/2)^2 + ((j : Int) - N/2)^2 < (N/4)^2 then
        let newrow := (img[i]!).set! j (0, 0, 255)
        img := img.set! i newrow
  return img

It isn't clear to me that the reference count of the row would be 1? Is nesting like this a bad data format, and how can I tell whether I've achieved a reasonable implementation? Is there some set_option I can use to see such things? Any other suggestions for improvement are welcome too.

Robin Arnez (Jan 29 2026 at 23:50):

The right thing to do here is img.modify i fun row => row.set! j (0, 0, 255); modify clears the value out of the array before running the inner function
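
Applied to the snippet above, that looks roughly like this (same Image / Pixel definitions; doit2 is just an illustrative name):

-- `modify` takes the row out of `img` before the closure runs, so the
-- row's reference count is 1 and `set!` can mutate it in place.
partial def doit2 (N : Nat) : Image := Id.run do
  let mut img : Image := Array.replicate N (Array.replicate N ((0, 0, 0) : Pixel))
  for i in [0:N] do
    for j in [0:N] do
      if ((i : Int) - N/2)^2 + ((j : Int) - N/2)^2 < (N/4)^2 then
        img := img.modify i fun row => row.set! j (0, 0, 255)
  return img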

Jakub Nowak (Jan 29 2026 at 23:52):

Looking at the generated C code, I think Lean was able to optimize away copying row.

Daniel Skinner (Jan 30 2026 at 03:42):

I was familiar with mutating in place as long as the ref count is 1 (I think the documentation does a good job describing this behavior), but I just wasn't sure about making that work when composing a larger system that may consist of multiple buffers persisted and then reused through repeated function calls. From my naive experimentation, it seemed like multiple copies were created during a recalculation.

For example: a structure that holds a private array acting as a small buffer, with repeated function calls that recalculate the private buffer, where a single recalculation may then be followed by multiple reads. The issue arises because I make a local mutable reference to the private array (which presumably bumps the ref count), but then also have to solve updating the structure, which pushes logic up the chain to build a larger system, and so on.

But I was not at all familiar with IO.mkRef, and it seems like it might resolve my immediate problem; it's not something I've seen come up in the documentation yet. It seems like it can give me encapsulation without also having to rebuild the structure whenever encapsulated mutability occurs, so I'll definitely have a look at this.
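
i.e. I imagine something roughly like this (hypothetical names, just to sketch the shape):

-- Hypothetical sketch: a private buffer behind an IO.Ref, recalculated
-- in place and then read multiple times without rebuilding the structure.
structure Voice where
  buf : IO.Ref FloatArray

def Voice.new (size : Nat) : BaseIO Voice := do
  return { buf := ← IO.mkRef (FloatArray.mk (Array.replicate size 0)) }

def Voice.recalc (v : Voice) (val : Float) : BaseIO Unit :=
  -- `modify` takes the buffer out of the ref, so `set!` mutates in place
  v.buf.modify fun b => Id.run do
    let mut b := b
    for i in [0:b.size] do
      b := b.set! i val
    return b

def Voice.read (v : Voice) (i : Nat) : BaseIO Float :=
  return (← v.buf.get).get! i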

Thank you!

Daniel Skinner (Jan 30 2026 at 04:54):

Well, I was finishing chapter 18 and then went on a tangent into the metaprogramming guide to build a DSL for something else, whereas IO.mkRef shows up in chapter 21, so more reading to do still.

Henrik Böving (Jan 30 2026 at 08:02):

Jakub Nowak said:

Looking at the generated C code, I think Lean was able to optimize away copying row.

You cannot directly see array copies in the generated code; the original code without modify very much does always copy.

Jakub Nowak (Jan 30 2026 at 10:46):

Henrik Böving said:

Jakub Nowak said:

Looking at the generated C code, I think Lean was able to optimize away copying row.

You cannot directly see array copies in the generated code; the original code without modify very much does always copy.

Oh, wait, why is that? Is the code generated by lean --c=main.c Main.lean not the same as what's compiled by lake build? I can see the call to lean_copy_expand_array in the assembly of the generated binary, but couldn't find where it is called from in the C code.

Henrik Böving (Jan 30 2026 at 10:50):

In Array.set!, all relevant linearity checks for array happen inside of the runtime.

Jakub Nowak (Jan 30 2026 at 11:12):

Ah, yes, I found it: the copy is in lean_ensure_exclusive_array. Thanks!
And there's a lean_inc call for the row, so the copy will happen.

Daniel Skinner (Jan 30 2026 at 13:56):

When I benchmarked the three implementations compiled, fillArrayIO had performance similar to fillArrayNonlinear, not fillArray.

Daniel Skinner (Jan 30 2026 at 15:04):

Also, noting the comment on ordering in fillArrayNonlinear, I revisited the structure-with-private-array issue I was having and see that it's possible to do what I wanted without introducing BaseIO at that level, so the rest is just however I want to build out my graph. Thanks for the assistance!

Daniel Skinner (Jan 31 2026 at 01:32):

Replacing ref.modifyGet from the fillArrayIO example as follows puts it at a runtime comparable to the first example:

    ref.modify fun x => x.push i
    sizeSum := sizeSum + (<- ref.get).size

though if the lines are reversed it becomes comparable to the second; so, how ref.modifyGet is handled does impact the runtime of the example.

I was also able to define things like this, and it just seemed to work as I'd expect:

def Discrete : Type := IO.Ref (Array Nat)

def Discrete.fill (sig : Discrete) : IO Unit := ...

Robin Arnez (Jan 31 2026 at 01:38):

Hmm, I see you're probably on a version without lean4#11983. I'll also note, though, that the PR title is somewhat misleading, because the floatLetIn compiler pass still doesn't ensure linearity (floatLetIn is basically a pass that moves let declarations using the purity guarantee, but this reordering can break linearity).

Daniel Skinner (Jan 31 2026 at 01:41):

ah yeah that looks pretty recent

$ cat lean-toolchain
leanprover/lean4:v4.27.0

Robin Arnez (Jan 31 2026 at 01:45):

Yeah right, lean4#11983 is only on 4.28.0-rc1

Robin Arnez (Jan 31 2026 at 01:47):

Otherwise, one way you can prevent such problems is to offload all critical operations to IO functions

Robin Arnez (Jan 31 2026 at 01:48):

so e.g.

def refArraySize (ref : IO.Ref (Array Nat)) : BaseIO Nat :=
  Array.size <$> ref.get

def refArrayPush (ref : IO.Ref (Array Nat)) (val : Nat) : BaseIO Unit :=
  ref.modify fun arr => arr.push val

and then

sizeSum := sizeSum + (<- refArraySize ref)
refArrayPush ref i

Daniel Skinner (Jan 31 2026 at 02:02):

Hey, that's cute. I looked away from your example and tried it myself: I just lifted the offending line into IO.Ref (Array Nat) -> BaseIO Nat and it instantly sped back up.

At some level I'm going to be pushing operations to IO functions, but I imagine that will be higher up the design stack as this kind of idea is still new to me.

I also need to see what the norm is for benchmarking Lean programs so I can identify when I introduce problems.

Daniel Skinner (Feb 11 2026 at 15:40):

I have a follow-up question about identifying problematic copying. I have this minimal example with three different prepare functions.

  • prepareA is problematic
  • prepareB resolves the problem but has roughly the same performance
  • prepareC matches performance expectations, running at 7x the speed of the previous two, but I don't understand why

Using dbgTraceIfShared helped me identify one of the problems with prepareA, but I assume there are still more, and using dbgTraceIfShared on other parts doesn't appear to yield anything. Is there some other method for identifying the issue? Also, I don't know what the remaining issue is or why the global resolves it.

def Δt : Float := 440 / 48000
def Float.fract (x : Float) : Float := x - x.floor

-- original code uses a class and instance, but I found I needed inline, otherwise prepareC below has the same performance
-- removing inline does not appear to affect the prepareA or prepareB benchmarks below
@[inline] def FloatArray.sample (a : FloatArray) (i : Float) : Float := a.get! (i.abs.fract * a.size.toFloat).toUInt64.toNat

structure Osc where
  inp : FloatArray := FloatArray.mk (Array.replicate 4096 0)
  out : FloatArray := FloatArray.mk (Array.replicate 256 0)
  phase : Float := 0

-- prepareA osc.out is shared so RC > 1 and copies
def Osc.prepareA (osc : Osc) (_ : UInt64) : Osc := Id.run do
  let mut out := dbgTraceIfShared "osc.out" osc.out
  let mut phase := osc.phase
  for i in [0:out.size] do
    out := out.set! i (osc.inp.sample phase) -- problematic osc.inp
    phase := phase + Δt
  return { osc with out := out, phase := phase }

-- prepareB osc.out no longer prints shared but has same performance as prepareA ???
def Osc.prepareB (osc : Osc) (_ : UInt64) : Osc := Id.run do
  let inp := osc.inp
  let mut out := dbgTraceIfShared "osc.out" osc.out
  let mut phase := osc.phase
  for i in [0:out.size] do
    out := out.set! i (inp.sample phase)
    phase := phase + Δt
  return { osc with out := out, phase := phase }

-- prepareC uses global input versus stored input on Osc and has very different performance
-- which is near what I would expect for no copying
def g_inp : FloatArray := FloatArray.mk (Array.replicate 4096 0)
def Osc.prepareC (osc : Osc) (_ : UInt64) : Osc := Id.run do
  let mut out := dbgTraceIfShared "osc.out" osc.out
  let mut phase := osc.phase
  for i in [0:out.size] do
    out := out.set! i (g_inp.sample phase)
    phase := phase + Δt
  return { osc with out := out, phase := phase }

-- 1.794s
@[inline] def bench_prepareA (N : UInt64) : IO Unit := do
  let mut osc : Osc := { }
  for n in *...N do osc := osc.prepareA n

-- 1.665s
@[inline] def bench_prepareB (N : UInt64) : IO Unit := do
  let mut osc : Osc := { }
  for n in *...N do osc := osc.prepareB n

-- 0.254s
@[inline] def bench_prepareC (N : UInt64) : IO Unit := do
  let mut osc : Osc := { }
  for n in *...N do osc := osc.prepareC n

def main : IO Unit :=
  let N := 100_000

  -- bench_prepareA N
  -- bench_prepareB N
  bench_prepareC N

Daniel Skinner (Feb 11 2026 at 16:13):

Well, it appears to have something to do with converting Nat to Float in FloatArray.sample, but only if the array is in a structure? This is on Lean v4.28.0-rc1.

-- inline FloatArray.sample and extract osc.inp.size, but still convert to float in loop
-- 1.733s
def Osc.prepareD (osc : Osc) (_ : UInt64) : Osc := Id.run do
  let inp := osc.inp
  let inpsize := inp.size
  let mut out := dbgTraceIfShared "osc.out" osc.out
  let mut phase := osc.phase
  for i in [0:out.size] do
    out := out.set! i (inp.get! (phase.abs.fract * inpsize.toFloat).toUSize.toNat)
    phase := phase + Δt
  return { osc with out := out, phase := phase }

-- inline again but this time convert to float outside of loop
-- yet, prepareC just uses FloatArray.sample as-is without issue and same performance
-- 0.271s
def Osc.prepareE (osc : Osc) (_ : UInt64) : Osc := Id.run do
  let inp := osc.inp
  let inpsize := inp.size.toFloat
  let mut out := dbgTraceIfShared "osc.out" osc.out
  let mut phase := osc.phase
  for i in [0:out.size] do
    out := out.set! i (inp.get! (phase.abs.fract * inpsize).toUSize.toNat)
    phase := phase + Δt
  return { osc with out := out, phase := phase }

Daniel Skinner (Feb 11 2026 at 16:14):

heaptrack helped identify the issue in this case


Last updated: Feb 28 2026 at 14:05 UTC