Zulip Chat Archive

Stream: general

Topic: How to reason about optimizing IO code?

nrs (Nov 13 2024 at 16:39):

I've been following the cat example in the Lean 4 programming book, and have code that looks like the following:

def bufsize : USize := 20 * 1024
partial def accumulateFromStream (stream : IO.FS.Stream) (accumulator : List String) : IO (List String) := do
  match (<- stream.read bufsize) with
  | .mk #[] => .pure accumulator
  | buf => accumulateFromStream stream (accumulator ++ stdinInputAsListOfLines buf)

This is fine for small text files but I'm trying to parse a text log of an strace of over 3 million lines, and I think that this code is loading the entirety of the accumulator on every loop (but am not fully sure, it is extremely slow however). How could I reason about what parts of my code are imposing a heavier burden on the parsing?

Kyle Miller (Nov 13 2024 at 16:40):

Appending to the end of a List is slow (in the sense that it contributes O(n^2) in a loop like this). You could change it to List (List String) and then flatten it later with docs#List.flatten, or you could change it to an Array String

nrs (Nov 13 2024 at 16:42):

ah great I'll try these out, thank you for the recommendations!

Jannis Limperg (Nov 13 2024 at 16:42):

(or push at the front and reverse in the end, but Array is likely the better data structure here)

Kyle Miller (Nov 13 2024 at 16:43):

Jannis's suggestion could look like doing (stdinInputAsListOfLines buf).reverse ++ accumulator and then doing a List.reverse at the end

Kyle Miller (Nov 13 2024 at 16:43):

The key is the running time of a ++ b is O(a.length).

nrs (Nov 13 2024 at 16:47):

I'll try both of these techniques, thank you very much!

nrs (Nov 13 2024 at 17:09):

execution time went from over 15 minutes to less than 20 seconds, tyvm for tips!!

Andrii Kurdiumov (Nov 14 2024 at 04:51):

Does this still require load everything into memory? Or processing can be done peacemeal?

Last updated: May 02 2025 at 03:31 UTC