Zulip Chat Archive
Stream: general
Topic: How to reason about optimizing IO code?
nrs (Nov 13 2024 at 16:39):
I've been following the cat
example in the Lean 4 programming book, and have code that looks like the following:
def bufsize : USize := 20 * 1024
partial def accumulateFromStream (stream : IO.FS.Stream) (accumulator : List String) : IO (List String) := do
match (<- stream.read bufsize) with
| .mk #[] => .pure accumulator
| buf => accumulateFromStream stream (accumulator ++ stdinInputAsListOfLines buf)
This is fine for small text files but I'm trying to parse a text log of an strace of over 3 million lines, and I think that this code is loading the entirety of the accumulator on every loop (but am not fully sure, it is extremely slow however). How could I reason about what parts of my code are imposing a heavier burden on the parsing?
Kyle Miller (Nov 13 2024 at 16:40):
Appending to the end of a List is slow (in the sense that it contributes O(n^2) in a loop like this). You could change it to List (List String)
and then flatten it later with docs#List.flatten, or you could change it to an Array String
nrs (Nov 13 2024 at 16:42):
ah great I'll try these out, thank you for the recommendations!
Jannis Limperg (Nov 13 2024 at 16:42):
(or push at the front and reverse
in the end, but Array
is likely the better data structure here)
Kyle Miller (Nov 13 2024 at 16:43):
Jannis's suggestion could look like doing (stdinInputAsListOfLines buf).reverse ++ accumulator
and then doing a List.reverse at the end
Kyle Miller (Nov 13 2024 at 16:43):
The key is the running time of a ++ b
is O(a.length)
.
nrs (Nov 13 2024 at 16:47):
I'll try both of these techniques, thank you very much!
nrs (Nov 13 2024 at 17:09):
execution time went from over 15 minutes to less than 20 seconds, tyvm for tips!!
Andrii Kurdiumov (Nov 14 2024 at 04:51):
Does this still require load everything into memory? Or processing can be done peacemeal?
Last updated: May 02 2025 at 03:31 UTC