Zulip Chat Archive

Stream: lean4

Topic: Iterator Library: Request for Feedback


Paul Reichert (Apr 17 2025 at 14:08):

Fellow Lean users,

I am currently developing an iterator framework for the Lean standard library. After three weeks of work, I have a first draft and would be glad to find out how you like it.

The latest working version can be found in this repository. It's not part of the lean4 repository yet and you should be able to just clone and open the project like any other.

What are iterators and why do we need them?

Let's answer by example first. Iterators allow you to obtain efficient code from expressions like this:

def hideEggs : IO Unit := do
  -- Repeat colors and locations indefinitely
  let colors := Iter.unfold Id (0 : Nat) (· + 1) |>.map fun n =>
    #["green", "red", "yellow"][n % 3]!
  let locations := Iter.unfold Id (0 : Nat) (· + 1) |>.map fun n =>
    #["in a boot", "underneath a lampshade", "under the cushion", "in the lawn"][n % 4]!
  -- Summon the chickens
  let chickens := ["Clucky", "Patches", "Fluffy"].iter Id
  -- Gather, color and hide the eggs
  let eggs := chickens.flatMap (fun x : String => Iter.unfold Id x id |>.take 3)
    |>.zip colors
    |>.zip locations
  -- Report the results (top secret)
  let eggsIO := eggs.switchMonad (pure :  {γ}, γ  IO γ) -- Obtain an IO-monadic iterator
  for x in eggsIO do
    IO.println s! "{x.1.1} hides a {x.1.2} egg {x.2}."
  -- Alternative : eggsIO.mapM (fun x => IO.println s!"{x.1.1} hides a {x.1.2} egg {x.2}.") |>.drain

/--
info: Clucky hides a green egg in a boot.
Clucky hides a red egg underneath a lampshade.
Clucky hides a yellow egg under the cushion.
Patches hides a green egg in the lawn.
Patches hides a red egg in a boot.
Patches hides a yellow egg underneath a lampshade.
Fluffy hides a green egg under the cushion.
Fluffy hides a red egg in the lawn.
Fluffy hides a yellow egg in a boot.
-/
#guard_msgs in
#eval hideEggs

It's also possible to manually iterate:

def deepSum (l : List (List Nat)) : Nat :=
  go (l.iter Id |>.flatMap fun l' => l'.iter Id) 0
where
  @[specialize]
  go it acc :=
    match it.step with
    | .yield it' n _ => go it' (acc + n)
    | .skip it' _ => go it' acc
    | .done _ => acc
  termination_by it.terminationByFinite

/-- info: 10 -/
#guard_msgs in
#eval deepSum [[1, 2], [3, 4]]

Iterators are objects that allow you to step through a sequence of values, either manually or by calling some predefined consumer. For example, List.iter converts a list into an iterator over its elements. Iterators essentially consist of a single next or step function that potentially provides an output value and a successor iterator. They exist in most programming languages like Java (Streams API), Rust (Iter trait), Python (itertools) or Haskell (streamly package).

Iterators sometimes help to concisely write down complicated loops. This is especially interesting in Lean where it can be tedious to prove the termination of such a loop. The iterator library automatically keeps track of the termination behavior so that you usually do not need to provide an explicit termination proof.

Thanks for your time and feedback! Let's make it a really awesome iterator library while we can still work on the fundamentals without a lot of breakage.

Objectives of the library

It should be fast (comparable to Rust) and easy to use. It should work in a monadic setting and -- if users are ready to accept the additional complexity -- allow changing the monad and even the universe level. Verification should be possible in a reasonable amount of time.

But most importantly, it should be useful in real-world situations.

What I'd like to know from you

I'm interested in basically all your thoughts about the current state iterator API!

  • Do you wish for some functionality or feature that's still missing?
  • Is something more tedious than you'd like?
  • Is some part of the API hard to understand? Iterators should be as easy as in other languages (except if you start doing crazy stuff with universes or multiple different monads...).
  • Is there something you especially like?
  • Do you have a compelling use case that would be valuable for me to keep in mind?
  • ...

What's known to be deficient

  • Lemmas are planned but currently nonexistent except for termination proofs.
  • The library has been carefully designed for the possibility of switching monads and universes but we're still missing an iterator combinator that actually allows switching the universe.
  • There are still occasionally problems and bad elaborator error messages. Keeping track of the complicated underlying types of the iterators is difficult for Lean right now. Nevertheless, please report such situations to me so I can work on improving the situation.
  • It's still a work in progress, so the API can still change heavily in the next weeks.
  • There's no string iterator yet. I'm planning to implement one and replace the legacy string iterator in the standard library.
  • The iterators aren't producing highly efficient code yet. While there are things I can do in the library, it also partly depends on the compiler, which is undergoing some heavy (positive) changes right now. This is still a work in progress.
  • I have heard the wish for groupby, which is a combinator in Python's itertools. Unfortunately, I don't really know yet how to make that one in a functional-but-in-place language. I guess it will somehow involve Thunk to avoid duplication. If you, too, would like to see this combinator, feel free to just react with :otter: .

Patrick Massot (Apr 17 2025 at 16:10):

Thanks! My feedback is that this code is very hard to read for me. I have a very hard time guessing what it’s meant to express. I could not have predicted the outcome of #eval hideEggs from the function names. Maybe there could be macros turning this into code that is easy to understand?

Paul Reichert (Apr 17 2025 at 21:04):

Thanks, Patrick! Do you find both examples hard to read? I'd also be curious about what you think about readability compared to other iterator libraries if you have used one of those before (Rust, Java, Python, ...) or what they might be doing better.

I'm not sure about the necessity of macros yet. What would you hope macros can do that we can't do without them? I think lot can be achieved with convenience library functions that just don't exist yet. Without macros, just with better library support, the complex example might look as follows:

def hideEggs : IO Unit := do
  -- Repeat colors and locations indefinitely
  let colors := ["green", "red", "yellow"].iter.repeatForever
  let locations := ["in a boot", "underneath a lampshade", "under the cushion", "in the lawn"].iter.repeatForever
  -- Summon the chickens
  let chickens := ["Clucky", "Patches", "Fluffy"].iter
  -- Gather, color and hide the eggs
  let eggs := Iter.combine
    (chickens.flatMap (fun chicken  => [chicken].iter.repeat 3), colors, locations)
  -- Report the results (top secret)
  for (chicken, color, location) in eggs do
    IO.println s!"{chicken} hides a {color} egg {location}."

(Just a rough sketch, those function names should be consistent with existing ones for List etc.)

I should say that I basically crammed all of the interesting combinators we currently have into one example, which might not be the best way to produce readable code in the first place. I think I'll add some more real-world examples on Tuesday when I'm back at work.

Tomas Skrivan (Apr 17 2025 at 23:53):

Nice! I find the code quite readable. A bit confused by Iter.unfold Id 0 (. + 1) but from the comment I got the message that you want to start at zero and progress by adding one. The really confusing part is chickens.flatMap (fun x : String => Iter.unfold Id x id |>.take 3) in particular the function name take, I would expect repeat or replicate. However, I understand that this is a low level API, the envisioned higher level one is perfectly clear to me.

(Ohh I get it now, you create an infinite stream of identical strings and you take the first three)

I'm curious, what is the future of the class Stream. Is it going to be deprecated and subsumed by this? Or is there a place for both?

These are internal iterators, right? How do they compare performance wise to external iterators like ForIn for Std.Range? I understand that it is not optimized yet, so at least what is the expected performance difference? The last time I tried I had a really hard time getting the internal iterators fast.

Paul Reichert (Apr 18 2025 at 06:39):

Tomas Skrivan said:

These are internal iterators, right? How do they compare performance wise to external iterators like ForIn for Std.Range? I understand that it is not optimized yet, so at least what is the expected performance difference? The last time I tried I had a really hard time getting the internal iterators fast.

Which internal iterators have you tried in the past? As far as I understand the terminology, internal iterators manage the control flow themselves while external iterators allow the user to handle the control flow, so where would the performance penalties of internal iterators occur?

Fundamentally these iterators are external iterators. The heart of each iterator is its step function so that users can control the control flow themselves if they want. On top of this, there is an API of consumers such as toList, toArray, count etc. that allow you to use the iterators as internal iterators, deferring the control flow to some library function.

There should be no difference in doing the following things:

  • computing the sum of elements of l in a structurally recursive loop
  • computing the sum using ForIn to iterate over l
  • using ForIn to iterate over the iterator l.iter
  • using some special "internal iterator" API function, perhaps fold or even just sum.

The performance of the iterator library depends primarily on aggressive inlining in order to enable simple optimizations, which is intended to finally produce code that is just a plain loop or tail-recursive function, so there should be no overhead to using iterators here ("stream fusion").

The principle is the same with more complex combinators such as take or flatMap. The idea (aggressive inlining produces efficient code) is the same but the compiler needs to do more to obtain an efficient result, which it currently doesn't. (In the flatMap case, we would naturally write two nested loops. This is obviously impossible when the user manually iterates over the iterator in a single loop, so there might be some performance penalty to that, too.)

However, I still have the slight feeling that we're using the term "internal iterator" in different ways... Please clarify if I missed your point. If you have a concrete example in mind, we can also take a look at that one if you want.

Paul Reichert (Apr 18 2025 at 06:46):

Tomas Skrivan said:

I'm curious, what is the future of the class Stream. Is it going to be deprecated and subsumed by this? Or is there a place for both?

I think that the new iterators will be able to do everything the current Stream class can, so I do think they will subsume Stream.

Paul Reichert (Apr 18 2025 at 06:50):

Tomas Skrivan said:

A bit confused by Iter.unfold Id 0 (. + 1) but from the comment I got the message that you want to start at zero and progress by adding one. [...] (Ohh I get it now, you create an infinite stream of identical strings and you take the first three)

Yes, I agree this is very confusing, I should make the naming more clear and create convenience functions. :smile:

Jovan Gerbscheid (Apr 18 2025 at 08:01):

Maybe Iter.unfold Id 0 (. + 1) could be abbreviated to [0..] like in Haskell :smiley:

Edit: technically, that's notation for an infinite List Nat in Haskell, but this can only be modelled using an iterator in Lean.

Michael Rothgang (Apr 18 2025 at 08:28):

My two cents: I found Iter.unfold Id 0 (. + 1) very hard to parse. The version with repeatForever was much nicer and almost immediately clear.

Patrick Massot (Apr 18 2025 at 09:30):

Paul Reichert said:

Thanks, Patrick! Do you find both examples hard to read? I'd also be curious about what you think about readability compared to other iterator libraries if you have used one of those before (Rust, Java, Python, ...) or what they might be doing better.

The new version (with currently non-existing convenience functions) is infinitely easier to read for me. I mentioned macros because I expected that the absence of convenience function meant they were hard to write for some reason, and maybe macros could hep.

Eric Wieser (Apr 18 2025 at 11:19):

Why do many of the initial functions take an explicit monad argument?

Eric Wieser (Apr 18 2025 at 11:22):

Is the idea to avoid the map/mapM duplication present elsewhere in Lean? I think for something very programmatic like iterators it would be fine to just always assume a monad context

Eric Wieser (Apr 18 2025 at 11:22):

Id.run can always be used on the outside to use the API in a pure context

Paul Reichert (Apr 18 2025 at 12:11):

The explicit monad argument is a rough edge that I want to smoothen. Actually, it's an autoParam, but for some reason the elaborator takes issue with me not explicitly supplying it (see this bug).

I will try out what the elaborator thinks of having it as an implicit argument. Having iter and iterM is definitely something I consider, too.

Eric Wieser (Apr 18 2025 at 12:13):

I think it should just be implicit, and you can probably have only the iterM version (and of course drop the M from the name)

Paul Reichert (Apr 18 2025 at 12:14):

Jovan Gerbscheid said:

Maybe Iter.unfold Id 0 (. + 1) could be abbreviated to [0..] like in Haskell :smiley:

Edit: technically, that's notation for an infinite List Nat in Haskell, but this can only be modelled using an iterator in Lean.

We also have Std.Range with some notation in Lean -- making that more polymorphic is another topic I'll take on in a few weeks. It certainly makes sense to integrate it well with the iterator library.

Jovan Gerbscheid (Apr 18 2025 at 12:25):

Another issue is the direction in which to loop through an Array. I find it a bit annoying that there is only a ForIn instance for looping from the beginning of an array. The same would apply for the iterator of an Array. Maybe there should be Array.iterr and Array.iterl?

edit: or maybe the names should be Array.iter and Array.iterRev. Then the same can be used for TreeMap.

Paul Reichert (Apr 18 2025 at 17:46):

Yes, good idea! Speaking of going backwards, I'm still trying to figure out what it would mean for a monadic iterator to go back to a previous step. We certainly can't unwind the effects that have already been caused, but do we expect them to be executed a second time when we step forward again? Or is this up to the iterator?

The background is that the already-existing String iterator needs a way of going a step back and I'd like to migrate that one to the new framework, too. The cheap version would be to introduce the concept of going back only for Id iterators until we have a good idea for monadic iterators.

Jovan Gerbscheid (Apr 18 2025 at 19:32):

Yeah, unless there is a use case for iterators that can go backwards in a monadically interesting way, we can just stick to iterators with pure monadic actions. Are there more examples than String and Array that want to be able to go backwards? I guess any iterator can be forced to remember its results by using a List, and then you could go backwards in that.

On the topic of monadic effects, my first reaction is that it would be nice for the default to be monad-free (i.e. using Id). Thus also requiring an iterM version.

Eric Wieser (Apr 18 2025 at 19:42):

for notation only works in a monad, right?

Eric Wieser (Apr 18 2025 at 19:43):

Paul Reichert said:

Yes, good idea! Speaking of going backwards, I'm still trying to figure out what it would mean for a monadic iterator to go back to a previous step.

I think Python handled this by allowing each iterator to specify what it means to be reversed; that could be a typeclass in Lean

Paul Reichert (Apr 19 2025 at 20:44):

Eric Wieser said:

for notation only works in a monad, right?

Yes, I think so -- which message were you reacting to? I'm sorry, I think I'm missing the connection right now. :)

Jovan Gerbscheid (Apr 19 2025 at 21:41):

I think Erik might be arguing about the question of whether the iterators should be monadic by default. And since for loops are always monadic, so should the iterators. But I don't quite agree with this, because I think there are many situations where Iterators can be used non-monadically even when working in a monad, because the iteration itself is a non-monadic action. In fact I'm not even convinced we need the monads at all in iterators.

James Sully (May 01 2025 at 06:43):

Exciting stuff. I'm trying to try this out, but after cloning and opening the examples file to start playing around, I'm getting dsimp made no progress on lines 125 and 128 of Iterator.Basic. Revision is current main (db951c65a5d6fd5d2322d269e8c27adbbd853a75)

error: ././././Iterator/Basic.lean:125:6: dsimp made no progress
error: ././././Iterator/Basic.lean:128:6: dsimp made no progress

Paul Reichert (May 01 2025 at 07:15):

I should have linked to a stable version above. Currently, a fundamental refactoring is underway and main is still broken. I'll update the link in a second. (The refactoring doesn't affect the user-facing API a lot, so the older version should do for now)

James Sully (May 01 2025 at 07:18):

last working commit for me is 67d43c28

Paul Reichert (May 01 2025 at 07:20):

That one probably works, too. I guess it's time for me to create a branch with a more stable version

Update: The main branch is the running snapshot now.

James Sully (May 01 2025 at 07:57):

My experience with the api is that I'm running into

typeclass instance problem is stuck, it is often due to metavariables
  ComputableSmall ?m.48

frequently, and I don't know how to resolve it. For example

-- this is fine
def count := Iter.unfold Id 0 Nat.succ

-- error on the return annotation
def count2 : Iter Id Nat := Iter.unfold Id 0 Nat.succ

Paul Reichert (May 01 2025 at 11:10):

Not a solution yet, but here are a few explanations what is happening here. Please forgive the wall of text. I agree that things need to be more reliable when the library is released, so you don't need to think about these internals.

  • The iterators have a similar design like in Rust. That means that the compiler will look at the type of an iterator and will, hopefully, generate very efficient code for it. Your iterator in fact has a more complicated type than it seems: Iter Id Nat has an implicit parameter, in your case the type UnfoldIterator Nat.succ. This type can be quite complicated, so I made it implicit. Internally, the iterator wraps an object of that type that encodes its current state. Because of universe complications, we need this state to be "small" enough to fit into the monad of the iterator.
  • These error messages sometimes hide an other problem that led to Lean not inferring the type. The example you gave needs to work, however.
  • In the meantime, I've moved away from ComputableSmall, but the new solution isn't ready yet. This will help with examples like yours. I'll keep you up-to-date when the new design is ready.
  • Still, the iterators heavily rely on type inference and typeclass resolution, which is scary but important for efficiency.

Anyways, thanks for testing! Don't hesitate to report other problems.

James Sully (May 01 2025 at 12:52):

Good to know, thanks! I'm looking forward to seeing how this evolves. I'm a big fan of both rust and streamly.


Last updated: May 02 2025 at 03:31 UTC