Zulip Chat Archive

Stream: lean4

Topic: scan combinator for iterators


cmlsharp (Dec 15 2025 at 18:40):

I noticed there is no Std Iterator combinator for scan (like fold but it accumulates the partial results) so [1,2,3].iter.scan (. + .) 0 |>.toList would be [0,1,3,6]. You can see this in Haskell, Rust for example. I have an implementation that I believe works which I could package into a PR, but I had a bit of trouble precisely determining what i ought to include in the IsPlausibleStep definition, so i would appreciate some feedback/sanity checking if possible. I'm also decently new to Lean.

Implementation here

Paul Reichert (Dec 16 2025 at 08:18):

Hi, @cmlsharp, thanks for your suggestion. I think scan is a welcome addition to the standard library, so I'd be happy to see a PR for this combinator.

In Lean, we can only reason about code that provably terminates*, so while you can execute [1,2,3].iter.scan (. + .) 0 |>.toList, you will not be able to prove anything about the returned list without proving that [1,2,3].iter.scan (. + .) 0 is a finite iterator.

The IsPlausibleStep predicate is needed for proving that your iterator is finite by providing an Std.Iterators.Finite instance. It should simply be as strong as possible. Proving finiteness of it.scan f init given that it is finite (and productiveness given that it is productive), you will usually notice if your IsPlausibleStep predicate is too weak. In your case, I'm optimistic that it's strong enough since you have included all the proofs from the branches done (h, hs, hp) in it.

Before we can add scan to the standard library, the PR should include Std.Iterators.Finite and Std.Iterators.Productive instances for scan and there should be lemmas analogous to Std.Data.Iterators.Lemmas.Combinators.Drop, so that we have verified the basic properties of the iterator. Optionally -- and depending on how much time you are willing to invest -- you could also try implementing a generalization that takes a monadic accumulation function scanM similarly to foldM or mapM.

*) slight oversimplification

cmlsharp (Dec 21 2025 at 08:18):

Thanks for the feedback and the explanation. I believe I have come up with instances for both Productive and Finite (taking a lot of inspiration from the existing combinators)

I will package this up into a PR then. I have not yet implemented a monadic generalization, but I will attempt this as well

cmlsharp (Dec 21 2025 at 15:37):

and there should be lemmas analogous to Std.Data.Iterators.Lemmas.Combinators.Drop, so that we have verified the basic properties of the iterator.

to follow up slightly on this, I think one of the most natural lemmas to prove about this iterator is that it is equivalent to its 'List' version, however, the standard library does not contain List.scanl. There is List.scanl/List.scanr in Batteries however.

Are there any plans to bring List.scanl into the standard library?

Kim Morrison (Dec 21 2025 at 18:33):

I think this would be a good idea. I would want parity between List/Array/Vector. I haven't reviewed the theorems for scanl available in Batteries. Are they complete? Improving these things, either in Batteries or a PR to lean4, would be welcome.

cmlsharp (Dec 21 2025 at 18:50):

The List implementations for scanl/scanr in Batteries are fairly complete wrt lemmas, however there is no implementation for Array.scanl/scanr. There is an implementation for List.Vector with a subset of the lemmas that exist for List.

I'd love to improve the state of Lean's standard library, so if there is interest, I'll happily propose a PR that attempts to bring scan (in iterator/vector/array/list) form into Std. That said if this is more appropriate for Batteries, I can propose it there too. I'm not really sure what the general rules are about what things should go where.

Kim Morrison (Dec 21 2025 at 20:28):

I quickly reviewed the material in Batteries on List.scanl. It seems reasonable, I made a few improvements in https://github.com/leanprover-community/batteries/pull/1569.

Kim Morrison (Dec 21 2025 at 20:31):

I would be happy for this to be extended to cover Array and Vector (not Mathlib's List.Vector), plus iterators. It may actually make sense to initially PR to Batteries, where it's fine to work incrementally (e.g. do Array before Vector, etc). For lean4, I'd like to have evidence that we'll actually get complete coverage before starting to merge things. But I would like this material there, too.

cmlsharp (Dec 21 2025 at 20:36):

Ok, awesome! I appreciate your guidance here. I'll put together some initial PRs!

Seong-Heon Jung (Dec 21 2025 at 23:51):

I have been looking exactly for something like this! Thanks so much for starting work on it.
Considering the user is expected to provide the initial value anyways, I think it would be fine if the iterator doesn't yield this initial value. So,

[1, 2, 3].iter.scan (. + .) 0 |>.toList

would return [1, 3, 6] instead of [0, 1, 3, 6]
This would remove a conditional in the step function, which may have a tangible impact in tight loops.

cmlsharp (Dec 21 2025 at 23:59):

I’m glad you find this useful. I’d be a little hesitant to make this modification to scan itself, because scan is a well-defined combinator with well-defined semantics elsewhere (indeed this is how Rust’s Iterator::scan works, how Haskell’s scanl work and how Batteries’ List.scanl work).

As for performance, I’d be interested in seeing benchmarks here. It’s true that there’s necessarily a branch, but it’s the type of thing the branch predictor _should_ be good at predicting (it’s false and then true forever after).

cmlsharp (Dec 22 2025 at 00:01):

That actually makes me wonder, does lean have cold path hints?

cmlsharp (Dec 22 2025 at 00:38):

Also, I’m not sure how good lean is at compiling chained iterator combinators into simple loops in practice, but this is also the kind of thing that an optimizing compiler should be able to hoist out of the loop via unrolling.

Seong-Heon Jung (Dec 22 2025 at 02:20):

cmlsharp said:

I’m glad you find this useful. I’d be a little hesitant to make this modification to scan itself, because scan is a well-defined combinator with well-defined semantics elsewhere (indeed this is how Rust’s Iterator::scan works, how Haskell’s scanl work and how Batteries’ List.scanl work).

As for performance, I’d be interested in seeing benchmarks here. It’s true that there’s necessarily a branch, but it’s the type of thing the branch predictor _should_ be good at predicting (it’s false and then true forever after).

Here's a slightly modified impl that does not yield the initial element. I only have my windows machine at the moment, and the timing functionality (IO.monoMsNow) seems to be broken for windows, so I cannot provide any reliable benchmark numbers.

Jakub Nowak (Dec 22 2025 at 09:14):

I agree with @cmlsharp that we should use standard behaviour for scan. In Haskell there's scanl that doesn't take initial element, and it works like this:

scanl1 (+) [1, 2, 3] = [1,3,6]

Maybe we could have this?

Alfredo Moreira-Rosa (Dec 22 2025 at 13:39):

implementation is pretty staightforward :

def scanl1 (f : α  α  α) : List α  List α
| [] => []
| x :: xs => scanl f x xs

#guard [1,2,3,4].scanl1 (· + ·) = [1,3,6,10]
#guard ["a","b","c"].scanl1 (· ++ ·) = ["a","ab","abc"]

Seong-Heon Jung (Dec 22 2025 at 15:26):

Alfredo Moreira-Rosa said:

implementation is pretty staightforward :

def scanl1 (f : α  α  α) : List α  List α
| [] => []
| x :: xs => scanl f x xs

#guard [1,2,3,4].scanl1 (· + ·) = [1,3,6,10]
#guard ["a","b","c"].scanl1 (· ++ ·) = ["a","ab","abc"]

The discussion is about scan on Iterators, not lists

Alfredo Moreira-Rosa (Dec 22 2025 at 15:50):

To be fair, it evolved to extending it also for Array and Vector, and i was answering about behaviour exemple just above my comment for scanl1 that also does not exist yet for Lists.

Alfredo Moreira-Rosa (Dec 22 2025 at 22:02):

@cmlsharp here is a version that conforms to current implementation of Map by having f being part of the Scan signature.

Jakub Nowak (Dec 22 2025 at 23:28):

It would make the code simpler if you store @IterM α m β instead of α. You just convert back and forth between the two.

cmlsharp (Dec 23 2025 at 03:05):

Alfredo Moreira-Rosa said:

cmlsharp here is a version that conforms to current implementation of Map by having f being part of the Scan signature.

Thanks! I had made this change locally but I appreciate it nonetheless!

cmlsharp (Dec 23 2025 at 03:06):

Well not this exact change, I still store the iterator internally

cmlsharp (Dec 23 2025 at 03:06):

Is there a reason to store \a instead?

cmlsharp (Dec 23 2025 at 19:47):

Jakub Nowak said:

I agree with cmlsharp that we should use standard behaviour for scan. In Haskell there's scanl that doesn't take initial element, and it works like this:

scanl1 (+) [1, 2, 3] = [1,3,6]

Maybe we could have this?

scanl1 (and foldl1) are useful combinations for sure (for example when your accumulator doesn’t have an identity element), though you need a little more care in designing them because of the possibility of empty structures.

Haskell throws an exception which is not suitable in lean of course (I suppose you could always define a ! variant)

I suppose for list and array you could have it take a proof that the structure is non-empty. For Vector just make the input Vector a (n + 1), and for Iterator I suppose you could define the combinator to implement Iterator only if the inner iterator is productive.

cmlsharp (Dec 23 2025 at 19:48):

(Sorry, bug in the Zulip app caused message to send twice).

Seong-Heon Jung (Dec 25 2025 at 00:54):

cmlsharp said:

scanl1 (and foldl1) are useful combinations for sure (for example when your accumulator doesn’t have an identity element), though you need a little more care in designing them because of the possibility of empty structures.

I see two directions to handle this case. The first is, as you say, requiring a proof of non-emptyness of the input. The other is to place a constraint that the type of the accumulator is inhabited.

Jakub Nowak (Dec 25 2025 at 01:00):

scanl1 (+) [] is just []. Only foldl1 throws an exception for empty list.

Jakub Nowak (Dec 25 2025 at 01:03):

To be in line with other API we could handle this like e.g. List.head. So, we can have foldl1, foldl1?, foldl1!, foldl1D.

cmlsharp (Dec 25 2025 at 02:53):

Jakub Nowak said:

scanl1 (+) [] is just []. Only foldl1 throws an exception for empty list.

Oh right of course good point

cmlsharp (Dec 28 2025 at 03:03):

Ok this is not the iterator combinator per se, but I've made the first work-in-progress PR towards this goal (adding scanlM, scanrM to list and scanl,scanr,scanrM,scanlM to Array). Once this is completed, I plan on doing Iterator and then maybe vector.

https://github.com/leanprover-community/batteries/pull/1581

cmlsharp (Dec 31 2025 at 04:54):

Iterator combinator PR (building off the above)

https://github.com/leanprover-community/batteries/pull/1585

Alfredo Moreira-Rosa (Dec 31 2025 at 08:56):

It's a big PR, maybe to simplify review, it could be divided into 3 distinct ones. One for List, Array, Iterator ?

cmlsharp (Dec 31 2025 at 13:43):

It depends on tbe earlier batteries#1581 PR which adds the List and Array stuff. If that PR gets merged, the lines of code difference should be smaller. Is there a better way to handle PRs that depend on one another like this?

Vlad Tsyrklevich (Dec 31 2025 at 13:51):

I believe github has support for showing dependencies. See e.g. https://github.com/leanprover-community/batteries/pull/815 (also for PRs in not-mathlib, you can prepend the name of the repo e.g. batteries#1581 instead of links)

cmlsharp (Dec 31 2025 at 13:54):

Thanks for both tips!

cmlsharp (Dec 31 2025 at 15:08):

Per a comment on my initial PR, I split these two PRs into 3


Last updated: Feb 28 2026 at 14:05 UTC