Zulip Chat Archive

Stream: std4

Topic: Porting just functionality


Henrik Böving (Apr 28 2023 at 19:50):

I've been thinking for a while about "just" porting functionality from other programming languages' stdlibs like e.g. https://std.rs to std4 without any theorem support etc. whatsoever. Would that be a welcome addition? Quite often when working with Lean I just feel these...minor annoyances when I have to work around something that is there by default in the stdlib in other languages

Mario Carneiro (Apr 28 2023 at 19:57):

Sure, if it is being used as an API guide that sounds fine

Mario Carneiro (Apr 28 2023 at 19:58):

a lot of what is there already is copied pretty directly from the haskell standard library

Mario Carneiro (Apr 28 2023 at 20:00):

of course you might have trouble making the types line up if you pick a language that differs too much. Good luck writing split_at_mut in lean

Henrik Böving (Apr 28 2023 at 20:09):

ST.Ref I'm coming!

Henrik Böving (Apr 29 2023 at 13:56):

https://github.com/leanprover/std4/pull/125 added a bunch of splitting APIs. Rust can obviously do much better than us with pointer stuff. Do we add that in std4 as well? If so shall I add it right now or do we merge as is for now an fully optimize later?

Henrik Böving (Apr 29 2023 at 13:59):

Sadly I noticed that many of my fav non trivial built-in APIs like the chunks and windows one do of course require an iterator abstraction to be present and simply return structs that are meant to be consumed as iterators. Is there a particular reason we do not have an iterator style API yet?

Mario Carneiro (Apr 29 2023 at 14:12):

you can do ad hoc iterators and implement Stream for them

Mario Carneiro (Apr 29 2023 at 14:14):

I mean, Stream and ToStream are basically the Iterator and IntoIterator traits from rust

Mario Carneiro (Apr 29 2023 at 14:14):

and in particular they hook into the do notation for multiple-for

Henrik Böving (Apr 29 2023 at 14:14):

Ah! I did of course only search for Iter and Iterator /o\

Mario Carneiro (Apr 29 2023 at 14:15):

although you can't do a for loop over a stream directly without some partial stuff

Mario Carneiro (Apr 29 2023 at 14:17):

I think it wouldn't be too hard to write a wrapper which does it anyway though

Mario Carneiro (Apr 29 2023 at 14:29):

Actually I take it back, this appears to already work out of the box

structure Numbers where n : Nat

instance : Stream Numbers Nat where
  next? | n => some (n, n + 1⟩)

#eval show IO Unit from do
  for i in Numbers.mk 0 do
    println! i
    if i > 100 then
      break

Mario Carneiro (Apr 29 2023 at 14:30):

thanks to this instance:

protected partial def Stream.forIn [Stream ρ α] [Monad m] (s : ρ) (b : β) (f : α  β  m (ForInStep β)) : m β := do
  let _ : Inhabited (m β) := pure b
  let rec visit (s : ρ) (b : β) : m β := do
    match Stream.next? s with
    | some (a, s) => match ( f a b) with
      | ForInStep.done b  => return b
      | ForInStep.yield b => visit s b
    | none => return b
  visit s b

instance (priority := low) [Stream ρ α] : ForIn m ρ α where
  forIn := Stream.forIn

Henrik Böving (Apr 29 2023 at 14:33):

As a general architectural question. If I am for example implementing the Windows iterator/stream. Do we want it to return Array a or { xs : Array a // xs.size = windowSize } as to allow provable code while iterating over a Windows

Mario Carneiro (Apr 29 2023 at 14:34):

The second one might be useful but I would give it a non-default name

Mario Carneiro (Apr 29 2023 at 14:34):

like windowsDep

Mario Carneiro (Apr 29 2023 at 14:35):

maybe we need a naming convention for those kind of functions

Henrik Böving (Apr 29 2023 at 14:41):

Hm, actually I don't think I can denote this as a Stream instance right now?

instance : Stream (WindowsDep α) (Subarray α) where
  next? := sorry

I don't have a Windows object to refer to on the outParam of Stream so I cannot write down the invariant :(

Mario Carneiro (Apr 29 2023 at 14:42):

Not sure exactly what the question is, but WindowsDep should have windowSize as a parameter

Mario Carneiro (Apr 29 2023 at 14:43):

do you have a complete example?

Henrik Böving (Apr 29 2023 at 14:44):

Oh you mean it should be indexed by the windowSize instead of just carrying at as parameter, then it works out of course, thank you.

Mario Carneiro (Apr 29 2023 at 14:46):

(terminological note: the parameters of a structure are the things before the where , and the fields are the things after it)

Mario Carneiro (Apr 29 2023 at 14:47):

inductive types also have indexes, which are like parameters but they can change in recursive references. Structures can't have indexes because they are not recursive

Henrik Böving (Apr 30 2023 at 10:03):

@Mario Carneiro next architectural question. Do we want to imitate the full rust style stream/iterator API with the Map and Zip etc. "Higher order" iterators/streams? in that case do we want to add a FromStream class to imitate collect() behavior? And I am guessing it would also be nice to have things like a functor instance for streams (if that isnt a thing yet, didn't check).

Mario Carneiro (Apr 30 2023 at 10:04):

I've been thinking about a WellFoundedStream class which would help with making folds and things not trivially partial

Mario Carneiro (Apr 30 2023 at 10:05):

FromStream would need to handle well foundedness in some way

Mario Carneiro (Apr 30 2023 at 10:09):

also the lack of dot notation for typeclass methods makes it pretty unergonomic unless we take some short name in the root namespace

Mario Carneiro (Apr 30 2023 at 10:11):

Maybe it wouldn't be too bad to have functions like Array.fromStream, i.e. no FromStream typeclass but a collection of functions generic over the stream type

Mario Carneiro (Apr 30 2023 at 10:12):

and you need a default / low priority instance for collecting a non-well-founded stream via partial

Henrik Böving (Apr 30 2023 at 10:26):

Mario Carneiro said:

Maybe it wouldn't be too bad to have functions like Array.fromStream, i.e. no FromStream typeclass but a collection of functions generic over the stream type

I have to say I really don't like the way that many of our APIs are currently designed with functions like mapM forM etc. fromList toList (in general from and to functions you also never quite know which of the two are implemented (sometimes both) so you end up needlessly searching) just floating around. I think it would be very nice if we could consolidate at least such basic functionality in typeclasses to make it consistent and the desired API clear.

Mario Carneiro (Apr 30 2023 at 10:27):

Well we could implement the typeclasses but I don't think it would make things easier to use

Mario Carneiro (Apr 30 2023 at 10:28):

at least not without changes to the language

Mario Carneiro (Apr 30 2023 at 10:28):

it would improve discoverability, but aliases would do that too

Mario Carneiro (Apr 30 2023 at 10:29):

and when everything is behind a system of typeclasses it can be hard to find the definitions of things (or useful documentation to go with the definition)

Mario Carneiro (Apr 30 2023 at 10:54):

Here's a demo of the approach:

class WellFoundedStream (σ : Type _) [Stream σ α] : Prop where
  wf : WellFounded fun s' s : σ =>  a, Stream.next? s = some (a, s')

class FoldM (m : Type _  Type _) (σ : Type _) (α : outParam (Type _)) where
  foldM : σ  (β  α  m β)  β  m β

partial def Stream.foldM [Monad m] [Stream σ α]
    (s : σ) (f : β  α  m β) (b : β) : m β :=
  match Stream.next? s with
  | none => pure b
  | some (a, s') => f b a >>= Stream.foldM s' f

instance (priority := low) [Monad m] [ToStream ρ σ] [Stream σ α] : FoldM m ρ α where
  foldM s := Stream.foldM (ToStream.toStream s)

def Stream.wfFoldM [Monad m] [Stream σ α] [WellFoundedStream σ]
    (s : σ) (f : β  α  m β) (b : β) : m β :=
  match h : Stream.next? s with
  | none => pure b
  | some (a, s') =>
    have :  a, next? s = some (a, s') := _, h
    f b a >>= Stream.wfFoldM s' f
termination_by' invImage (·.1) _, WellFoundedStream.wf

instance [Monad m] [ToStream ρ σ] [Stream σ α] [WellFoundedStream σ] : FoldM m ρ α where
  foldM s := Stream.wfFoldM (ToStream.toStream s)

def Array.collect [FoldM Id σ α] (s : σ) : Array α :=
  Id.run <| FoldM.foldM s (fun arr a => arr.push a) #[]

instance : WellFoundedStream (List α) where
  wf := Subrelation.wf (h₂ := (measure List.length).wf) @fun
    | _, _::_, _, rfl => Nat.lt_succ_self _

#eval Array.collect [1:10:2] -- #[1, 3, 5, 7, 9]
#eval Array.collect [1, 2] -- #[1, 2]
#reduce Array.collect [1:10:2] -- gets stuck
#reduce Array.collect [1, 2] -- works

Henrik Böving (Apr 30 2023 at 12:07):

Mario Carneiro said:

also the lack of dot notation for typeclass methods makes it pretty unergonomic unless we take some short name in the root namespace

I mean haskell does get through with this approach right? They have stuff like fmap fold traverse etc. all bound at the top level. It could even be written in a way like this in Lean: toStream arr |> map func1 |> filter func2 |> fold func3

If we decide against this I think we could provide some sort of meta programming facility too derive this functionality if possible? That is, given a Stream instance on a type T we could provide a macro that automatically figures out functions like T.sMap, T.sFold etc. For stream map, stream fold etc. that way we don't collide with existing names in case someone wants to define the functions straight on the type as well but we retain the ability to do do notation like so: toStream arr |>.sMap func1 |>.sFilter func2 |>.sFold func3

This does ease discoverability and does not pollute the global namespace...documentation could theoretically speaking be automatically placed on the functions as well? Seeing the definition of the function would of course be a little harder but since they work the same on each type the definition should not be too interesting? Furthermore it would allow for type specific specializations right? For example if we have a clever idea on how to do sMap on a Subarray we could write a custom sMap there instead of the default variant.

On a more general note regarding the Haskell approach I am also wondering if we do want Haskell style typeclasses like Traversable, Foldable etc.?

Mario Carneiro (Apr 30 2023 at 12:13):

Yeah I was also thinking we could have macros which stick a bunch of functions on the type namespace. You can even do some fancy footwork with declaration ranges if you want to make go to def do the right thing, whatever that is

Mario Carneiro (Apr 30 2023 at 12:15):

I would not give them funny names like sMap, instead the macro would have enough customization options to avoid name clashes if necessary. The whole point of stamping out all these functions is to produce a convenient interface similar to what you would have done with the manual approach

Mario Carneiro (Apr 30 2023 at 12:17):

On a more general note regarding the Haskell approach I am also wondering if we do want Haskell style typeclasses like Traversable, Foldable etc.?

These things have been discussed at some length already, @James Gallicchio 's LeanColls library is experimenting with these iterator typeclasses. It seems like a bit of a rabbit hole to design these typeclasses well, so I'm focusing on end user outcomes and working backwards to the required typeclasses to make things work.

Mario Carneiro (Apr 30 2023 at 12:18):

see also https://leanprover.zulipchat.com/#narrow/stream/348111-std4/topic/collection.20typeclasses

Alex Keizer (Apr 30 2023 at 13:34):

I think typeclasses also have the benefit of improving consistency. If there is a From typeclass, it's clear that is probably an idiomatic way to convert between types, while some free-floating fromList functions don't have that same clear message. For example, there's some types that have fromList, while others implement conversion as ofList. Of course, we could pick some naming convention and try to be more consistent, but I feel that typeclasses are a nice way to codify such a convention.

Alex Keizer (Apr 30 2023 at 13:36):

Maybe a mixed approach could be used: use typeclasses for the implementation, then use a macro that re-exports the typeclass functions in the implementors namespace.

James Gallicchio (Apr 30 2023 at 13:37):

RE: putting stuff at toplevel, I think we will almost certainly want all of the collection functions exported to root. One of the experiments of LeanColls was trying to figure out how to group collection functions into typeclasses that allow maximum flexibility (and thus make sense at top-level) without having one class for every collection function (which would make any hierarchy, explicit or otherwise, a mess).

Had mixed success there -- my reference libraries (I looked at Haskell (there's multiple), Rust, Scala, & F*) are all different, and there's no clear best design... :/

Mario Carneiro (Apr 30 2023 at 14:04):

I pushed the stream / collect demo to std4#127, but I don't have time to finish it now so please consider it as up for grabs

Mario Carneiro (Apr 30 2023 at 14:06):

Fun fact: Std.Range has a stream instance but it is not well founded when the step size is 0. The ForIn instance arbitrarily stops running the iterator after r.stop iterations.

Mario Carneiro (Apr 30 2023 at 14:09):

Hot take: step size in ranges was a mistake. I can count on zero hands the number of times I wanted this from rust's range type

Alex Keizer (Apr 30 2023 at 14:40):

Pretty sure Rust's range does have a step size, sort of. It's not exposed directly, but as some iterator wrapper that uses advance_by or nth or one of those iterator methods

Mario Carneiro (Apr 30 2023 at 15:41):

there is a step_by iterator adapter, but the range type itself doesn't have a third member

Mario Carneiro (Apr 30 2023 at 15:42):

(also I don't think step_by takes 0 as an argument)

Sebastian Ullrich (Apr 30 2023 at 15:50):

Mario Carneiro said:

Hot take: step size in ranges was a mistake. I can count on zero hands the number of times I wanted this from rust's range type

https://github.com/leanprover/lean4/issues/1962

Alex Keizer (Apr 30 2023 at 21:16):

Indeed, Rust docs suggest that step_by(0) will just panic. For Lean, it might be cleaner to define a step size of 0 to actually just be the same as step size 1 (in the same way that division by 0 is defined, even though it is wrong to do), that at least gets rid of the potential source of non-wellfoundedness in otherwise bounded ranges

Mario Carneiro (May 01 2023 at 06:40):

I think a better default behavior for step_by 0 is to return an empty iterator; this is also the suggested behavior in lean4#2126

Mario Carneiro (May 01 2023 at 06:43):

but we don't want to check the step in the iterator next function itself, which suggests that 0 < step should be part of the data structure, and any fixup behavior is tacked on via smart constructors which either infer that 0 < step or return a completely different iterator in that case

James Gallicchio (May 02 2023 at 14:38):

it is also way easier to prove things when step size is 1, moving the linear calculation into the body. I think gcc O1 is good enough to generalize the loop variable and remove the multiplications (for any users that do care that much about performance)

James Gallicchio (May 02 2023 at 14:39):

so if we are going to re-visit the .. notation and Std.Range I would definitely advocate for not having a step size at all

James Gallicchio (May 02 2023 at 14:41):

I was also always unsatisfied with code in other languages that used non-default step sizes... I think the hard-to-prove-stuff is a code smell/red flag

Sebastian Ullrich (May 02 2023 at 14:49):

It is also so much more modular to have a generic "every nth element" stream combinator instead

James Gallicchio (May 02 2023 at 20:59):

I guess once (if?) we allow more complicated csimp lemmas, we could even have the composition of the two simplify to the multiplication version :)

Henrik Böving (May 02 2023 at 23:33):

The new code generators constant folder is extensible with custom folding rules :eyes:

Henrik Böving (May 06 2023 at 15:58):

Maybe this is interesting as a general data point for how to iterator: https://hal.science/hal-03827702v2/


Last updated: Dec 20 2023 at 11:08 UTC