Zulip Chat Archive
Stream: std4
Topic: Porting just functionality
Henrik Böving (Apr 28 2023 at 19:50):
I've been thinking for a while about "just" porting functionality from other programming languages' stdlibs like e.g. https://std.rs to std4 without any theorem support etc. whatsoever. Would that be a welcome addition? Quite often when working with Lean I just feel these...minor annoyances when I have to work around something that is there by default in the stdlib in other languages
Mario Carneiro (Apr 28 2023 at 19:57):
Sure, if it is being used as an API guide that sounds fine
Mario Carneiro (Apr 28 2023 at 19:58):
a lot of what is there already is copied pretty directly from the haskell standard library
Mario Carneiro (Apr 28 2023 at 20:00):
of course you might have trouble making the types line up if you pick a language that differs too much. Good luck writing split_at_mut
in lean
Henrik Böving (Apr 28 2023 at 20:09):
ST.Ref I'm coming!
Henrik Böving (Apr 29 2023 at 13:56):
https://github.com/leanprover/std4/pull/125 added a bunch of splitting APIs. Rust can obviously do much better than us with pointer stuff. Do we add that in std4 as well? If so shall I add it right now or do we merge as is for now an fully optimize later?
Henrik Böving (Apr 29 2023 at 13:59):
Sadly I noticed that many of my fav non trivial built-in APIs like the chunks and windows one do of course require an iterator abstraction to be present and simply return structs that are meant to be consumed as iterators. Is there a particular reason we do not have an iterator style API yet?
Mario Carneiro (Apr 29 2023 at 14:12):
you can do ad hoc iterators and implement Stream
for them
Mario Carneiro (Apr 29 2023 at 14:14):
I mean, Stream
and ToStream
are basically the Iterator
and IntoIterator
traits from rust
Mario Carneiro (Apr 29 2023 at 14:14):
and in particular they hook into the do notation for multiple-for
Henrik Böving (Apr 29 2023 at 14:14):
Ah! I did of course only search for Iter and Iterator /o\
Mario Carneiro (Apr 29 2023 at 14:15):
although you can't do a for loop over a stream directly without some partial stuff
Mario Carneiro (Apr 29 2023 at 14:17):
I think it wouldn't be too hard to write a wrapper which does it anyway though
Mario Carneiro (Apr 29 2023 at 14:29):
Actually I take it back, this appears to already work out of the box
structure Numbers where n : Nat
instance : Stream Numbers Nat where
next? | ⟨n⟩ => some (n, ⟨n + 1⟩)
#eval show IO Unit from do
for i in Numbers.mk 0 do
println! i
if i > 100 then
break
Mario Carneiro (Apr 29 2023 at 14:30):
thanks to this instance:
protected partial def Stream.forIn [Stream ρ α] [Monad m] (s : ρ) (b : β) (f : α → β → m (ForInStep β)) : m β := do
let _ : Inhabited (m β) := ⟨pure b⟩
let rec visit (s : ρ) (b : β) : m β := do
match Stream.next? s with
| some (a, s) => match (← f a b) with
| ForInStep.done b => return b
| ForInStep.yield b => visit s b
| none => return b
visit s b
instance (priority := low) [Stream ρ α] : ForIn m ρ α where
forIn := Stream.forIn
Henrik Böving (Apr 29 2023 at 14:33):
As a general architectural question. If I am for example implementing the Windows
iterator/stream. Do we want it to return Array a
or { xs : Array a // xs.size = windowSize }
as to allow provable code while iterating over a Windows
Mario Carneiro (Apr 29 2023 at 14:34):
The second one might be useful but I would give it a non-default name
Mario Carneiro (Apr 29 2023 at 14:34):
like windowsDep
Mario Carneiro (Apr 29 2023 at 14:35):
maybe we need a naming convention for those kind of functions
Henrik Böving (Apr 29 2023 at 14:41):
Hm, actually I don't think I can denote this as a Stream instance right now?
instance : Stream (WindowsDep α) (Subarray α) where
next? := sorry
I don't have a Windows
object to refer to on the outParam of Stream
so I cannot write down the invariant :(
Mario Carneiro (Apr 29 2023 at 14:42):
Not sure exactly what the question is, but WindowsDep
should have windowSize
as a parameter
Mario Carneiro (Apr 29 2023 at 14:43):
do you have a complete example?
Henrik Böving (Apr 29 2023 at 14:44):
Oh you mean it should be indexed by the windowSize instead of just carrying at as parameter, then it works out of course, thank you.
Mario Carneiro (Apr 29 2023 at 14:46):
(terminological note: the parameters of a structure are the things before the where
, and the fields are the things after it)
Mario Carneiro (Apr 29 2023 at 14:47):
inductive types also have indexes, which are like parameters but they can change in recursive references. Structures can't have indexes because they are not recursive
Henrik Böving (Apr 30 2023 at 10:03):
@Mario Carneiro next architectural question. Do we want to imitate the full rust style stream/iterator API with the Map and Zip etc. "Higher order" iterators/streams? in that case do we want to add a FromStream class to imitate collect() behavior? And I am guessing it would also be nice to have things like a functor instance for streams (if that isnt a thing yet, didn't check).
Mario Carneiro (Apr 30 2023 at 10:04):
I've been thinking about a WellFoundedStream
class which would help with making folds and things not trivially partial
Mario Carneiro (Apr 30 2023 at 10:05):
FromStream
would need to handle well foundedness in some way
Mario Carneiro (Apr 30 2023 at 10:09):
also the lack of dot notation for typeclass methods makes it pretty unergonomic unless we take some short name in the root namespace
Mario Carneiro (Apr 30 2023 at 10:11):
Maybe it wouldn't be too bad to have functions like Array.fromStream
, i.e. no FromStream
typeclass but a collection of functions generic over the stream type
Mario Carneiro (Apr 30 2023 at 10:12):
and you need a default / low priority instance for collecting a non-well-founded stream via partial
Henrik Böving (Apr 30 2023 at 10:26):
Mario Carneiro said:
Maybe it wouldn't be too bad to have functions like
Array.fromStream
, i.e. noFromStream
typeclass but a collection of functions generic over the stream type
I have to say I really don't like the way that many of our APIs are currently designed with functions like mapM
forM
etc. fromList
toList
(in general from
and to
functions you also never quite know which of the two are implemented (sometimes both) so you end up needlessly searching) just floating around. I think it would be very nice if we could consolidate at least such basic functionality in typeclasses to make it consistent and the desired API clear.
Mario Carneiro (Apr 30 2023 at 10:27):
Well we could implement the typeclasses but I don't think it would make things easier to use
Mario Carneiro (Apr 30 2023 at 10:28):
at least not without changes to the language
Mario Carneiro (Apr 30 2023 at 10:28):
it would improve discoverability, but aliases would do that too
Mario Carneiro (Apr 30 2023 at 10:29):
and when everything is behind a system of typeclasses it can be hard to find the definitions of things (or useful documentation to go with the definition)
Mario Carneiro (Apr 30 2023 at 10:54):
Here's a demo of the approach:
class WellFoundedStream (σ : Type _) [Stream σ α] : Prop where
wf : WellFounded fun s' s : σ => ∃ a, Stream.next? s = some (a, s')
class FoldM (m : Type _ → Type _) (σ : Type _) (α : outParam (Type _)) where
foldM : σ → (β → α → m β) → β → m β
partial def Stream.foldM [Monad m] [Stream σ α]
(s : σ) (f : β → α → m β) (b : β) : m β :=
match Stream.next? s with
| none => pure b
| some (a, s') => f b a >>= Stream.foldM s' f
instance (priority := low) [Monad m] [ToStream ρ σ] [Stream σ α] : FoldM m ρ α where
foldM s := Stream.foldM (ToStream.toStream s)
def Stream.wfFoldM [Monad m] [Stream σ α] [WellFoundedStream σ]
(s : σ) (f : β → α → m β) (b : β) : m β :=
match h : Stream.next? s with
| none => pure b
| some (a, s') =>
have : ∃ a, next? s = some (a, s') := ⟨_, h⟩
f b a >>= Stream.wfFoldM s' f
termination_by' invImage (·.1) ⟨_, WellFoundedStream.wf⟩
instance [Monad m] [ToStream ρ σ] [Stream σ α] [WellFoundedStream σ] : FoldM m ρ α where
foldM s := Stream.wfFoldM (ToStream.toStream s)
def Array.collect [FoldM Id σ α] (s : σ) : Array α :=
Id.run <| FoldM.foldM s (fun arr a => arr.push a) #[]
instance : WellFoundedStream (List α) where
wf := Subrelation.wf (h₂ := (measure List.length).wf) @fun
| _, _::_, ⟨_, rfl⟩ => Nat.lt_succ_self _
#eval Array.collect [1:10:2] -- #[1, 3, 5, 7, 9]
#eval Array.collect [1, 2] -- #[1, 2]
#reduce Array.collect [1:10:2] -- gets stuck
#reduce Array.collect [1, 2] -- works
Henrik Böving (Apr 30 2023 at 12:07):
Mario Carneiro said:
also the lack of dot notation for typeclass methods makes it pretty unergonomic unless we take some short name in the root namespace
I mean haskell does get through with this approach right? They have stuff like fmap
fold
traverse
etc. all bound at the top level. It could even be written in a way like this in Lean: toStream arr |> map func1 |> filter func2 |> fold func3
If we decide against this I think we could provide some sort of meta programming facility too derive this functionality if possible? That is, given a Stream
instance on a type T
we could provide a macro that automatically figures out functions like T.sMap
, T.sFold
etc. For stream map, stream fold etc. that way we don't collide with existing names in case someone wants to define the functions straight on the type as well but we retain the ability to do do notation like so: toStream arr |>.sMap func1 |>.sFilter func2 |>.sFold func3
This does ease discoverability and does not pollute the global namespace...documentation could theoretically speaking be automatically placed on the functions as well? Seeing the definition of the function would of course be a little harder but since they work the same on each type the definition should not be too interesting? Furthermore it would allow for type specific specializations right? For example if we have a clever idea on how to do sMap
on a Subarray
we could write a custom sMap
there instead of the default variant.
On a more general note regarding the Haskell approach I am also wondering if we do want Haskell style typeclasses like Traversable, Foldable etc.?
Mario Carneiro (Apr 30 2023 at 12:13):
Yeah I was also thinking we could have macros which stick a bunch of functions on the type namespace. You can even do some fancy footwork with declaration ranges if you want to make go to def do the right thing, whatever that is
Mario Carneiro (Apr 30 2023 at 12:15):
I would not give them funny names like sMap
, instead the macro would have enough customization options to avoid name clashes if necessary. The whole point of stamping out all these functions is to produce a convenient interface similar to what you would have done with the manual approach
Mario Carneiro (Apr 30 2023 at 12:17):
On a more general note regarding the Haskell approach I am also wondering if we do want Haskell style typeclasses like Traversable, Foldable etc.?
These things have been discussed at some length already, @James Gallicchio 's LeanColls library is experimenting with these iterator typeclasses. It seems like a bit of a rabbit hole to design these typeclasses well, so I'm focusing on end user outcomes and working backwards to the required typeclasses to make things work.
Mario Carneiro (Apr 30 2023 at 12:18):
see also https://leanprover.zulipchat.com/#narrow/stream/348111-std4/topic/collection.20typeclasses
Alex Keizer (Apr 30 2023 at 13:34):
I think typeclasses also have the benefit of improving consistency. If there is a From
typeclass, it's clear that is probably an idiomatic way to convert between types, while some free-floating fromList
functions don't have that same clear message. For example, there's some types that have fromList
, while others implement conversion as ofList
. Of course, we could pick some naming convention and try to be more consistent, but I feel that typeclasses are a nice way to codify such a convention.
Alex Keizer (Apr 30 2023 at 13:36):
Maybe a mixed approach could be used: use typeclasses for the implementation, then use a macro that re-exports the typeclass functions in the implementors namespace.
James Gallicchio (Apr 30 2023 at 13:37):
RE: putting stuff at toplevel, I think we will almost certainly want all of the collection functions exported to root. One of the experiments of LeanColls was trying to figure out how to group collection functions into typeclasses that allow maximum flexibility (and thus make sense at top-level) without having one class for every collection function (which would make any hierarchy, explicit or otherwise, a mess).
Had mixed success there -- my reference libraries (I looked at Haskell (there's multiple), Rust, Scala, & F*) are all different, and there's no clear best design... :/
Mario Carneiro (Apr 30 2023 at 14:04):
I pushed the stream / collect demo to std4#127, but I don't have time to finish it now so please consider it as up for grabs
Mario Carneiro (Apr 30 2023 at 14:06):
Fun fact: Std.Range
has a stream instance but it is not well founded when the step size is 0. The ForIn
instance arbitrarily stops running the iterator after r.stop
iterations.
Mario Carneiro (Apr 30 2023 at 14:09):
Hot take: step size in ranges was a mistake. I can count on zero hands the number of times I wanted this from rust's range type
Alex Keizer (Apr 30 2023 at 14:40):
Pretty sure Rust's range does have a step size, sort of. It's not exposed directly, but as some iterator wrapper that uses advance_by
or nth
or one of those iterator methods
Mario Carneiro (Apr 30 2023 at 15:41):
there is a step_by iterator adapter, but the range type itself doesn't have a third member
Mario Carneiro (Apr 30 2023 at 15:42):
(also I don't think step_by
takes 0 as an argument)
Sebastian Ullrich (Apr 30 2023 at 15:50):
Mario Carneiro said:
Hot take: step size in ranges was a mistake. I can count on zero hands the number of times I wanted this from rust's range type
https://github.com/leanprover/lean4/issues/1962
Alex Keizer (Apr 30 2023 at 21:16):
Indeed, Rust docs suggest that step_by(0)
will just panic. For Lean, it might be cleaner to define a step size of 0 to actually just be the same as step size 1 (in the same way that division by 0 is defined, even though it is wrong to do), that at least gets rid of the potential source of non-wellfoundedness in otherwise bounded ranges
Mario Carneiro (May 01 2023 at 06:40):
I think a better default behavior for step_by 0 is to return an empty iterator; this is also the suggested behavior in lean4#2126
Mario Carneiro (May 01 2023 at 06:43):
but we don't want to check the step in the iterator next function itself, which suggests that 0 < step
should be part of the data structure, and any fixup behavior is tacked on via smart constructors which either infer that 0 < step
or return a completely different iterator in that case
James Gallicchio (May 02 2023 at 14:38):
it is also way easier to prove things when step size is 1, moving the linear calculation into the body. I think gcc O1 is good enough to generalize the loop variable and remove the multiplications (for any users that do care that much about performance)
James Gallicchio (May 02 2023 at 14:39):
so if we are going to re-visit the ..
notation and Std.Range
I would definitely advocate for not having a step size at all
James Gallicchio (May 02 2023 at 14:41):
I was also always unsatisfied with code in other languages that used non-default step sizes... I think the hard-to-prove-stuff is a code smell/red flag
Sebastian Ullrich (May 02 2023 at 14:49):
It is also so much more modular to have a generic "every nth element" stream combinator instead
James Gallicchio (May 02 2023 at 20:59):
I guess once (if?) we allow more complicated csimp lemmas, we could even have the composition of the two simplify to the multiplication version :)
Henrik Böving (May 02 2023 at 23:33):
The new code generators constant folder is extensible with custom folding rules :eyes:
Henrik Böving (May 06 2023 at 15:58):
Maybe this is interesting as a general data point for how to iterator: https://hal.science/hal-03827702v2/
Last updated: Dec 20 2023 at 11:08 UTC