Zulip Chat Archive
Stream: lean4
Topic: megaparsec parse from array?
James Gallicchio (Apr 03 2023 at 23:54):
I'm trying out yatima's quite nice port of megaparsec, and I've defined a very clean lexer from String
to Array Token
. But how do I parse over non-String types? The streaming stuff is very complicated and largely undocumented
James Gallicchio (Apr 04 2023 at 00:02):
(Also, the streaming abstraction does seem a bit odd... I get having the chunk type fixed (e.g. String for Char) but is there much reason to also fix the source type? Seems like an odd choice.)
Hanting Zhang (Apr 04 2023 at 04:01):
@cognivore @Ilona Prikule ^
cognivore_ (Apr 04 2023 at 11:36):
Yeah, documentation of Straume is a bit lacking right now, I think that the worst thing about it is that it has a lot of dead code committed
cognivore_ (Apr 04 2023 at 11:39):
But the things that are actually used such as Aka
, Iterable
, Chunk
, ... have documentation. The only used thing that doesn't have documentation is MonadEmit
. Nonetheless, thanks for the feedback, we'll remove dead code and make sure that every module is at least documented.
cognivore_ (Apr 04 2023 at 11:48):
Regarding fixing the stream type, it's not quite fixed, at least not with outParam.
This feature is also completely optional. If you don't care about streaming files straight into your parser, you can just separate reading from a file or a wire and parsing.
If you care, however, the model we implement is buffered reading. To do this, you need to implement MonadEmit from Straume and mention buffer type and stream type in the argument of ParsecT
/ Parsec
.
cognivore_ (Apr 04 2023 at 12:31):
Now regarding your question. If you want to have to be Tokens, and to be Token, then—sadly—you'll have to define your own MonadParsec instance. We currently didn't have a need for non-bijective relationships between Tokens and Token types, so we didn't implement an instance like that yet.
An alternative will be for you to explicitly wrap your Array and your alphas into newtypes ArrayOfTokens
and TokenOfArray
correspondingly.
Now you can define a bijective Iterator
. Indeed, can only be gotten out of and only out of it.
If you pick this strategy, then you won't need to write your own MonadParsec
with different constraints! It'll be enough for you to define Iterable
like here: https://github.com/lurk-lab/straume/blob/main/Straume/Iterator.lean#L45.
After you're done with this, your next step will be to implement Straume
. Since you want to work with Arrays, you won't need the infinite capabilities. So just implement take1
and takeN
from to in here: https://github.com/lurk-lab/straume/blob/main/Straume.lean#L32
Use Chunk
as f
.
Now you will have everything ready for theInstance
to just work :tm: on your wrapper types.
cognivore_ (Apr 04 2023 at 12:37):
Please note that most of the peculiarities in this design stem from the difference in typeclass treatment between Lean and Haskell.
cognivore_ (Apr 04 2023 at 12:49):
Note that we abuse type class features of Lean a lot both with the "tag" type class Itearable.Bijection
and by using @defaultInstance
.
I hope that this helps! If you manage to make it work, send PRs, we'll gladly add ergonomic streaming of arrays (or one done via wrapper) to Megaparsec or Straume.
P.S.
We plan to do performance-related maintenance cycle for the libraries, because they currently choke on inputs that aren't even too large so perhaps we'll add better support for finite streams such as arrays of things and lists of things if you won't manage to implement something you're happy enough to PR.
cognivore_ (Apr 04 2023 at 12:50):
P.P.S.
Feel free to ask follow-up questions on Lurk Lab / Yatima zulip: https://zulip.yatima.io/#narrow/stream/24-yatima-tools/topic/.5BMegaparsec.2Elean.5D.20chat
Me and @Ilona Prikule are there 24/7, whereas here we need to defer to our amazing colleagues to scout for megaparsec questions :)
We'll add the link to the README.
James Gallicchio (Apr 04 2023 at 15:33):
Thank you so much! I’ll hop on the Lurk zulip
Last updated: Dec 20 2023 at 11:08 UTC