Zulip Chat Archive

Stream: lean4

Topic: megaparsec parse from array?


James Gallicchio (Apr 03 2023 at 23:54):

I'm trying out yatima's quite nice port of megaparsec, and I've defined a very clean lexer from String to Array Token. But how do I parse over non-String types? The streaming stuff is very complicated and largely undocumented

James Gallicchio (Apr 04 2023 at 00:02):

(Also, the streaming abstraction does seem a bit odd... I get having the chunk type fixed (e.g. String for Char) but is there much reason to also fix the source type? Seems like an odd choice.)

Hanting Zhang (Apr 04 2023 at 04:01):

@cognivore @Ilona Prikule ^

cognivore_ (Apr 04 2023 at 11:36):

Yeah, documentation of Straume is a bit lacking right now, I think that the worst thing about it is that it has a lot of dead code committed

cognivore_ (Apr 04 2023 at 11:39):

But the things that are actually used such as Aka, Iterable, Chunk, ... have documentation. The only used thing that doesn't have documentation is MonadEmit. Nonetheless, thanks for the feedback, we'll remove dead code and make sure that every module is at least documented.

cognivore_ (Apr 04 2023 at 11:48):

Regarding fixing the stream type, it's not quite fixed, at least not with outParam.

This feature is also completely optional. If you don't care about streaming files straight into your parser, you can just separate reading from a file or a wire and parsing.

If you care, however, the model we implement is buffered reading. To do this, you need to implement MonadEmit from Straume and mention buffer type and stream type in the \wp argument of ParsecT / Parsec.

cognivore_ (Apr 04 2023 at 12:31):

Now regarding your question. If you want to have Array α \text{Array} \ \alpha to be Tokens, and α \alpha to be Token, then—sadly—you'll have to define your own MonadParsec instance. We currently didn't have a need for non-bijective relationships between Tokens and Token types, so we didn't implement an instance like that yet.

An alternative will be for you to explicitly wrap your Array and your alphas into newtypes ArrayOfTokens and TokenOfArray correspondingly.

Now you can define a bijective Iterator. Indeed, TokenOfArray α \text{TokenOfArray} \ \alpha can only be gotten out of ArrayOfTokens (Array α)\text{ArrayOfTokens} \ (\text{Array} \ \alpha) and only out of it.

If you pick this strategy, then you won't need to write your own MonadParsec with different constraints! It'll be enough for you to define Iterable like here: https://github.com/lurk-lab/straume/blob/main/Straume/Iterator.lean#L45.

After you're done with this, your next step will be to implement Straume. Since you want to work with Arrays, you won't need the infinite capabilities. So just implement take1 and takeN from Array α \text{Array} \ \alpha to Array α \text{Array} \ \alpha in here: https://github.com/lurk-lab/straume/blob/main/Straume.lean#L32

Use Chunk as f.

Now you will have everything ready for theInstance to just work :tm: on your wrapper types.

cognivore_ (Apr 04 2023 at 12:37):

Please note that most of the peculiarities in this design stem from the difference in typeclass treatment between Lean and Haskell.

cognivore_ (Apr 04 2023 at 12:49):

Note that we abuse type class features of Lean a lot both with the "tag" type class Itearable.Bijection and by using @defaultInstance.

I hope that this helps! If you manage to make it work, send PRs, we'll gladly add ergonomic streaming of arrays (or one done via wrapper) to Megaparsec or Straume.

P.S.

We plan to do performance-related maintenance cycle for the libraries, because they currently choke on inputs that aren't even too large so perhaps we'll add better support for finite streams such as arrays of things and lists of things if you won't manage to implement something you're happy enough to PR.

cognivore_ (Apr 04 2023 at 12:50):

P.P.S.

Feel free to ask follow-up questions on Lurk Lab / Yatima zulip: https://zulip.yatima.io/#narrow/stream/24-yatima-tools/topic/.5BMegaparsec.2Elean.5D.20chat

Me and @Ilona Prikule are there 24/7, whereas here we need to defer to our amazing colleagues to scout for megaparsec questions :)

We'll add the link to the README.

James Gallicchio (Apr 04 2023 at 15:33):

Thank you so much! I’ll hop on the Lurk zulip


Last updated: Dec 20 2023 at 11:08 UTC