Zulip Chat Archive

Stream: lean4

Topic: Parsec


Greg B (Mar 09 2023 at 18:58):

Hi. I am new here. I wanted to add some more combinators to Lean.Data.Parsec (Eg, sepBy and sepBy1) and the contributions readme suggested I should ask here first if that would be a good idea. I was going to borrow implementations and comments from lean/library/data/buffer/parser.lean in lean3. Would that be a good pull request to do?

François G. Dorais (Mar 09 2023 at 19:32):

In general, things with a Lean prefix are intended for primary use by Lean itself (nothing prevents you from using it but only things in Init are intended for general use). To keep Lean small, contributions that do not directly contribute to Lean itself are not welcome there. Of course, there can be a need for a general use version. That could go in Std or as a separate library. If you look around on github you will find several Lean parser libraries (including mine). Just shop around, find one that meets your needs, and contribute there if anything is missing.

Greg B (Mar 09 2023 at 20:21):

Perfect! I am delighted to use your library! Thank you.

cognivore (Mar 09 2023 at 20:40):

We've implemented Megaparsec, with a design constraint of it being a replica of Haskell codebase.

https://github.com/yatima-inc/Megaparsec.lean

We have some documentation in README and the functions are documented reasonably well.

Also, @Arthur Paulino made a great tutorial for it: https://github.com/yatima-inc/Megaparsec.lean/blob/main/Examples/Tutorial.lean

We would appreciate some industrial use. We're using it ourselves for our Wasm runtime in Lean and so far it holds, but we never threw a lot of data on it yet.

cognivore (Mar 09 2023 at 20:42):

@François G. Dorais, a very nice library! We'll have a look at it and perhaps learn a trick or two :)

Our stream implementation is in a separate library: https://github.com/yatima-inc/Straume

We've implemented a lot of stuff to hedge for the use with infinite streams, but we've currently halted the development because we've implemented enough stuff to support Megaparsec use cases.

François G. Dorais (Mar 09 2023 at 22:06):

Thanks @cognivore! My students and I needed parser combinators for another project, I only noticed your library
after we had mostly done all the work for our own. I decided to release it to the world as a spinoff since we like our work and we haven't yet found need to replace it with a more sophisticated one. I'm happy if you can use some tricks for your Lean Megaparsec. I'm also interested in broader Unicode support but https://github.com/xubaiw/Unicode.lean seems to be not actively maintained at this time. Do let me know if you have thoughts about that.

Henrik Böving (Mar 09 2023 at 22:19):

I do have push access to it in principle to guarantee doc-gen4 stability, I just dont know if @Xubai Wang wants me to merge things

François G. Dorais (Mar 09 2023 at 22:23):

@Henrik Böving Can you take a quick look at the failed check on my PR? It doesn't seem to have anything to do with the code I submitted.

François G. Dorais (Mar 09 2023 at 22:27):

By the way, thank you! I know you're busy right now. This is not urgent, I just haven't been able to get any kind of response for a while so I got excited
.

Henrik Böving (Mar 09 2023 at 23:09):

The CI is failing because the doc-gen4 integration it is working with hasn't been updated in ages and is thus not at all compatible with modern doc-gen4 again...unfortunately the way doc-gen is called here is pulled out into another action (which did make sense at the time but not nowadays) ...so it is not really detecting something that is wrong with your code its mostly the repo not being cared for enough yeah.

Henrik Böving (Mar 09 2023 at 23:10):

Also thinking further about it I dont think we want to pull std4 in here. std4 already (conditionally) depends on doc-gen4 to be able to build documentation so pulling it into the dependency chain of doc-gen4 will create a cyclic dependency.

Mario Carneiro (Mar 10 2023 at 00:02):

Henrik Böving said:

Also thinking further about it I dont think we want to pull std4 in here. std4 already (conditionally) depends on doc-gen4 to be able to build documentation so pulling it into the dependency chain of doc-gen4 will create a cyclic dependency.

I would really like this not to have to be a choice. It should be possible to build docs for a project without making doc-gen a dependency of that project

François G. Dorais (Mar 10 2023 at 00:52):

It would be easy to factor out the one data file that's currently needed for doc-gen4. Unless doc-gen4 is planning for more advanced Unicode stuff, I could make that happen next week.

François G. Dorais (Mar 10 2023 at 00:53):

@Henrik Böving you just need the character names, or are you planning something more sophisticated?

Henrik Böving (Mar 10 2023 at 06:45):

No just the names indeed

cognivore (Mar 17 2023 at 21:34):

Thanks for more details, François!

We're currently mostly in maintenance mode when it comes to Megaparsec, as we're focused on using it for other projects, but if I have the time, I'll give Unicode stuff a look.

As a matter of fact, I'm currently reviving an unmaintained pretty printing library, so perhaps the skills shall transfer.

P.S.

I didn't read the rest of the thread, maybe you have figured it out already :)

François G. Dorais (Mar 19 2023 at 16:06):

@cognivore Yes, I did figure it out to some extent. I created my own library: lean4-unicode-basic. It already has a sound API so a 0.1.0 release should happen soon.


Last updated: Dec 20 2023 at 11:08 UTC