Zulip Chat Archive

Stream: general

Topic: How can I parse custom tokens with the Lean Parser?


Spanky (Jan 02 2025 at 17:17):

I'm trying to use the Lean Parser to parse some custom syntax. I need to support different tokens from the builtin ones (think of a different syntax for number literals or identifiers).
I believe this requires to define some Lean.Parser.Parsers and Lean.Parser.ParserFns, but after spending a whole day on it, I'm still completely lost. Providing those objects doesn't seem to be enough, as other parts of the language come to byte me (I'm getting "don't know how to generate formatter for non-definition '{ ... }'" errors... wat?)

Is it possible to extend Lean's parser with some custom tokens? How can I do that?
Yesterday Frédéric Dupuis suggested using a different parser, lean4-parser, which could be a good alternative. Unfortunately it comes with no expression parsing capabilities and no elaborators to improve the experience.
Either approach seem to require a huge amount of work.

Eric Wieser (Jan 02 2025 at 17:39):

Are you trying to parse a superset of Lean syntax, or your own language that only partially overlaps with Lean?

Spanky (Jan 02 2025 at 18:27):

My own language. It should be possible to use the syntax command to define grammar, but the grammar terms are not compatible with e.g. ident.

Spanky (Jan 03 2025 at 03:44):

I noticed that:

  • Lean.Data.Json.Parser and Lean.Data.Xml.Parser don't use Lean.Parser, but rather Std.Internal.Parsec.
  • There's a project named C-parsing-for-Lean4 which uses the Lean4's parser to parse C code. However they use a special version of Lean4, with a modified parser: Kha/lean4:vcustom-tokens.

This suggests to me that it's currently not possible (or at least not practical) to extend the standard Parser of the official Lean releases to parse arbitrary tokens.

I'm new here and I don't know how things work, but I'd be really interested in help improve the parser. What should I do, if I wanted to try?
Also, why was Kha/lean4:vcustom-tokens never merged? Should I try to get in touch with Kha (would it be polite to ping Sebastian Ullrich here?) to understand what went wrong and if it would make sense to continue working on that effort?

Kyle Miller (Jan 03 2025 at 04:44):

Here's the draft PR for the custom-tokens branch: lean4#2293. (I guess vcustom-tokens is a tag for some older version of it?)

#lean4 > Haskell-like custom parsers in quasiquotations? @ 💬 is the last mention of the tag here. (@Sebastian Ullrich has been busy with other features in the meantime, but I'm sure it's still true that hearing people are looking forward to it increases its priority!)

Spanky (Jan 03 2025 at 05:19):

Thanks a lot for finding that PR and those Zulip messages!

Through those links I learned about Alloy, which is doing exactly what I need.
At this point I'm going to copy/fork its parser and roll with it to keep prototyping.

I see that Sebastian Ullrich and Mac Malone are both working on cool features that might make parsing much nicer in the future (the custom-tokens API and the μdo notation). Really excited about how nice parsing is going to be soon!

David Thrane Christiansen (Jan 05 2025 at 08:40):

Here's a relatively self-contained example of a custom token in the wild: Verso's embedded HTML templates

Asei Inoue (Jan 29 2025 at 13:23):

@Spanky What is μdo notation?


Last updated: May 02 2025 at 03:31 UTC