Zulip Chat Archive

Stream: general

Topic: writing simple parsers


Kim Morrison (May 28 2024 at 10:57):

I'm embarrassed to admit my almost complete ignorance about writing simple parsers in Lean, but could someone show me the lowest overhead way to write a parser that takes a string representation of a list of pairs of natural numbers, e.g. "[(1, 2), (3, 4)]" and returns a List (\nat \times \nat)?

I'm trying to help some people who are currently implementing this by splitting on comma, filtering non-digit characters, and then chunking, and I'm horrified I don't know what to tell them to do instead.

No error handling required, happy to just return the empty list if anything goes wrong. I just want String \to List (\nat \times \nat) as the signature.

(Mathlib compatible answers, i.e. no extra requires preferred but not essential.)

Damiano Testa (May 28 2024 at 11:01):

How does this look like?

#eval
  let input := (1, 2)
  let digs : String := Nat.toDigits 10 input.1 ++ Nat.toDigits 10 input.2
  digs.toNat!

a hand-rolled parser!

Damiano Testa (May 28 2024 at 11:01):

Oh, I missed that you start with a string!

Henrik Böving (May 28 2024 at 11:25):

Kim Morrison said:

I'm embarrassed to admit my almost complete ignorance about writing simple parsers in Lean, but could someone show me the lowest overhead way to write a parser that takes a string representation of a list of pairs of natural numbers, e.g. "[(1, 2), (3, 4)]" and returns a List (\nat \times \nat)?

I'm trying to help some people who are currently implementing this by splitting on comma, filtering non-digit characters, and then chunking, and I'm horrified I don't know what to tell them to do instead.

No error handling required, happy to just return the empty list if anything goes wrong. I just want String \to List (\nat \times \nat) as the signature.

(Mathlib compatible answers, i.e. no extra requires preferred but not essential.)

The proper way not write non trivial parsers would be to use docs#Lean.Parsec. You can still get away with split based stuff for a certain complexity level but at some point you'll want to write a real parser.

Kim Morrison (May 28 2024 at 12:18):

Are there basic examples of Parsec available somewhere that I can point people to?

Henrik Böving (May 28 2024 at 12:24):

Lean core has a JSON parser in parsec at src/Lean/Data/Json/Parser.lean. Parsec is also a big name in the haskell eco system so there should be lots of tutorials there that use libraries with very similar (though often more featureful) APIs than our Parsec.

Utensil Song (May 28 2024 at 12:26):

There were many useful links in this old topic.

Henrik Böving (May 28 2024 at 12:26):

There is for example this very long haskell tutorial on (one of, there are multiple forks) parsec https://jakewheat.github.io/intro_to_parsing/. Of course the syntax doesn't translate 1:1 but the library that we have is very heavily inspired by and has very similar functions to Haskell parsec.

I was in general hoping to improve the Parsec situation now that having parsers is actually starting to become a necessity for projects that are on the FRO roadmap.

Matthew Ballard (May 28 2024 at 12:32):

Kim Morrison said:

Are there basic examples of Parsec available somewhere that I can point people to?

Sounds like a good excuse for people to link to their old Advent of Code solutions :)

Kyle Miller (May 28 2024 at 12:50):

I used parsec most of my AoC entries. For example, day 2

Something to know about Parsec is that it's really algorithmic vs grammar-based parsers. It commits to parses by default, and it's up to you to say where you'd want any lookahead. Not really any support for efficient nondeterminism. That's not needed for a list of pairs of nats though.

Shreyas Srinivas (May 28 2024 at 14:27):

Kyle Miller said:

I used parsec most of my AoC entries. For example, day 2

Something to know about Parsec is that it's really algorithmic vs grammar-based parsers. It commits to parses by default, and it's up to you to say where you'd want any lookahead. Not really any support for efficient nondeterminism. That's not needed for a list of pairs of nats though.

You can do static precomputations better with Arrows. See tutorial : https://www.haskell.org/arrows/
EDIT: The original paper introduced them to write efficient lookahead parsers: https://www.cse.chalmers.se/~rjmh/Papers/arrows.pdf

Frédéric Dupuis (May 28 2024 at 14:33):

I have a simple bibtex parser that uses Lean.Parsec here.

Shreyas Srinivas (May 28 2024 at 14:35):

When I was learning parsing with lean, I found Arthur Paulino's fxylang a useful non-parsec example : https://github.com/arthurpaulino/FxyLang

Kyle Miller (May 28 2024 at 15:27):

A caveat there is that using runParserCategory means you're tied to the compile-time Lean environment (i.e., you need an Environment). For compiled programs you can get around by either including the Lean interpreter and having it import oleans or by using a project like https://github.com/tydeu/lean4-partax to compile the parser definitions.

Regarding arrows, they're interesting, but just to clarify, you're not suggesting that there's any Lean implementation, right?

Shreyas Srinivas (May 28 2024 at 16:11):

Nope. I don't think there is a lean implementation of Arrows yet

Andrés Goens (Jun 04 2024 at 14:19):

there's also @Mac Malone 's partax for using lean syntax extensions but avoiding the whole runParserCategory stuff

Paige Thomas (Jun 05 2024 at 04:36):

I'm also starting to learn parsing. I figured I would start with trying to write a lexer, and for that I assume that I need regular expressions, hence I started trying to implement a computable version of those in Lean.

Paige Thomas (Jun 05 2024 at 04:41):

I'm hoping to end up with a LR parser generator in Lean, but that may be more ambitious than I can accomplish in a reasonable amount of time.


Last updated: May 02 2025 at 03:31 UTC