Zulip Chat Archive

Stream: lean4

Topic: parsing < and > as brackets

Jakob von Raumer (Jun 13 2023 at 14:56):

I'm trying to use Lean's tools (abusing the ident and num categoris) to parse a language which uses < and > as brackets, and everything works fine until >> or <- appear next to each other without a separating whitespace. My guess is because Lean itself uses >> and <-. How can I avoid this problem?

Mario Carneiro (Jun 13 2023 at 15:45):

you probably need to edit the token table to remove those tokens (or all tokens and install your own). Is this a DSL which is meant to coexist with lean code, or a full file parser where lean doesn't matter?

Jakob von Raumer (Jun 14 2023 at 07:24):

The latter, thanks for the pointer to the token table, will edit that one

Jakob von Raumer (Jun 14 2023 at 08:32):

Hum, there's no Trie.remove :sweat_smile:

Jannis Limperg (Jun 14 2023 at 13:01):

If this is the Trie from DiscrTree, std should have removal functions.

Jakob von Raumer (Jun 14 2023 at 16:18):

It's not, its defined in Lean.Data.Trie. Maybe these should be merged? ^^

Jannis Limperg (Jun 14 2023 at 16:22):

Ah no, that's an entirely different data structure. (The DiscrTree one is a trie over expressions, not over strings.)

Sebastian Ullrich (Jun 15 2023 at 08:25):

As long as these tokens are not used to index the parsing tables (i.e. are not at the beginning of a parser registered in a category), you can also use rawCh '>' (trailingWs := true). This is a Parser though, not a ParserDescr like in syntax.

Jakob von Raumer (Oct 11 2023 at 11:28):

There's no easy way to mix ParserDescrs and custom Parsers, or is there? Say I want to use a custom Parser inside a syntax command or embed the generated Parser from a ParserDescr in a Parser definition?

Jakob von Raumer (Oct 11 2023 at 11:32):

Ah nvm, the first one does seem to be possible

Mac Malone (Oct 12 2023 at 05:59):

Jakob von Raumer said:

or embed the generated Parser from a ParserDescr in a Parser definition?

evalParserConst (from Lean.Parser) can be used to do this.

Jakob von Raumer (Oct 12 2023 at 07:12):

And how do I turn a ParserFn into a Parser with some sort of obvious/trivial formatter? If I want to make a parser out of whitespace, for example, and do that by

def whitespaceParser : Parser := { fn := whitespace }

then I'll get a message

don't know how to generate formatter for non-definition '{
  info := { collectTokens := id, collectKinds := id, firstTokens := FirstTokens.unknown }, fn := whitespace }'

whenever I try to use it anywhere else

Mac Malone (Oct 12 2023 at 18:24):

If the parser in question is produces an Syntax.atom, the following is a useful parser combinator:

def raw (fn : ParserFn) (trailingWs := false) : Parser where
  fn := rawFn fn trailingWs

@[combinator_formatter raw] def raw.formatter := visitAtom Name.anonymous
@[combinator_parenthesizer raw] def raw.parenthesizer := visitToken

For whitespace (which should generally be named whitespaceFn if it is a ParserFn not a Parser) this would be:

def whitespace := raw whitespaceFn

For non-Syntax.atom parsers, a proper parser with non-trivial formatters and parenthesizers is likely necessary.

Mac Malone (Oct 12 2023 at 18:26):

The raw parser is also only good if used sparingly / not at the start of a syntax. If the parser in question is meant to be a general token, you want to provide with some metadata like so:

def whitespace  : Parser where
  fn := rawFn whitespaceFn
  info := mkAtomicInfo "whitespace"

@[combinator_formatter raw] def raw.formatter := visitAtom Name.anonymous
@[combinator_parenthesizer raw] def raw.parenthesizer := visitToken

Mac Malone (Oct 12 2023 at 18:29):

Also, if you actually are trying to wrap whitespace, that may be a problem. Whitespace is best checked via either checkWsBefore or checkNoWsBefore (which are ws and noWs respectively in a syntax declaration).

Jakob von Raumer (Oct 13 2023 at 08:32):

Ah, thanks for all the pointers.

Jakob von Raumer (Oct 13 2023 at 08:37):

The thing I'm trying to parse compasses cmd1;cmd2, cmd1; /- comment -/ cmd2;' etc., no neither ws nor noWs does it for me, but the parsed stuff gets discarded anyway, so I guess raw` is the right thing to use

Last updated: May 02 2025 at 03:31 UTC