Zulip Chat Archive
Stream: lean4
Topic: parsing < and > as brackets
Jakob von Raumer (Jun 13 2023 at 14:56):
I'm trying to use Lean's tools (abusing the ident and num categoris) to parse a language which uses < and > as brackets, and everything works fine until >> or <- appear next to each other without a separating whitespace. My guess is because Lean itself uses >> and <-. How can I avoid this problem?
Mario Carneiro (Jun 13 2023 at 15:45):
you probably need to edit the token table to remove those tokens (or all tokens and install your own). Is this a DSL which is meant to coexist with lean code, or a full file parser where lean doesn't matter?
Jakob von Raumer (Jun 14 2023 at 07:24):
The latter, thanks for the pointer to the token table, will edit that one
Jakob von Raumer (Jun 14 2023 at 08:32):
Hum, there's no Trie.remove :sweat_smile:
Jannis Limperg (Jun 14 2023 at 13:01):
If this is the Trie from DiscrTree, std should have removal functions.
Jakob von Raumer (Jun 14 2023 at 16:18):
It's not, its defined in Lean.Data.Trie. Maybe these should be merged? ^^
Jannis Limperg (Jun 14 2023 at 16:22):
Ah no, that's an entirely different data structure. (The DiscrTree one is a trie over expressions, not over strings.)
Sebastian Ullrich (Jun 15 2023 at 08:25):
As long as these tokens are not used to index the parsing tables (i.e. are not at the beginning of a parser registered in a category), you can also use rawCh '>' (trailingWs := true). This is a Parser though, not a ParserDescr like in syntax.
Jakob von Raumer (Oct 11 2023 at 11:28):
There's no easy way to mix ParserDescrs and custom Parsers, or is there? Say I want to use a custom Parser inside a syntax command or embed the generated Parser from a ParserDescr in a Parser definition?
Jakob von Raumer (Oct 11 2023 at 11:32):
Ah nvm, the first one does seem to be possible
Mac Malone (Oct 12 2023 at 05:59):
Jakob von Raumer said:
or embed the generated
Parserfrom aParserDescrin aParserdefinition?
evalParserConst (from Lean.Parser) can be used to do this.
Jakob von Raumer (Oct 12 2023 at 07:12):
And how do I turn a ParserFn into a Parser with some sort of obvious/trivial formatter? If I want to make a parser out of whitespace, for example, and do that by
def whitespaceParser : Parser := { fn := whitespace }
then I'll get a message
don't know how to generate formatter for non-definition '{
info := { collectTokens := id, collectKinds := id, firstTokens := FirstTokens.unknown }, fn := whitespace }'
whenever I try to use it anywhere else
Mac Malone (Oct 12 2023 at 18:24):
If the parser in question is produces an Syntax.atom, the following is a useful parser combinator:
def raw (fn : ParserFn) (trailingWs := false) : Parser where
fn := rawFn fn trailingWs
@[combinator_formatter raw] def raw.formatter := visitAtom Name.anonymous
@[combinator_parenthesizer raw] def raw.parenthesizer := visitToken
For whitespace (which should generally be named whitespaceFn if it is a ParserFn not a Parser) this would be:
def whitespace := raw whitespaceFn
For non-Syntax.atom parsers, a proper parser with non-trivial formatters and parenthesizers is likely necessary.
Mac Malone (Oct 12 2023 at 18:26):
The raw parser is also only good if used sparingly / not at the start of a syntax. If the parser in question is meant to be a general token, you want to provide with some metadata like so:
def whitespace : Parser where
fn := rawFn whitespaceFn
info := mkAtomicInfo "whitespace"
@[combinator_formatter raw] def raw.formatter := visitAtom Name.anonymous
@[combinator_parenthesizer raw] def raw.parenthesizer := visitToken
Mac Malone (Oct 12 2023 at 18:29):
Also, if you actually are trying to wrap whitespace, that may be a problem. Whitespace is best checked via either checkWsBefore or checkNoWsBefore (which are ws and noWs respectively in a syntax declaration).
Jakob von Raumer (Oct 13 2023 at 08:32):
Ah, thanks for all the pointers.
Jakob von Raumer (Oct 13 2023 at 08:37):
The thing I'm trying to parse compasses cmd1;cmd2, cmd1; /- comment -/ cmd2;' etc., no neither ws nor noWs does it for me, but the parsed stuff gets discarded anyway, so I guess raw` is the right thing to use
Last updated: May 02 2025 at 03:31 UTC