Zulip Chat Archive
Stream: lean4
Topic: parsing < and > as brackets
Jakob von Raumer (Jun 13 2023 at 14:56):
I'm trying to use Lean's tools (abusing the ident
and num
categoris) to parse a language which uses <
and >
as brackets, and everything works fine until >>
or <-
appear next to each other without a separating whitespace. My guess is because Lean itself uses >>
and <-
. How can I avoid this problem?
Mario Carneiro (Jun 13 2023 at 15:45):
you probably need to edit the token table to remove those tokens (or all tokens and install your own). Is this a DSL which is meant to coexist with lean code, or a full file parser where lean doesn't matter?
Jakob von Raumer (Jun 14 2023 at 07:24):
The latter, thanks for the pointer to the token table, will edit that one
Jakob von Raumer (Jun 14 2023 at 08:32):
Hum, there's no Trie.remove
:sweat_smile:
Jannis Limperg (Jun 14 2023 at 13:01):
If this is the Trie
from DiscrTree
, std should have removal functions.
Jakob von Raumer (Jun 14 2023 at 16:18):
It's not, its defined in Lean.Data.Trie
. Maybe these should be merged? ^^
Jannis Limperg (Jun 14 2023 at 16:22):
Ah no, that's an entirely different data structure. (The DiscrTree
one is a trie over expressions, not over strings.)
Sebastian Ullrich (Jun 15 2023 at 08:25):
As long as these tokens are not used to index the parsing tables (i.e. are not at the beginning of a parser registered in a category), you can also use rawCh '>' (trailingWs := true)
. This is a Parser
though, not a ParserDescr
like in syntax
.
Jakob von Raumer (Oct 11 2023 at 11:28):
There's no easy way to mix ParserDescr
s and custom Parser
s, or is there? Say I want to use a custom Parser inside a syntax
command or embed the generated Parser
from a ParserDescr
in a Parser
definition?
Jakob von Raumer (Oct 11 2023 at 11:32):
Ah nvm, the first one does seem to be possible
Mac Malone (Oct 12 2023 at 05:59):
Jakob von Raumer said:
or embed the generated
Parser
from aParserDescr
in aParser
definition?
evalParserConst
(from Lean.Parser
) can be used to do this.
Jakob von Raumer (Oct 12 2023 at 07:12):
And how do I turn a ParserFn
into a Parser
with some sort of obvious/trivial formatter? If I want to make a parser out of whitespace
, for example, and do that by
def whitespaceParser : Parser := { fn := whitespace }
then I'll get a message
don't know how to generate formatter for non-definition '{
info := { collectTokens := id, collectKinds := id, firstTokens := FirstTokens.unknown }, fn := whitespace }'
whenever I try to use it anywhere else
Mac Malone (Oct 12 2023 at 18:24):
If the parser in question is produces an Syntax.atom
, the following is a useful parser combinator:
def raw (fn : ParserFn) (trailingWs := false) : Parser where
fn := rawFn fn trailingWs
@[combinator_formatter raw] def raw.formatter := visitAtom Name.anonymous
@[combinator_parenthesizer raw] def raw.parenthesizer := visitToken
For whitespace
(which should generally be named whitespaceFn
if it is a ParserFn
not a Parser
) this would be:
def whitespace := raw whitespaceFn
For non-Syntax.atom
parsers, a proper parser with non-trivial formatters and parenthesizers is likely necessary.
Mac Malone (Oct 12 2023 at 18:26):
The raw
parser is also only good if used sparingly / not at the start of a syntax. If the parser in question is meant to be a general token, you want to provide with some metadata like so:
def whitespace : Parser where
fn := rawFn whitespaceFn
info := mkAtomicInfo "whitespace"
@[combinator_formatter raw] def raw.formatter := visitAtom Name.anonymous
@[combinator_parenthesizer raw] def raw.parenthesizer := visitToken
Mac Malone (Oct 12 2023 at 18:29):
Also, if you actually are trying to wrap whitespace
, that may be a problem. Whitespace is best checked via either checkWsBefore
or checkNoWsBefore
(which are ws
and noWs
respectively in a syntax
declaration).
Jakob von Raumer (Oct 13 2023 at 08:32):
Ah, thanks for all the pointers.
Jakob von Raumer (Oct 13 2023 at 08:37):
The thing I'm trying to parse compasses cmd1;cmd2
, cmd1; /- comment -/ cmd2;' etc., no neither
ws nor
noWs does it for me, but the parsed stuff gets discarded anyway, so I guess
raw` is the right thing to use
Last updated: Dec 20 2023 at 11:08 UTC