Zulip Chat Archive
Stream: metaprogramming / tactics
Topic: Parsing a chain of characters
Paul Lezeau (Jun 22 2025 at 13:18):
Say I declare a syntax category mySyntax and declare
syntax "A" : mySyntax
syntax "B" : mySyntax
What would the idiomatic way to allow arbitrary combinations of A and B with no spaces in between be?
E.g. something that made the following work would be great:
import Lean
open Lean Meta Elab Command
declare_syntax_cat mySyntax
syntax "A" : mySyntax
syntax "B" : mySyntax
-- This doesn't work
syntax mySyntax noWs mySyntax : mySyntax
elab "#testing" t:mySyntax : command => do
Lean.logInfo "Success"
#testing A -- Success
#testing B -- Success
#testing AB --unexpected identifier; expected mySyntax
Robin Arnez (Jun 22 2025 at 14:18):
AB parses as a single token, an identifier. This is not really avoidable without hacks so what I'd do is:
import Lean
open Lean Meta Elab Command
declare_syntax_cat mySyntax
syntax ident : mySyntax
elab "#testing" t:mySyntax : command => do
match t with
| `(mySyntax| $i:ident) =>
let n := i.getId
let .str .anonymous s := n | throwError "Expected single name component"
unless s.all (fun c => c == 'A' || c == 'B') do
throwError "Invalid characters"
Lean.logInfo "Success"
| _ => throwUnsupportedSyntax
#testing A -- Success
#testing B -- Success
#testing AB -- Success
#testing C -- Invalid characters
#testing C.D -- Expected single name component
=> just parse identifiers in general and then check later whether the identifier is valid
Paul Lezeau (Jun 22 2025 at 14:36):
That's very helpful - thanks a lot!
Yakov Pechersky (Jun 22 2025 at 14:54):
Is there a way to hook into the parsers that get generated by the syntax declarations? For example, I might have a syntax category 'element' with tokens H, C, N, O, S, Cl. And I'd want to parse HCl as a molecule.
Robin Arnez (Jun 22 2025 at 16:40):
You can certainly do that although, again, it's a bit hacky:
import Lean
open Lean Parser
def rawCharsNoAntiquotFn (tk : String) : ParserFn := fun c s =>
let str := c.input
let pos := s.pos
if str.substrEq pos tk 0 tk.utf8ByteSize then
let s := s.setPos (pos + tk)
let s := whitespace c s
let stx := .atom (.original (mkEmptySubstringAt str pos) pos ⟨str, pos + tk, s.pos⟩ (pos + tk)) tk
s.pushSyntax stx
else
s.mkUnexpectedError "unexpected token" [tk]
def rawCharsNoAntiquot (str : String) : Parser where
fn := rawCharsNoAntiquotFn str
info := nonReservedSymbolInfo str true
#print Parser.nonReservedSymbolInfo
open PrettyPrinter Formatter in
@[combinator_formatter rawCharsNoAntiquot]
def rawCharsNoAntiquot.formatter (str : String) : Formatter := symbolNoAntiquot.formatter str
open PrettyPrinter Parenthesizer in
@[combinator_parenthesizer rawCharsNoAntiquot]
def rawCharsNoAntiquot.parenthesizer (str : String) : Parenthesizer := symbolNoAntiquot.parenthesizer str
def rawChars (str : String) : Parser := Parser.tokenWithAntiquot (rawCharsNoAntiquot str)
declare_syntax_cat element
@[element_parser] def hydrogen : Lean.Parser.Parser := leading_parser rawChars "H"
@[element_parser] def carbon : Lean.Parser.Parser := leading_parser rawChars "C"
@[element_parser] def nitrogen : Lean.Parser.Parser := leading_parser rawChars "N"
@[element_parser] def oxygen : Lean.Parser.Parser := leading_parser rawChars "O"
@[element_parser] def sulfur : Lean.Parser.Parser := leading_parser rawChars "S"
@[element_parser] def chloride : Lean.Parser.Parser := leading_parser rawChars "Cl"
elab "#testing" m:element* : command => do
for el in m do
match el with
| `(element| H%$_tk) => Lean.logInfo "Hydrogen!"
| `(element| C) => Lean.logInfo "Carbon!"
| `(element| N) => Lean.logInfo "Nitrogen!"
| `(element| O) => Lean.logInfo "Oxygen!"
| `(element| S) => Lean.logInfo "Sulfur!"
| `(element| Cl) => Lean.logInfo "Chloride!"
| _ => Elab.throwUnsupportedSyntax
#testing HClSClOC -- Success
#testing HCSABCDE -- Error on ABCDE
Robin Arnez (Jun 22 2025 at 16:41):
Also I figured out how to confuse the hell out of the parser:
macro "ident" : term => `(42)
#check ident -- identifier expected
Last updated: Dec 20 2025 at 21:32 UTC