Zulip Chat Archive

Stream: metaprogramming / tactics

Topic: Parsing a chain of characters


Paul Lezeau (Jun 22 2025 at 13:18):

Say I declare a syntax category mySyntax and declare

syntax "A" : mySyntax
syntax "B" : mySyntax

What would the idiomatic way to allow arbitrary combinations of A and B with no spaces in between be?

E.g. something that made the following work would be great:

import Lean

open Lean Meta Elab Command

declare_syntax_cat mySyntax

syntax "A" : mySyntax
syntax "B" : mySyntax

-- This doesn't work
syntax mySyntax noWs mySyntax : mySyntax

elab "#testing" t:mySyntax : command => do
  Lean.logInfo "Success"

#testing A -- Success
#testing B -- Success
#testing AB --unexpected identifier; expected mySyntax

Robin Arnez (Jun 22 2025 at 14:18):

AB parses as a single token, an identifier. This is not really avoidable without hacks so what I'd do is:

import Lean

open Lean Meta Elab Command

declare_syntax_cat mySyntax

syntax ident : mySyntax

elab "#testing" t:mySyntax : command => do
  match t with
  | `(mySyntax| $i:ident) =>
    let n := i.getId
    let .str .anonymous s := n | throwError "Expected single name component"
    unless s.all (fun c => c == 'A' || c == 'B') do
      throwError "Invalid characters"
    Lean.logInfo "Success"
  | _ => throwUnsupportedSyntax

#testing A -- Success
#testing B -- Success
#testing AB -- Success
#testing C -- Invalid characters
#testing C.D -- Expected single name component

=> just parse identifiers in general and then check later whether the identifier is valid

Paul Lezeau (Jun 22 2025 at 14:36):

That's very helpful - thanks a lot!

Yakov Pechersky (Jun 22 2025 at 14:54):

Is there a way to hook into the parsers that get generated by the syntax declarations? For example, I might have a syntax category 'element' with tokens H, C, N, O, S, Cl. And I'd want to parse HCl as a molecule.

Robin Arnez (Jun 22 2025 at 16:40):

You can certainly do that although, again, it's a bit hacky:

import Lean

open Lean Parser

def rawCharsNoAntiquotFn (tk : String) : ParserFn := fun c s =>
  let str := c.input
  let pos := s.pos
  if str.substrEq pos tk 0 tk.utf8ByteSize then
    let s := s.setPos (pos + tk)
    let s := whitespace c s
    let stx := .atom (.original (mkEmptySubstringAt str pos) pos str, pos + tk, s.pos (pos + tk)) tk
    s.pushSyntax stx
  else
    s.mkUnexpectedError "unexpected token" [tk]

def rawCharsNoAntiquot (str : String) : Parser where
  fn := rawCharsNoAntiquotFn str
  info := nonReservedSymbolInfo str true

#print Parser.nonReservedSymbolInfo

open PrettyPrinter Formatter in
@[combinator_formatter rawCharsNoAntiquot]
def rawCharsNoAntiquot.formatter (str : String) : Formatter := symbolNoAntiquot.formatter str

open PrettyPrinter Parenthesizer in
@[combinator_parenthesizer rawCharsNoAntiquot]
def rawCharsNoAntiquot.parenthesizer (str : String) : Parenthesizer := symbolNoAntiquot.parenthesizer str

def rawChars (str : String) : Parser := Parser.tokenWithAntiquot (rawCharsNoAntiquot str)

declare_syntax_cat element

@[element_parser] def hydrogen : Lean.Parser.Parser := leading_parser rawChars "H"
@[element_parser] def carbon : Lean.Parser.Parser := leading_parser rawChars "C"
@[element_parser] def nitrogen : Lean.Parser.Parser := leading_parser rawChars "N"
@[element_parser] def oxygen : Lean.Parser.Parser := leading_parser rawChars "O"
@[element_parser] def sulfur : Lean.Parser.Parser := leading_parser rawChars "S"
@[element_parser] def chloride : Lean.Parser.Parser := leading_parser rawChars "Cl"

elab "#testing" m:element* : command => do
  for el in m do
    match el with
    | `(element| H%$_tk) => Lean.logInfo "Hydrogen!"
    | `(element| C) => Lean.logInfo "Carbon!"
    | `(element| N) => Lean.logInfo "Nitrogen!"
    | `(element| O) => Lean.logInfo "Oxygen!"
    | `(element| S) => Lean.logInfo "Sulfur!"
    | `(element| Cl) => Lean.logInfo "Chloride!"
    | _ => Elab.throwUnsupportedSyntax

#testing HClSClOC -- Success
#testing HCSABCDE -- Error on ABCDE

Robin Arnez (Jun 22 2025 at 16:41):

Also I figured out how to confuse the hell out of the parser:

macro "ident" : term => `(42)
#check ident -- identifier expected

Last updated: Dec 20 2025 at 21:32 UTC