# Documentation

Lean.Parser.Basic

# Basic Lean parser infrastructure #

The Lean parser was developed with the following primary goals in mind:

• flexibility: Lean's grammar is complex and includes indentation and other whitespace sensitivity. It should be possible to introduce such custom "tweaks" locally without having to adjust the fundamental parsing approach.
• extensibility: Lean's grammar can be extended on the fly within a Lean file, and with Lean 4 we want to extend this to cover embedding domain-specific languages that may look nothing like Lean, down to using a separate set of tokens.
• losslessness: The parser should produce a concrete syntax tree that preserves all whitespace and other "sub-token" information for the use in tooling.
• performance: The overhead of the parser building blocks, and the overall parser performance on average-complexity input, should be comparable with that of the previous parser hand-written in C++. No fancy optimizations should be necessary for this.

Given these constraints, we decided to implement a combinatoric, non-monadic, lexer-less, memoizing recursive-descent parser. Using combinators instead of some more formal and introspectible grammar representation ensures ultimate flexibility as well as efficient extensibility: there is (almost) no pre-processing necessary when extending the grammar with a new parser. However, because the all results the combinators produce are of the homogeneous Syntax type, the basic parser type is not actually a monad but a monomorphic linear function ParserState → ParserState, avoiding constructing and deconstructing countless monadic return values. Instead of explicitly returning syntax objects, parsers push (zero or more of) them onto a syntax stack inside the linear state. Chaining parsers via >> accumulates their output on the stack. Combinators such as node then pop off all syntax objects produced during their invocation and wrap them in a single Syntax.node object that is again pushed on this stack. Instead of calling node directly, we usually use the macro leading_parser p, which unfolds to node k p where the new syntax node kind k is the name of the declaration being defined.

The lack of a dedicated lexer ensures we can modify and replace the lexical grammar at any point, and simplifies detecting and propagating whitespace. The parser still has a concept of "tokens", however, and caches the most recent one for performance: when tokenFn is called twice at the same position in the input, it will reuse the result of the first call. tokenFn recognizes some built-in variable-length tokens such as identifiers as well as any fixed token in the ParserContext's TokenTable (a trie); however, the same cache field and strategy could be reused by custom token parsers. Tokens also play a central role in the prattParser combinator, which selects a leading parser followed by zero or more trailing parsers based on the current token (via peekToken); see the documentation of prattParser for more details. Tokens are specified via the symbol parser, or with symbolNoWs for tokens that should not be preceded by whitespace.

The Parser type is extended with additional metadata over the mere parsing function to propagate token information: collectTokens collects all tokens within a parser for registering. firstTokens holds information about the "FIRST" token set used to speed up parser selection in prattParser. This approach of combining static and dynamic information in the parser type is inspired by the paper "Deterministic, Error-Correcting Combinator Parsers" by Swierstra and Duponcheel. If multiple parsers accept the same current token, prattParser tries all of them using the backtracking longestMatchFn combinator. This is the only case where standard parsers might execute arbitrary backtracking. Repeated invocations of the same category or concrete parser at the same position are cached where possible; see withCache.

Finally, error reporting follows the standard combinatoric approach of collecting a single unexpected token/... and zero or more expected tokens (see Error below). Expected tokens are e.g. set by symbol and merged by <|>. Combinators running multiple parsers should check if an error message is set in the parser state (hasError) and act accordingly. Error recovery is left to the designer of the specific language; for example, Lean's top-level parseCommand loop skips tokens until the next command keyword on error.

Equations
• One or more equations did not get rendered due to their size.
Equations
@[noinline]
Equations
Equations
Equations
• = let s := p c s; if then s else q c s
@[noinline]
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
Equations
• = let iniSz := ; let s := p c s;
Equations
• = let iniSz := ; let s := p c s;
@[noinline]
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.

Generate an error at the position saved with the withPosition combinator. If delta == true, then it reports at saved position+1. This useful to make sure a parser consumed at least one character.

Equations

Succeeds if c.prec <= prec

Equations
• One or more equations did not get rendered due to their size.

Succeeds if c.lhsPrec >= prec

Equations
• One or more equations did not get rendered due to their size.
Equations
• Lean.Parser.setLhsPrecFn prec x s = if then s else { stxStack := s.stxStack, lhsPrec := prec, pos := s.pos, cache := s.cache, errorMsg := s.errorMsg }
Equations
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
@[noinline]
Equations
• One or more equations did not get rendered due to their size.

Run p, falling back to q if p failed without consuming any input.

NOTE: In order for the pretty printer to retrace an orelse, p must be a call to node or some other parser producing a single node kind. Nested orelse calls are flattened for this, i.e. (node k1 p1 <|> node k2 p2) <|> ... is fine as well.

Equations
Equations
@[noinline]
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
@[noinline]
Equations
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• = let iniSz := ; let s := ;
Equations
• = { info := , fn := }
Equations
• = let iniSz := ; let s := ;
Equations
Equations
@[noinline]
Equations
@[noinline]
Equations
• = { collectTokens := p.collectTokens sep.collectTokens, collectKinds := p.collectKinds sep.collectKinds, firstTokens := p.firstTokens }
Equations
Equations

Apply f to the syntax object produced by p

Equations
• One or more equations did not get rendered due to their size.
@[noinline]
Equations
Equations
• = { info := , fn := }
Equations
def Lean.Parser.satisfyFn (p : ) (errorMsg : optParam String "unexpected character") :
Equations
• One or more equations did not get rendered due to their size.
partial def Lean.Parser.finishCommentBlock (pushMissingOnError : Bool) (nesting : Nat) :
Equations

Consume whitespace and comments

Equations
• = { str := s, startPos := p, stopPos := p }

Match an arbitrary Parser and return the consumed String in a Syntax.atom.

Equations
def Lean.Parser.chFn (c : Char) (trailingWs : ) :
Equations
def Lean.Parser.rawCh (c : Char) (trailingWs : ) :
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations

Push (Syntax.node tk ) onto syntax stack if parse was successful.

Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• = let tkc := s.cache.tokenCache; if (tkc.startPos == s.pos) = true then (s, Except.ok tkc.token) else

Treat keywords as identifiers.

Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• = { collectTokens := fun tks => sym :: tks, collectKinds := id, firstTokens := }
Equations
Equations

Check if the following token is the symbol or identifier sym. Useful for parsing local tokens that have not been added to the token table (but may have been so by some unrelated code).

For example, the universe max Function is parsed using this combinator so that it can still be used as an identifier outside of universe (but registering it as a token in a Term Syntax would not break the universe Parser).

Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
def Lean.Parser.unicodeSymbolFnAux (sym : String) (asciiSym : String) (expected : ) :
Equations
Equations
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• = { collectTokens := id, collectKinds := id, firstTokens := }
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
def Lean.Parser.ParserState.keepPrevError (oldStackSize : Nat) (oldStopPos : String.Pos) (oldError : ) (oldLhsPrec : Nat) :
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations

Auxiliary function used to execute parsers provided to longestMatchFn. Push left? into the stack if it is not none, and execute p.

Remark: p must produce exactly one syntax node. Remark: the left? is not none when we are processing trailing parsers.

Equations
• One or more equations did not get rendered due to their size.
def Lean.Parser.longestMatchStep (left? : ) (startSize : Nat) (startLhsPrec : Nat) (startPos : String.Pos) (prevPrio : Nat) (prio : Nat) (p : Lean.Parser.ParserFn) :
Equations
• One or more equations did not get rendered due to their size.
Equations
def Lean.Parser.longestMatchFnAux (left? : ) (startSize : Nat) (startLhsPrec : Nat) (startPos : String.Pos) (prevPrio : Nat) (ps : ) :
Equations
def Lean.Parser.longestMatchFnAux.parse (left? : ) (startSize : Nat) (startLhsPrec : Nat) (startPos : String.Pos) (prevPrio : Nat) (ps : ) :
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations

A multimap indexed by tokens. Used for indexing parsers by their leading token.

Equations
def Lean.Parser.TokenMap.insert {α : Type} (map : ) (k : Lean.Name) (v : α) :
Equations
Equations
• Lean.Parser.TokenMap.instInhabitedTokenMap = { default := Lean.RBMap.empty }
Equations
• Lean.Parser.TokenMap.instEmptyCollectionTokenMap = { emptyCollection := Lean.RBMap.empty }
instance Lean.Parser.TokenMap.instForInTokenMapProdNameList {m : Type u_1 → Type u_2} {α : Type} :
ForIn m () ()
Equations
• leadingTable :
• leadingParsers :
• trailingTable :
• trailingParsers :
Instances For
Equations
• LeadingIdentBehavior.default: if the leading token is an identifier, then prattParser just executes the parsers associated with the auxiliary token "ident".

default: Lean.Parser.LeadingIdentBehavior
• LeadingIdentBehavior.symbol: if the leading token is an identifier , and there are parsers P associated with the toek , then it executes P. Otherwise, it executes only the parsers associated with the auxiliary token "ident".

symbol: Lean.Parser.LeadingIdentBehavior
• LeadingIdentBehavior.both: if the leading token an identifier , the it executes the parsers associated with token and parsers associated with the auxiliary token "ident".

both: Lean.Parser.LeadingIdentBehavior

The type LeadingIdentBehavior specifies how the parsing table lookup function behaves for identifiers. The function prattParser uses two tables leadingTable and trailingTable. They map tokens to parsers.

We use LeadingIdentBehavior.symbol and LeadingIdentBehavior.both and nonReservedSymbol parser to implement the tactic parsers. The idea is to avoid creating a reserved symbol for each builtin tactic (e.g., apply, assumption, etc.). That is, users may still use these symbols as identifiers (e.g., naming a function).

Instances For
• The name of a declaration which will be used as the target of go-to-definition queries and from which doc strings will be extracted. This is a dummy declaration of type Lean.Parser.Category created by declare_syntax_cat, but for builtin categories the declaration is made manually and passed to registerBuiltinParserAttribute.

declName : Lean.Name
• The list of syntax nodes that can parse into this category. This can be used to list all syntaxes in the category.

• The parsing tables, which consist of a dynamic set of parser functions based on the syntaxes that have been declared so far.

• The LeadingIdentBehavior, which specifies how the parsing table lookup function behaves for the first identifier to be parsed. This is used by the tactic parser to avoid creating a reserved symbol for each builtin tactic (e.g., apply, assumption, etc.).

Each parser category is implemented using a Pratt's parser. The system comes equipped with the following categories: level, term, tactic, and command. Users and plugins may define extra categories.

The method

categoryParser term prec


executes the Pratt's parser for category term with precedence prec. That is, only parsers with precedence at least prec are considered. The method termParser prec is equivalent to the method above.

Instances For
Equations
@[inline]
Equations
def Lean.Parser.indexed {α : Type} (map : ) (behavior : Lean.Parser.LeadingIdentBehavior) :
Equations
• One or more equations did not get rendered due to their size.
@[inline]
Equations
Equations
Equations
• One or more equations did not get rendered due to their size.

# Antiquotations #

Fail if previous token is immediately followed by ':'.

Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
def Lean.Parser.mkAntiquot (name : String) (kind : Lean.SyntaxNodeKind) (anonymous : ) (isPseudoKind : ) :

Define parser for $e (if anonymous == true) and $e:name. kind is embedded in the antiquotation's kind, and checked at syntax match unless isPseudoKind is true. Antiquotations can be escaped as in e, which produces the syntax tree for $e. Equations • One or more equations did not get rendered due to their size. Equations • One or more equations did not get rendered due to their size. Optimized version of mkAntiquot ... <|> p. Equations Equations Parse $[p]suffix, e.g. $[p],*. Equations • One or more equations did not get rendered due to their size. Parse suffix after an antiquotation, e.g. $x,*, and put both into a new node.

Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.

# End of Antiquotations #

def Lean.Parser.sepBy (p : Lean.Parser.Parser) (sep : String) (psep : ) (allowTrailingSep : ) :
Equations
def Lean.Parser.sepBy1 (p : Lean.Parser.Parser) (sep : String) (psep : ) (allowTrailingSep : ) :
Equations
Equations
• One or more equations did not get rendered due to their size.
Equations

Implements a variant of Pratt's algorithm. In Pratt's algorithms tokens have a right and left binding power. In our implementation, parsers have precedence instead. This method selects a parser (or more, via longestMatchFn) from leadingTable based on the current token. Note that the unindexed leadingParsers parsers are also tried. We have the unidexed leadingParsers because some parsers do not have a "first token". Example:

syntax term:51 "≤" ident "<" term "|" term : index


Example, in principle, the set of first tokens for this parser is any token that can start a term, but this set is always changing. Thus, this parsing rule is stored as an unindexed leading parser at leadingParsers. After processing the leading parser, we chain with parsers from trailingTable/trailingParsers that have precedence at least c.prec where c is the ParsingContext. Recall that c.prec is set by categoryParser.

Note that in the original Pratt's algorith, precedences are only checked before calling trailing parsers. In our implementation, leading and trailing parsers check the precendece. We claim our algorithm is more flexible, modular and easier to understand.

antiquotParser should be a mkAntiquot` parser (or always fail) and is tried before all other parsers. It should not be added to the regular leading parsers because it would heavily overlap with antiquotation parsers nested inside them.

Equations
• One or more equations did not get rendered due to their size.
Equations
• One or more equations did not get rendered due to their size.
Equations
def Lean.Syntax.foldArgsM {m : Type u_1 → Type u_2} [inst : ] {β : Type u_1} (s : Lean.Syntax) (f : Lean.Syntaxβm β) (b : β) :
m β
Equations
def Lean.Syntax.foldArgs {β : Type u_1} (s : Lean.Syntax) (f : Lean.Syntaxββ) (b : β) :
β
Equations
def Lean.Syntax.forArgsM {m : TypeType u_1} [inst : ] (s : Lean.Syntax) (f : ) :
Equations