Zulip Chat Archive

Stream: lean4

Topic: Retrieve tokens from the Lean parser

jthulhu (Jan 29 2026 at 14:43):

Semantic syntax highlighting provided by the LSP is great, but is a bit slow for my taste. Hence, I like to also turn on the "regular" syntax highlighting of my text editor, which is incomplete (ie. it doesn't know about modifications brought to the parser implemented with meta programming) but instantaneous. The result I'm aiming for is that I have mostly correct instantaneous syntax highlighting, and when the LSP finally outputs the correct highlighting, if it differs, it gets applied. However, the regular syntax highlighter is getting old, in the sense that many keywords have been added to the "builtin" syntax of Lean over the time, and so many of them are missing from the syntax highlighting table in the Lean plugin for the text editor. I walked through the documentation to collect every keyword I could find, to add them, but this is both time consuming and error prone, and would add a large burden on maintenance.

Hence, I would like to know if it were possible to retrieve this information directly from Lean itself. AFAIK, Lean's parser is scannerless, and is not built on an inspectable grammar, but rather on parser combinators, so I'm not sure this is possible.

Eric Paul (Jan 29 2026 at 16:42):

This approach should give you most of the reserved and nonreserved keywords:

import Lean
open Lean Meta Parser

run_meta
  let state := parserExtension.getState (←getEnv)
  let mut keywords : Std.HashSet Token := Std.HashSet.ofArray state.tokens.values

  let cats := state.categories
  for (_, cat) in cats do
    keywords := keywords.insertMany <| cat.tables.leadingTable.keys.map toString
    keywords := keywords.insertMany <| cat.tables.trailingTable.keys.map toString

  Lean.logInfo m!"{keywords.toArray.insertionSort}"

It will miss any nonreserved keywords that are not used to index a parser.

David Thrane Christiansen (Jan 30 2026 at 15:09):

Note that many of these are only keywords in some contexts, so you may want to be a bit judicious. Here, intro and grind are both identifiers in the theorem statement and keywords in the proof:

theorem f : ∀ (intro grind : Nat), intro + grind = grind + intro := by
  intro
  grind

jthulhu (Jan 30 2026 at 15:18):

David Thrane Christiansen said:

Note that many of these are only keywords in some contexts, so you may want to be a bit judicious. Here, intro and grind are both identifiers in the theorem statement and keywords in the proof:
theorem f : ∀ (intro grind : Nat), intro + grind = grind + intro := by
  intro
  grind

Yes of course, this is why I need both syntax highlighters. I'm not trying to re-implement Lean's parser in the syntax highlighter of my text editor, this is what the LSP is for.

Lua Viana Reis (Jan 30 2026 at 23:07):

I'd like to have a tree-sitter parser that instead of using a pre-defined list of keywords, uses the surrounding indentation to guess what should be a command/tactic/etc, with special cases for more complex forms such as have, show. I had a semi-working prototype, but I got stuck making it work efficiently with certain syntax like records { f1 := ..., f2 := ... }, that mix indentation and "items separated by commas" in a somewhat cursed way.

Lua Viana Reis (Jan 30 2026 at 23:09):

I think it's possible, but presently for me it was hard to justify the effort

Last updated: Feb 28 2026 at 14:05 UTC