Zulip Chat Archive

Stream: lean4

Topic: Querying category parsers


Mario Carneiro (Jul 07 2021 at 21:46):

Is there a way to print all syntax extensions supported by a given category parser? Something like Lean 3's #print notation to show all expr notations, but for the attr, term, command, tactic categories instead

Mac (Jul 07 2021 at 22:21):

What information about the category's parsers do you want to print?

Mario Carneiro (Jul 07 2021 at 22:27):

What parsers exist?

Mario Carneiro (Jul 07 2021 at 22:28):

Something analogous to a reconstructed syntax command would be great

Mario Carneiro (Jul 07 2021 at 22:29):

also what elaborators are attached to the notation, if possible, although that's probably a separate issue

Mario Carneiro (Jul 07 2021 at 22:31):

but it would be super helpful to be able to #print #print and find out where the #print syntax/elab are defined, independently of server support for this via go to definition, as well as #print category term to see all declared term parsers. But I'm mainly interested in the backend information rather than the presentation right now

Mario Carneiro (Jul 07 2021 at 22:39):

Looking at the parser code, I think this might be impossible without additional information gathering on the part of the parser infrastructure. Parsers are eagerly turned into ParserFns when they are stuck into the parse tables, and the only data collection that is supported is collectTokens and collectKinds. That helps a bit (you can look at the syntax kinds to figure out which parser is getting called), but it's not enough to reconstruct a syntax invocation unless you just try probing the ParserFn, which doesn't sound like much fun.
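
For reference, here is a minimal sketch of what those two hooks actually return, written against the same parser-extension API as the snippets below; the choice of the built-in term category and all the variable names are purely illustrative:

import Lean
open Lean Parser

#eval show CoreM Unit from do
  let cats := (parserExtension.getState (← getEnv)).categories
  let some cat := cats.find? `term
    | throwError "no `term` category"
  -- look at a few leading parsers, keyed by their first token
  for (tk, ps) in cat.tables.leadingTable.toList.take 3 do
    for (p, prec) in ps do
      -- the two data-collection hooks mentioned above
      let kinds := (p.info.collectKinds {}).toList.map (·.1)
      let tokens := p.info.collectTokens []
      println! "token {tk}: prec {prec}, kinds {kinds}, tokens {tokens}"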

Mac (Jul 07 2021 at 22:44):

Yeah, that was why I was asking. The parser functions for a given syntax can be arbitrary, and no information is recorded about the syntax kinds they produce or the strings they parse beyond the tokens that compose the syntax (the most significant piece being its first token).

Mac (Jul 07 2021 at 22:48):

You can, however, do something like the following to inspect what information there is:

import Lean
open Lean Parser Elab Command

syntax (name := inspectCatCmd) "#inspect_cat " ident : command

@[commandElab inspectCatCmd]
def elabInspectCat : CommandElab
| `(#inspect_cat%$kw $catId:ident) => do
  let catName := catId.getId
  -- the parser extension state stores each syntax category's parsing tables
  let cats := parserExtension.getState (← getEnv) |>.categories
  match cats.find? catName with
  | some cat =>
    logInfoAt kw <| repr <| cat.tables.leadingTable.toList.map (·.1)
  | none =>
    logErrorAt catId s!"unknown category `{catName}`"
| _ => throwUnsupportedSyntax

declare_syntax_cat myCat
syntax strLit : myCat

#inspect_cat myCat -- [`strLit, `«$»]
#inspect_cat foo   -- unknown category `foo`

The above code creates a command that logs the category's known leading token kinds (and could be modified to print other information).

Mario Carneiro (Jul 07 2021 at 23:30):

For posterity, here's a utility that dumps all syntax node kinds, organized by category and leading token/precedence:

import Lean
open Lean Lean.Parser

#eval show CoreM Unit from do
  let env ← getEnv
  let c := mkParserContext (mkInputContext "" "") { env := env, options := {} }
  let cats := (parserExtension.getState c.env).categories
  cats.forM fun name cat => do
    println! "---------------- category '{name}'"
    let tables := cat.tables
    -- run `f` over every leading/trailing parser, keyed by first token where available
    let forTables (f : Bool → Option Name → Parser × Nat → CoreM Unit) : CoreM Unit := do
      tables.leadingParsers.forM (f true none)
      tables.leadingTable.forM fun n l => l.forM (f true n)
      tables.trailingParsers.forM (f false none)
      tables.trailingTable.forM fun n l => l.forM (f false n)
    forTables fun lead n (p, prio) => do
      println! "-- {if lead then "leading" else "trailing"}: {n.getD `none} at {prio}"
      -- collect all syntax node kinds this parser can produce
      let kinds := p.info.collectKinds {}
      let map := kinds.foldl (init := #[]) fun map k _ => map.push k
      let map := map.insertionSort Name.lt
      for k in map do
        if (env.find? k).isSome then
          println! "#print {k}"
        else
          -- strip a trailing `antiquot` component, if any (the pattern rebinds `k` to its prefix)
          let k2 := if let Name.str k "antiquot" _ := k then k else k
          if (env.find? k2).isSome then
            println! "#print {k2} -- .antiquot"
      println! ""
    println! ""

Incidentally, it outputs a file containing thousands of #print commands, which made me notice that the current behavior of #print seems to be quadratic in something, because it slows down pretty quickly and it looks like it might take a very long time to get to the end

Sebastian Ullrich (Jul 08 2021 at 07:11):

Seems fine on the cmdline? I'm afraid to open it in an editor.

Mario Carneiro (Jul 08 2021 at 09:03):

The #eval that produces the script is itself very fast, but if you copy the output into a lean file the #print statements take progressively longer to check. By the time there are 400 of them it's taking more than a second per #print, which seems very suspicious. I know that vscode isn't the culprit because it will be updating diagnostics all the while - they tend to flicker due to server response time when the number of diagnostics gets into the thousands but it's still responsive

Sebastian Ullrich (Jul 08 2021 at 09:06):

Emacs/flycheck starts shutting down at around 200 messages, haha. Since it's fast on the cmdline, it is probably the server's fault.

Mario Carneiro (Jul 08 2021 at 09:10):

It seems like the server wants to update the client after every command, which probably leads to the quadratic behavior since it has to send (and re-layout?) all previous messages each time
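
(With n commands, re-sending all earlier messages after each one amounts to 1 + 2 + ⋯ + n = n(n+1)/2 messages in total, which would explain the quadratic growth.)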

Sebastian Ullrich (Jul 08 2021 at 09:10):

Oh yes, that is certainly quadratic

Mario Carneiro (Jul 08 2021 at 09:11):

in this case it would be advantageous for the server to back off and not update as frequently, which is probably easier to implement than making the diagnostic marshalling asynchronous

Sebastian Ullrich (Jul 08 2021 at 09:13):

Now that Gabriel has separated file progress notifications from diagnostics, that does sound like a good idea

Sebastian Ullrich (Jul 08 2021 at 09:21):

Well, not sure. As soon as a single #print command plus diagnostics publishing takes as long as the back-off interval, you would be back at the current performance. I'm not sure this is a use case we need to support, or at least it's one that LSP makes very hard to support properly.

Mario Carneiro (Jul 08 2021 at 09:23):

The approach I would take is something like: progress reporting is not allowed to take more than X% of total run time. If it gets above that, start reporting less often

Mario Carneiro (Jul 08 2021 at 09:24):

there are ways to approximate that without doing detailed profiling all the time

Mario Carneiro (Jul 08 2021 at 09:27):

Actually, one very easy metric is that you shouldn't send a progress update if the last progress update was less than X ms ago (not counting the time to send the progress report itself). That is, Lean should always get a large enough time slice, to avoid a situation like this one where the commands are super fast but the mandatory progress reports between each one are slow
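
A toy sketch of that rule in Lean (purely illustrative; maybeSendProgress, lastSentRef, minIntervalMs, and send are hypothetical names, not the actual server code):

def maybeSendProgress (lastSentRef : IO.Ref Nat) (minIntervalMs : Nat)
    (send : IO Unit) : IO Unit := do
  let now ← IO.monoMsNow
  let last ← lastSentRef.get
  -- only send if at least `minIntervalMs` have passed since the last report
  if now - last ≥ minIntervalMs then
    send
    -- re-read the clock so the time spent sending doesn't count against Lean's time slice
    lastSentRef.set (← IO.monoMsNow)

A caller would create the ref once with IO.mkRef 0 and pass the same ref for every update.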

Gabriel Ebner (Jul 08 2021 at 09:27):

This is the reason why in Lean 3 diagnostic messages are sent using a timer (i.e., every 100ms).

Sebastian Ullrich (Jul 08 2021 at 09:30):

Does it really make sense to look at such monster files in the editor? Does VSCode scale to thousands of messages?

Mario Carneiro (Jul 08 2021 at 09:32):

the file I was looking at was about 8000 lines of #print statements. There are mathlib files larger than that

Gabriel Ebner (Jul 08 2021 at 09:34):

8000 messages shouldn't really be an issue (if updates are infrequent enough).

Gabriel Ebner (Jul 08 2021 at 09:39):

Looks like vscode performance regressed quite a bit. They apparently cap the number of diagnostics at 1000 now. Infoview is still peachy though.

Daniel Fabian (Jul 08 2021 at 09:41):

(could we possibly only add diagnostics to whatever's visible?) Is that info available?

Gabriel Ebner (Jul 08 2021 at 09:45):

That might be a workaround in case there's too many diagnostics. We could also cap the diagnostics to 1k/file in vscode. The infoview can still show all diagnostics.

Gabriel Ebner (Jul 08 2021 at 09:45):

Ideally, this would be fixed in vscode though. I just tried with neovim and Lean 3, and it displays 10k diagnostics just fine.

Daniel Fabian (Jul 08 2021 at 09:48):

hmm, it looks like at least for some APIs you get the viewport info: https://code.visualstudio.com/updates/v1_54

Daniel Fabian (Jul 08 2021 at 09:49):

and as for doing it in vscode, it's still a bad idea to do the computation for 10k things if you only ever show 20 on the screen.

Mario Carneiro (Jul 08 2021 at 09:50):

well you can still see them on the minimap / gutter

Mac (Jul 08 2021 at 14:46):

Gabriel Ebner said:

This is the reason why in Lean 3 diagnostic messages are sent using a timer (i.e., every 100ms).

Daniel Fabian said:

(could we possibly only add diagnostics to whatever's visible?) Is that info available?

This can be useful for other reasons. I've observed noticeable slowdown in just a ~100-line file when editing, for example, macros in the same file. The server re-computes the entire file on each partial edit, so it often ends up processing a half-edited macro. This breaks the rest of the file and produces tons of errors. It happens multiple times per second and results in input lag while editing the macro in VS Code. I definitely think some level of processing delay for the rest of the file would be desirable. For my case, I know there is the 'elaboration delay' setting, but the problem isn't really elaborating the code under the cursor (which I do want fast feedback for) -- it is the fact that it also re-elaborates the rest of the file on each edit that causes problems.

Mac (Jul 08 2021 at 14:49):

Mario Carneiro said:

well you can still see them on the minimap / gutter

While this is certainly nice, I think elaboration of the file outside the visible view should be done lazily. That is, only when, by some heuristic, the server has free time to spare -- for example, when there is more than enough free processing power and the server is not currently being asked to do something else.


Last updated: Dec 20 2023 at 11:08 UTC