Note [Incremental Parsing] #

In the language server, we want to minimize the work we do after each edit by reusing previous state where possible. This is true for both parsing, i.e. reusing syntax trees without running the parser, and elaboration. For both, we currently assume that we have to reprocess at least everything from the point of change downwards. This note is about how to find the correct starting point for incremental parsing; for elaboration, we then start with the first changed syntax tree.

One initial thought about incremental parsing could be that it's not necessary as parsing is very fast compared to elaboration; on mathlib we average 41ms parsing per 1000 LoC. But there are quite a few files >= 1kloc (up to 4.5kloc) in there, so near the end of such files lag from always reparsing from the beginning may very well be noticeable.

So if we do want incremental parsing, another thought might be that a user edit can only invalidate commands at or after the location of the change. Unfortunately, that's not true; take the (partial) input def a := b private def c. If we remove the space after private, the two commands syntactically become one with an application of privatedef to b even though the edit was strictly after the end of the first command.

So obviously we must include at least the extent of the token that made the parser stop parsing a command as well such that invalidating the private token invalidates the preceding command. Unfortunately this is not sufficient either, given the following input:

structure a where /-- b -/ @[c] private axiom d : Nat

This is a syntactically valid sequence of a field-less structure and a declaration. If we again delete the space after private, it becomes a syntactically correct structure with a single field privateaxiom! So clearly, because of uses of atomic in the grammar, an edit can affect a command syntax tree even across multiple tokens.

What we did in Lean 3 was to always reparse the last command completely preceding the edit location. If its syntax tree is unchanged, we preserve its data and reprocess all following commands only, otherwise we reprocess it fully as well. This worked well but did seem a bit arbitrary given that even if it works for a grammar at some point, it can certainly be extended in ways that break the assumption.

With grammar changes in Lean 4, we found that the following example indeed breaks this assumption:

structure Signature where
  /-- a docstring -/
  Sort : Type
    --^ insert: "s"

As the keyword Sort is not a valid start of a structure field and the parser backtracks across the docstring in that case, this is parsed as the complete command structure Signature where followed by the partial command /-- a docstring -/ <missing>. If we insert an s after the t, the last command completely preceding the edit location is the partial command containing the docstring. Thus we need to go up two commands to ensure we reparse the structure command as well. This kind of nested docstring is the only part of the grammar to our knowledge that requires going up at least two commands; as we never backtrack across more than one docstring, going up two commands should also be sufficient.

Finally, a more actually principled and generic solution would be to invalidate a syntax tree when the parser has reached the edit location during parsing. If it did not, surely the edit cannot have an effect on the syntax tree in question. Sadly such a "high-water mark" parser position does not exist currently and likely it could at best be approximated by e.g. "furthest tokenFn parse". Thus we remain at "go up two commands" at this point.

Note [Incremental Command Elaboration] #

Because of Lean's use of persistent data structures, incremental reuse of fully elaborated commands is easy because we can simply snapshot the entire state after each command and then restart elaboration using the stored state at the next command above the point of change. However, incrementality within elaboration of a single command such as between tactic steps is much harder because the existing control flow does not allow us to simply return from those points to the language processor in a way that we can later resume from there. Instead, we exchange the need for continuations with some limited mutability: by allocating an IO.Promise "cell" in the language processor, we can both pass it to the elaborator to eventually fill it using Promise.resolve as well as convert it to a Task that will wait on that resolution using Promise.result and return it as part of the command snapshot created by the language processor. The elaborator can then in turn create new promises itself and store their result when resolving an outer promise to create an arbitrary tree of promise-backed snapshot tasks. Thus, we can enable incremental reporting and reuse inside the elaborator using the same snapshot tree data structures as outside without having to change the elaborator's control flow.

While ideally we would decide what can be reused during command elaboration using strong hashes over the full state and inputs, currently we rely on simpler syntactic checks: if all the syntax inspected up to a certain point is unchanged, we can assume that the old state can be reused. The central SnapshotBundle type passed inwards through the elaborator for this purpose combines the following data:

the IO.Promise to be resolved to an elaborator snapshot (whose type depends on the specific elaborator part we're in, e.g. TacticParsedSnapshot, BodyProcessedSnapshot)
if there was a previous run:
- a SnapshotTask holding the corresponding snapshot of the run
- the relevant Syntax of the previous run to be compared before any reuse

Note that as we do not wait for the previous run to finish before starting to elaborate the next one, the old SnapshotTask task may not be finished yet. Indeed, if we do find that we can reuse the contained state because of a successful syntax comparison, we always want to explicitly wait for the task instead of redoing the work. On the other hand, the Syntax is not surrounded by a task so that we can immediately access it for comparisons, even if the snapshot task may, eventually, give access to the same syntax tree.

For the most part, inside an elaborator participating in incrementality, we just have to ensure that we stop forwarding the old run's data as soon as we notice a relevant difference between old and new syntax tree. For example, allowing incrementality inside the cdot tactic combinator is as simple as

builtin_initialize registerBuiltinIncrementalTactic ``cdot
@[builtin_tactic cdot] def evalTacticCDot : Tactic := fun stx => do
  ...
  closeUsingOrAdmit do
    -- save state before/after entering focus on `·`
    ...
    Term.withNarrowedArgTacticReuse (argIdx := 1) evalTactic stx

The Term.withNarrowedArgTacticReuse combinator will focus on the given argument of stx, which in this case is the nested tactic sequence, and run evalTactic on it. But crucially, it will first compare all preceding arguments, in this case the cdot token itself, with the old syntax in the current snapshot bundle, which in the case of tactics is stored in Term.Context.tacSnap?. Indeed it is important here to check if the cdot token is identical because its position has been saved in the info tree, so it would be bad if we later restored some old state that uses a different position for it even if everything else is unchanged. If there is any mismatch, the bundle's old value is set to none in order to prevent reuse from this point on. Note that in any case we still want to forward the "new" promise in order to provide incremental reporting as well as to construct a snapshot tree for reuse in future document versions! Note also that we explicitly opted into incrementality using registerBuiltinIncrementalTactic as any tactic combinator not written with these concerns in mind would likely misbehave under incremental reuse.

While it is generally true that we can provide incremental reporting even without reuse, we generally want to avoid that when it would be confusing/annoying, e.g. when a tactic block is run multiple times because otherwise the progress bar would snap back and forth multiple times. For this purpose, we can disable both incremental modes using Term.withoutTacticIncrementality, assuming we opted into incrementality because of other parts of the combinator. induction is an example of this because there are some induction alternatives that are run multiple times, so we disable all of incrementality for them.

Using induction as a more complex example than cdot as it calls into evalTactic multiple times, here is a summary of what it has to do to implement incrementality:

Narrow down to the syntax of alternatives, disabling reuse if anything before them changed
allocate one new promise for each given alternative, immediately resolve passed promise to a new snapshot tree node holding them so that the language server can wait on them
when executing an alternative,
- we put the corresponding promise into the context
- we disable reuse if anything in front of the contained tactic block has changed, including previous alternatives
- we disable reuse and reporting if the tactic block is run multiple times, e.g. in the case of a wildcard pattern