Note [Incremental Parsing] #
In the language server, we want to minimize the work we do after each edit by reusing previous state where possible. This is true for both parsing, i.e. reusing syntax trees without running the parser, and elaboration. For both, we currently assume that we have to reprocess at least everything from the point of change downwards. This note is about how to find the correct starting point for incremental parsing; for elaboration, we then start with the first changed syntax tree.
One initial thought about incremental parsing could be that it's not necessary as parsing is very fast compared to elaboration; on mathlib we average 41ms parsing per 1000 LoC. But there are quite a few files >= 1kloc (up to 4.5kloc) in there, so near the end of such files lag from always reparsing from the beginning may very well be noticeable.
So if we do want incremental parsing, another thought might be that a user edit can only invalidate
commands at or after the location of the change. Unfortunately, that's not true; take the (partial)
input def a := b private def c
. If we remove the space after private
, the two commands
syntactically become one with an application of privatedef
to b
even though the edit was
strictly after the end of the first command.
So obviously we must include at least the extent of the token that made the parser stop parsing a command as well such that invalidating the private token invalidates the preceding command. Unfortunately this is not sufficient either, given the following input:
structure a where /-- b -/ @[c] private axiom d : Nat
This is a syntactically valid sequence of a field-less structure and a declaration. If we again delete the space after private, it becomes a syntactically correct structure with a single field privateaxiom! So clearly, because of uses of atomic in the grammar, an edit can affect a command syntax tree even across multiple tokens.
What we did in Lean 3 was to always reparse the last command completely preceding the edit location. If its syntax tree is unchanged, we preserve its data and reprocess all following commands only, otherwise we reprocess it fully as well. This worked well but did seem a bit arbitrary given that even if it works for a grammar at some point, it can certainly be extended in ways that break the assumption.
With grammar changes in Lean 4, we found that the following example indeed breaks this assumption:
structure Signature where
/-- a docstring -/
Sort : Type
--^ insert: "s"
As the keyword Sort
is not a valid start of a structure field and the parser backtracks across the
docstring in that case, this is parsed as the complete command structure Signature where
followed
by the partial command /-- a docstring -/ <missing>
. If we insert an s
after the t
, the last
command completely preceding the edit location is the partial command containing the docstring. Thus
we need to go up two commands to ensure we reparse the structure
command as well. This kind of
nested docstring is the only part of the grammar to our knowledge that requires going up at least
two commands; as we never backtrack across more than one docstring, going up two commands should
also be sufficient.
Finally, a more actually principled and generic solution would be to invalidate a syntax tree when
the parser has reached the edit location during parsing. If it did not, surely the edit cannot have
an effect on the syntax tree in question. Sadly such a "high-water mark" parser position does not
exist currently and likely it could at best be approximated by e.g. "furthest tokenFn
parse". Thus
we remain at "go up two commands" at this point.
Note [Incremental Command Elaboration] #
Because of Lean's use of persistent data structures, incremental reuse of fully elaborated commands
is easy because we can simply snapshot the entire state after each command and then restart
elaboration using the stored state at the next command above the point of change. However,
incrementality within elaboration of a single command such as between tactic steps is much harder
because the existing control flow does not allow us to simply return from those points to the
language processor in a way that we can later resume from there. Instead, we exchange the need for
continuations with some limited mutability: by allocating an IO.Promise
"cell" in the language
processor, we can both pass it to the elaborator to eventually fill it using Promise.resolve
as
well as convert it to a Task
that will wait on that resolution using Promise.result
and return
it as part of the command snapshot created by the language processor. The elaborator can then in
turn create new promises itself and store their result
when resolving an outer promise to create
an arbitrary tree of promise-backed snapshot tasks. Thus, we can enable incremental reporting and
reuse inside the elaborator using the same snapshot tree data structures as outside without having
to change the elaborator's control flow.
While ideally we would decide what can be reused during command elaboration using strong hashes over
the full state and inputs, currently we rely on simpler syntactic checks: if all the syntax
inspected up to a certain point is unchanged, we can assume that the old state can be reused. The
central SnapshotBundle
type passed inwards through the elaborator for this purpose combines the
following data:
- the
IO.Promise
to be resolved to an elaborator snapshot (whose type depends on the specific elaborator part we're in, e.g.TacticParsedSnapshot
,BodyProcessedSnapshot
) - if there was a previous run:
- a
SnapshotTask
holding the corresponding snapshot of the run - the relevant
Syntax
of the previous run to be compared before any reuse
- a
Note that as we do not wait for the previous run to finish before starting to elaborate the next
one, the old SnapshotTask
task may not be finished yet. Indeed, if we do find that we can reuse
the contained state because of a successful syntax comparison, we always want to explicitly wait for
the task instead of redoing the work. On the other hand, the Syntax
is not surrounded by a task so
that we can immediately access it for comparisons, even if the snapshot task may, eventually, give
access to the same syntax tree.
For the most part, inside an elaborator participating in incrementality, we just have to ensure that we stop forwarding the old run's data as soon as we notice a relevant difference between old and new syntax tree. For example, allowing incrementality inside the cdot tactic combinator is as simple as
builtin_initialize registerBuiltinIncrementalTactic ``cdot
@[builtin_tactic cdot] def evalTacticCDot : Tactic := fun stx => do
...
closeUsingOrAdmit do
-- save state before/after entering focus on `·`
...
Term.withNarrowedArgTacticReuse (argIdx := 1) evalTactic stx
The Term.withNarrowedArgTacticReuse
combinator will focus on the given argument of stx
, which in
this case is the nested tactic sequence, and run evalTactic
on it. But crucially, it will first
compare all preceding arguments, in this case the cdot token itself, with the old syntax in the
current snapshot bundle, which in the case of tactics is stored in Term.Context.tacSnap?
. Indeed
it is important here to check if the cdot token is identical because its position has been saved in
the info tree, so it would be bad if we later restored some old state that uses a different position
for it even if everything else is unchanged. If there is any mismatch, the bundle's old value is
set to none
in order to prevent reuse from this point on. Note that in any case we still want to
forward the "new" promise in order to provide incremental reporting as well as to construct a
snapshot tree for reuse in future document versions! Note also that we explicitly opted into
incrementality using registerBuiltinIncrementalTactic
as any tactic combinator not written with
these concerns in mind would likely misbehave under incremental reuse.
While it is generally true that we can provide incremental reporting even without reuse, we
generally want to avoid that when it would be confusing/annoying, e.g. when a tactic block is run
multiple times because otherwise the progress bar would snap back and forth multiple times. For this
purpose, we can disable both incremental modes using Term.withoutTacticIncrementality
, assuming we
opted into incrementality because of other parts of the combinator. induction
is an example of
this because there are some induction alternatives that are run multiple times, so we disable all of
incrementality for them.
Using induction
as a more complex example than cdot
as it calls into evalTactic
multiple
times, here is a summary of what it has to do to implement incrementality:
Narrow
down to the syntax of alternatives, disabling reuse if anything before them changed- allocate one new promise for each given alternative, immediately resolve passed promise to a new snapshot tree node holding them so that the language server can wait on them
- when executing an alternative,
- we put the corresponding promise into the context
- we disable reuse if anything in front of the contained tactic block has changed, including previous alternatives
- we disable reuse and reporting if the tactic block is run multiple times, e.g. in the case of a wildcard pattern
Lean-specific processing context.
- firstDiffPos? : Option String.Pos
Position of the first file difference if there was a previous invocation.
- oldCancelTk? : Option IO.CancelToken
Cancellation token of the previous invocation, if any.
- newCancelTk : IO.CancelToken
Cancellation token of the current run.
Instances For
Monad transformer holding all relevant data for Lean processing.
Instances For
Monad holding all relevant data for Lean processing.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Embeds a LeanProcessingM
action into ProcessingM
, optionally using the old input string to speed
up reuse analysis and supplying a cancellation token that should be triggered as soon as reuse is
ruled out.
Equations
- One or more equations did not get rendered due to their size.
Instances For
Returns true if there was a previous run and the given position is before any textual change compared to it.
Equations
- Lean.Language.Lean.isBeforeEditPos pos = do let __do_lift ← read pure (Option.any (fun (x : String.Pos) => decide (pos < x)) __do_lift.firstDiffPos?)
Instances For
Result of retrieving additional metadata about the current file after parsing imports. In the
language server, these are derived from the lake setup-file
result. On the cmdline and for similar
simple uses, these can be computed eagerly without looking at the imports.
- mainModuleName : Lean.Name
Module name of the file being processed.
- opts : Lean.Options
Options provided outside of the file content, e.g. on the cmdline or in the lakefile.
- trustLevel : UInt32
Kernel trust level.
Instances For
Performance option used by cmdline driver.
Parses values of options registered during import and left by the C++ frontend as strings, fails if any option names remain unknown.
Instances For
Entry point of the Lean language processor.
The setupImports
function is called after the header has been parsed and before the first command
is parsed in order to supply additional file metadata (or abort with a given terminal snapshot); see
SetupImportsResult
.
old?
is a previous resulting snapshot, if any, to be reused for incremental processing.
Instances For
Equations
- One or more equations did not get rendered due to their size.
Instances For
Instances For
Instances For
Convenience function for tool uses of the language processor that skips header handling.
Instances For
Waits for and returns final command state, if importing was successful.
Equations
- Lean.Language.Lean.waitForFinalCmdState? snap = do let snap ← snap.result? let snap ← snap.processedSnap.get.result? Lean.Language.Lean.waitForFinalCmdState?.goCmd snap.firstCmdSnap.get