Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: REPL: automated incremental state reuse across commands


Auguste Poiroux (Jun 30 2025 at 12:37):

Hello everyone! I have been working on an experimental feature for the Lean REPL that adds automatic incremental state reuse across commands to improve performance.

How it works

The idea, originally from @Kim Morrison in the repl++ branch, is to automatically reuse incremental states from previous commands and files submitted to the REPL. If you are familiar with the VSCode Lean extension, this works like the orange progress bar there, which shows that incremental elaboration restarts only from the point where changes occur.

The key difference in the current PR is that incremental state reuse happens across all previously submitted commands and files, not just the latest version of a single file.

Example

A concrete example is always better than a long explanation:

Files

Executing {"path": "File1.lean"}, {"path": "File2.lean"}, and {"path": "File3.lean"} in order produces the following results:

  • File1.lean: elaborated from scratch (~8s on my laptop)
  • File2.lean: reuses the import Mathlib from File1.lean (~0.02s)
  • File3.lean: reuses everything from File1.lean up to the foo theorem (~0.01s)

Output after running the 3 commands above

Use cases

  • Instead of manually decomposing Lean content into several commands sent individually to the REPL so that their states can be reused later, you can now submit whole files or commands and let the REPL automatically determine the best incremental state to reuse.
  • In the LLM era where test-time scaling is important, thousands of sampled file edits can be checked efficiently using this feature.

Try it yourself

I would be happy to get your opinion and feedback on this feature, as it's still in development. This feature is backported and available for all Lean versions between v4.8.0-rc1 and v4.21.0, so choose the one you prefer ;)

You can try it with LeanInteract:

LeanInteract example
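A minimal sketch of the kind of usage meant here, assuming the usual LeanInteract entry points (LeanREPLConfig, LeanServer, Command) and a config whose project has Mathlib available; the file contents below are illustrative stand-ins for File1/File2/File3 from the example above, not the actual attached files:

```python
# Sketch only: entry point names and default config are assumptions, and the
# Lean snippets are hypothetical stand-ins for File1/2/3 from the example.
from lean_interact import LeanREPLConfig, LeanServer, Command

# Assumes the config points at a REPL build with the incremental-reuse feature
# and a project where `import Mathlib` is available.
config = LeanREPLConfig()
server = LeanServer(config)

file1 = """import Mathlib

theorem foo (n : Nat) : n + 0 = n := by simp
"""

# Same header, different body: only the `import Mathlib` state is reused.
file2 = """import Mathlib

theorem bar (n : Nat) : 0 + n = n := by simp
"""

# Same prefix as file1 up to `foo`, plus a new theorem: everything up to and
# including `foo` is reused, only the new theorem is elaborated.
file3 = file1 + """
theorem baz (n : Nat) : n * 1 = n := by simp
"""

for content in (file1, file2, file3):
    response = server.run(Command(cmd=content))
    print(response)
```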

If you prefer to use the REPL directly or your own wrapper, you can simply clone my repl fork and check out v1.1.0-dev_lean-toolchain-{lean_version}, or check out the PR directly (it uses Lean v4.21.0).

Implementation details

  • The prefix matching is implemented using a trie, so the complexity of finding the best match is constant in the number of stored commands and linear in the length of the submitted command (i.e., its number of characters); see the sketch after this list.
  • In practice, the overhead I have observed so far, both in runtime and memory, is negligible compared to elaborating a command. I would be happy to hear about your experience and feedback on this.
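For intuition only, here is a toy Python illustration of the longest-prefix lookup idea; the actual REPL implementation is in Lean and differs in the details:

```python
# Toy illustration of trie-based longest-prefix matching: lookup cost does not
# grow with the number of stored commands, only with the length of the query.
# This is NOT the REPL's actual code, just the idea behind it.

class TrieNode:
    __slots__ = ("children", "state")

    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.state = None    # elaboration state snapshot stored at this prefix


class PrefixStore:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, command: str, state) -> None:
        """Remember the state reached after elaborating `command`."""
        node = self.root
        for ch in command:
            node = node.children.setdefault(ch, TrieNode())
        node.state = state

    def longest_prefix_state(self, command: str):
        """Return (prefix_length, state) for the longest stored prefix of `command`."""
        node, best = self.root, (0, None)
        for i, ch in enumerate(command):
            node = node.children.get(ch)
            if node is None:
                break
            if node.state is not None:
                best = (i + 1, node.state)
        return best


store = PrefixStore()
store.insert("import Mathlib\n\ntheorem foo : 1 + 1 = 2 := by rfl\n", "state-after-foo")
length, state = store.longest_prefix_state(
    "import Mathlib\n\ntheorem foo : 1 + 1 = 2 := by rfl\n\ntheorem bar : True := trivial\n"
)
print(length, state)  # reuse `state` and elaborate only the suffix after `length`
```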

Current limitations

  • Incremental state reuse is only applied at the command-block level, meaning that changes in proofs still require a full re-elaboration of the proof.

  • If the content of the modules you import changes, the REPL has to be restarted (similar to having to do Restart File in VSCode). I am open to suggestions on how to improve this.

Justin Asher (Jun 30 2025 at 13:55):

Thanks! This seems super helpful, and I am really excited to use it.

Auguste Poiroux (Jun 30 2025 at 14:10):

I think this feature could be quite useful in an MCP server. Current AI code agents tend to generate multiple file edits. This feature would allow the model to check the effect of each edit more efficiently.

Justin Asher (Jun 30 2025 at 14:24):

I agree 100%. This should speed a lot of things up.

Auguste Poiroux (Jun 30 2025 at 15:32):

It turns out that I was wrong: changes in proofs also benefit from incremental states out of the box (just like in VS Code). Good news for researchers generating thousands of proofs: you can now optimize your stack to maximize prefix matching per REPL instance, checking proofs faster and with less memory ;)

Oliver Dressler (Jun 30 2025 at 15:47):

Auguste Poiroux said:

I think this feature could be quite useful in an MCP server. Current AI code agents tend to generate multiple file edits. This feature would allow the model to check the effect of each edit more efficiently.

This sounds interesting! I have a use case where an agent can effectively check multiple single-line snippets to find the best way forward:
https://github.com/oOo0oOo/lean-lsp-mcp/blob/87f23ffccec16a2d77c47b6c55d383e3a13ce7f0/src/lean_lsp_mcp/server.py#L397

Currently this is a crude serial implementation using file updates via the LSP. Am I right that this could be sped up quite a bit using LeanInteract?

Auguste Poiroux (Jun 30 2025 at 16:39):

Hmm, maybe; I am not sure. The LSP also relies on incremental states, right? So it should re-elaborate files starting only from the line you edit. Or maybe I am missing something?

Oliver Dressler (Jun 30 2025 at 17:48):

You are right, it is incremental. There might still be some gains, as the LSP performs a relatively complex analysis. Another thing complicating the comparison is the (relatively new) parallel elaboration in the LSP.
I'll run some benchmarks soon to compare.

Auguste Poiroux (Jun 30 2025 at 18:04):

Oh nice, I am interested in the results :)

Auguste Poiroux (Jun 30 2025 at 19:13):

Now that I think of it, this opens up new possibilities. We can basically do tactic-by-tactic proofs like this:
{"cmd": "example ... := by\n tac1"}, then {"cmd": "example ... := by\n tac1\n tac2"}, and so on. The complexity is linear in the number of tactics thanks to the incremental states (sketched below). So this offers an alternative to the REPL tactic mode, without the buggy parts, while being more flexible and just as efficient.
This "appending tactics one by one" approach is mentioned in this old issue (quadratic complexity), and I believe it is used by itp-interface @George Tsoukalas.
I think this should also make it easier to develop "block-by-block generation" proving approaches. Generating proof blocks is probably a good intermediate between tactic-by-tactic and whole-proof generation methods.
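A rough sketch of this appending pattern with LeanInteract (same caveats as the sketch earlier: the API names are assumptions, and the example proof is just an illustration):

```python
# "Append one tactic at a time": thanks to prefix matching, each submission
# should only re-elaborate the newly appended tactic, so total work is roughly
# linear in the number of tactics. Intermediate submissions report the
# remaining goals as unsolved, which is expected.
from lean_interact import LeanREPLConfig, LeanServer, Command

server = LeanServer(LeanREPLConfig())

header = "example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by"
tactics = ["  constructor", "  · exact hp", "  · exact hq"]

proof = header
for tac in tactics:
    proof += "\n" + tac
    response = server.run(Command(cmd=proof))
    print(response)  # inspect messages / remaining goals after each step
```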


Last updated: Dec 20 2025 at 21:32 UTC