Zulip Chat Archive

Stream: mathlib4

Topic: Performance cost of environment extensions


Jovan Gerbscheid (Feb 01 2026 at 22:04):

I discovered today that environment extensions (something like PersistentEnvExtension, ScopedEnvExtension or NameMapExtension) cause a noticeable performance drop in mathlib. This is because at the time of initialization (i.e. when loading the imports), each environment extension has to be loaded. This consists of 3 steps:

  1. Create an array with length equal to the number of imported modules.
  2. For all n, put the entries from the n-th imported module at position n in the array.
  3. Fold over this array of arrays to obtain the initial state of the environment extension.

Even if the number of entries in the extension is negligible, with how many files mathlib has, the above steps take a significant amount of time. Note that the time spent on this is proportional to the number of (transitive) edges in the import graph, so the time spent on this while building mathlib may grow quadratically in the number of files in mathlib.
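The three steps above can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual code in Lean core; `entriesOfModule` is a made-up helper standing in for the lookup of a module's serialized entries.

```lean
-- Rough sketch of extension initialization for a single extension with
-- entry type `α` and state type `σ`. Hypothetical code, not the real
-- implementation in Lean core.
def loadExtensionState {α σ : Type} (mods : Array ModuleData)
    (entriesOfModule : ModuleData → Array α)  -- made-up helper
    (addImportedFn : Array (Array α) → σ) : σ :=
  -- Steps 1 and 2: one array of entries per imported module
  let perModule : Array (Array α) := mods.map entriesOfModule
  -- Step 3: fold over the array of arrays to obtain the initial state
  addImportedFn perModule
```

Even when every module contributes zero entries, the per-module pass is still linear in the number of imported modules, for every registered extension.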

In particular, it was #32438 where I discovered that adding/removing 3 (almost) unused environment extensions gives a difference of about 320G instructions. I then went to see if there are places where we could easily get rid of some other environment extensions.

Jovan Gerbscheid (Feb 01 2026 at 22:04):

#34685 refactors the to_additive/to_dual code, removing 2 environment extensions, and giving a -285.4G (-0.16%) improvement.

Jovan Gerbscheid (Feb 01 2026 at 22:05):

#34679 refactors the @[stacks] attribute implementation to avoid step (3), which saves -117.5G (-0.07%)

Jovan Gerbscheid (Feb 01 2026 at 22:06):

#34696 removes another environment extension in to_additive (and is being benchmarked)

Jovan Gerbscheid (Feb 01 2026 at 22:17):

Note that the speedup here is especially beneficial to people writing import Mathlib. In #34685, the file Mathlib itself speeds up by -129.9M (-0.83%).

Johan Commelin (Feb 02 2026 at 08:14):

Besides these refactors, do you think the steps (1)-(3) could be optimized?

Jovan Gerbscheid (Feb 02 2026 at 11:31):

I think it might be possible to avoid doing step (3) unnecessarily by wrapping the value in a Thunk. I'll see if I can make that work.
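A minimal sketch of the idea, assuming nothing forces the state during import (hypothetical code, not the actual batteries implementation):

```lean
-- Wrap the folded state in a `Thunk`, so that step (3) only runs when the
-- state is first demanded via `Thunk.get`. Hypothetical sketch.
def lazyInitialState {α σ : Type} (perModule : Array (Array α))
    (addImportedFn : Array (Array α) → σ) : Thunk σ :=
  Thunk.mk fun _ => addImportedFn perModule
```

If an extension's state is never inspected while elaborating a given file, the fold is never paid for that file.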

Jovan Gerbscheid (Feb 02 2026 at 11:41):

I also made batteries#1646, which deals with the alias environment extension and gives -100.8G (-0.06%)

Bhavik Mehta (Feb 02 2026 at 11:44):

Is it possible to do the fold over the modules without explicitly building the whole array of arrays? E.g. perhaps loading each module lazily to build up the initial state?

Jovan Gerbscheid (Feb 02 2026 at 14:37):

That sounds reasonable, though I think it might not give an improvement. The current implementation is as follows:

The finalizeImport function calls setImportedEntries to create the arrays of arrays, one per extension (steps (1) and (2)). finalizeImport then calls finalizePersistentExtensions, which applies each extension's addImportedFn (usually a folding function) to its array of arrays (step (3)).

Each module has a ModuleData which has a field entries : Array (Name × Array EnvExtensionEntry) that stores all of the entries from that file (EnvExtensionEntry is a dummy type that will get unsafeCasted). So, to create all of the arrays of arrays for all of the environment extensions, setImportedEntries simply loops through the entries of each module.

The problem with doing a fold over the modules without explicitly making these arrays is that you would have to do a lookup in the entries of each module, which is costly. This is avoided by building all of the arrays of arrays in a single pass.
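Under those assumptions, setImportedEntries behaves roughly like the following single pass (hypothetical sketch; `extIdx` is a made-up stand-in for the real lookup from extension name to its slot):

```lean
-- One pass over all modules fills the per-module slot of every extension,
-- avoiding a per-extension lookup into each module's entries.
def setImportedEntriesSketch (mods : Array ModuleData) (numExts : Nat)
    (extIdx : Name → Nat) :  -- made-up: index of an extension by name
    Array (Array (Array EnvExtensionEntry)) := Id.run do
  let mut states := Array.mkArray numExts (Array.mkArray mods.size #[])
  for n in [0:mods.size] do
    for (extName, entries) in mods[n]!.entries do
      states := states.modify (extIdx extName) fun perModule =>
        perModule.set! n entries
  return states
```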

Jovan Gerbscheid (Feb 02 2026 at 14:41):

Another idea for improving performance, could be to have a single environment extension that represents multiple different environment extensions, each of which is identified by a Name. This would allow us to merge various not-so-often-used environment extensions into a single one.
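As a sketch of what that could look like (hypothetical, nothing like this exists yet): entries are tagged with the name of the virtual extension they belong to, and the fold builds a map from virtual-extension name to its entries, so steps (1) and (2) are paid only once for the whole group:

```lean
-- One physical extension hosting many virtual ones, identified by `Name`.
-- Hypothetical sketch.
abbrev MergedEntry := Name × EnvExtensionEntry
abbrev MergedState := NameMap (Array EnvExtensionEntry)

def mergedAddImported (perModule : Array (Array MergedEntry)) : MergedState :=
  perModule.foldl (init := {}) fun st entries =>
    entries.foldl (init := st) fun st (extName, e) =>
      st.insert extName ((st.findD extName #[]).push e)
```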

Jovan Gerbscheid (Feb 02 2026 at 14:48):

I wrote some code that measures the time taken by step (3) for various environment extensions.

Code

Here are the results that are over 3ms (times in milliseconds):

Lean.Meta.simpExtension: 491
Lean.Meta.instanceExtension: 132
Lean.Parser.parserExtension: 104
Batteries.Tactic.Lint.batteriesLinterExt: 92
Mathlib.Tactic.ToAdditive.translations: 24
Lean.Meta.Grind.grindExt: 19
Mathlib.Meta.FunProp.functionTheoremsExt: 19
Lean.Meta.globalInstanceExtension: 16
Mathlib.Tactic.GCongr.gcongrExt: 15
Mathlib.StacksTag.tagExt: 12
Continuous: 12
Lean.Elab.macroAttribute: 11
Mathlib.Tactic.ToDual.argInfoAttr: 11
Lean.Meta.NormCast.normCastExt.down: 10
Batteries.Tactic.Alias.aliasExt: 9
Mathlib.Tactic.ToAdditive.argInfoAttr: 8
Lean.Meta.Ext.extExtension: 8
Mathlib.Linter.deprecatedModuleExt: 8
Mathlib.Linter.UnusedTactic.allowedUnusedTacticExt: 8
Mathlib.Tactic.ToDual.translations: 7
Batteries.Tactic.transExt: 7
Bound: 7
Mathlib.Tactic.Push.pushExt: 7
Lean.PrettyPrinter.Delaborator.appUnexpanderAttribute: 7
Lean.Meta.NormCast.pushCastExt: 6
Mathlib.Tactic.Hint.hintExtension: 6
Mathlib.Tactic.ToAdditive.doTranslateAttr: 6
Mathlib.Meta.Positivity.positivityExt: 6
Mathlib.Tactic.ToDual.doTranslateAttr: 6
Lean.Meta.NormCast.normCastExt.up: 6
eqnsAttribute: 5
Lean.Meta.NormCast.normCastExt.squash: 5
Batteries.CodeAction.tacticCodeActionExt: 5
Mathlib.Tactic.ToDual.ignoreArgsAttr: 5
Lean.namespacesExt: 5
Mathlib.Tactic.GCongr.forwardExt: 5
Mathlib.Meta.NormNum.normNumExt: 5
Mathlib.TacticAnalysis.tacticAnalysisExt: 5
Simps.structureExt: 5
Mathlib.Meta.FunProp.morTheoremsExt: 4
Batteries.Util.LibraryNote.libraryNoteExt: 4
Simps.notationClassAttr: 4
Mathlib.Tactic.Translate.changeNumeralAttr: 4
Mathlib.Meta.FunProp.transitionTheoremsExt: 4
Batteries.CodeAction.tacticSeqCodeActionExt: 4
Mathlib.Tactic.Push.pullExt: 4
Mathlib.Tactic.ToAdditive.ignoreArgsAttr: 4
default: 4
Lean.Meta.Rfl.reflExt: 3
Mathlib.Meta.FunProp.lambdaTheoremsExt: 3
Lean.Elab.Term.termElabAttribute: 3
Lean.Elab.Tactic.tacticElabAttribute: 3
SetLike: 3

Jovan Gerbscheid (Feb 02 2026 at 14:51):

The first few are very reasonable: simp, type class instances, parser. But the env_linter should really not take that much time.

It turns out that there was a little oopsie in the implementation of env_linter. The problem is that every file reports all existing @[env_linter] tags as being created in that file. As a result, most mathlib files claim to have over 30 entries in this environment extension. I fix this in batteries#1648, which gives a speedup of -696.2G (-0.39%) in mathlib.
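The general shape of such a bug, as a hypothetical illustration (not the actual batteries code): an extension whose export function serializes the whole accumulated state instead of only the entries added in the current file.

```lean
-- Hypothetical state for a tag-like extension: entries seen via imports,
-- plus entries added in the current file.
structure TagState where
  imported   : Array Name := #[]
  newEntries : Array Name := #[]

-- buggy export: every file re-serializes all tags it has ever seen
def exportAllBuggy (st : TagState) : Array Name :=
  st.imported ++ st.newEntries

-- fixed export: only the entries declared in the current file
def exportFixed (st : TagState) : Array Name :=
  st.newEntries
```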

Johan Commelin (Feb 02 2026 at 14:58):

What is the total percentage of build time that you've shaved off in the past two days? From a quick calculation, it's at least 0.7% already!

Kevin Buzzard (Feb 02 2026 at 16:15):

It's easier to count in G-instructions. For example one of the messages above says "we speed up a file by 0.83%" but it's only 129.9M instructions so this is nothing (a typical PR will add far more than that). But the hundreds-of-G speed-ups are a big deal.

Jovan Gerbscheid (Feb 02 2026 at 16:17):

Yes, the G-instructions are nice to use as a benchmark because they are the most consistent. However, they do not always correspond exactly to build time.

Jovan Gerbscheid (Feb 03 2026 at 17:21):

In batteries#1650, I experimented with letting NameMapExtension wrap its state in a Thunk, so that the state is only computed (this is step (3)) when strictly necessary.

The result is a speedup of -1.0T (-0.58%) (which may overlap a bit with the gains from #34685 and #34696).

What I noticed in the above list of initialization times for environment extensions is that all examples of unreasonably slow environment extensions, such as Mathlib.Tactic.ToDual.ignoreArgsAttr and Continuous (which is the aesop tag), are defined outside of core Lean, and hence their code is run by the interpreter, which is why they are slow.

In particular, many of these use NameMapExtension which is defined in Batteries. Hence the idea to make it lazy. I think we could get even more improvements if all environment extensions were computed lazily by default, but this requires a change in core that would run into very annoying bootstrapping problems.

Yury G. Kudryashov (Feb 03 2026 at 17:23):

Can we make Lean compile them and run compiled code?

Jovan Gerbscheid (Feb 03 2026 at 17:26):

The problem with compiled code is that it differs depending on the operating system. The lean binary goes through the effort of having these different versions, but presumably this is more annoying to do for mathlib cache?

Nevertheless, I think this would be really great to have.

Yury G. Kudryashov (Feb 03 2026 at 17:28):

I wouldn't be surprised if compiling it after lake exe cache get would be faster than interpreting the code thousands of times. How do I force Lean to compile meta code in a file I import? I failed to quickly find it in the manual.

Robin Carlier (Feb 03 2026 at 17:30):

Are simp sets declared via register_simp_attr also environment extensions under the hood?

Jovan Gerbscheid (Feb 03 2026 at 17:32):

There is a lake configuration option precompileModules which will compile all imported modules in that library, not just the meta code (this is from before the module system). It would be really nice if we could have a version of this that only compiles imported meta code.
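For reference, the existing (coarser) option is set per library in a lakefile.lean; `MyLib` below is a placeholder library name:

```lean
-- lakefile.lean sketch: precompile every module of this library,
-- not just the meta code
lean_lib MyLib where
  precompileModules := true
```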

Jovan Gerbscheid (Feb 03 2026 at 17:37):

Robin Carlier said:

Are simp sets declared via register_simp_attr also environment extensions under the hood?

Yes they are. But their overhead is a lot less, thanks to running compiled code. It is about 0.3ms each, which is why they don't show up in the above list, where I only showed things over 3ms.

Edward van de Meent (Feb 03 2026 at 17:54):

I wonder what the impact of the ring tactic will be here, since having that precompiled will (presumably) require a decent part of the Ring hierarchy to be built and compiled?

Jovan Gerbscheid (Feb 03 2026 at 17:57):

I guess that would be a good test to see how smart the module system is :). Presumably, this could be avoided by not meta-importing the Ring hierarchy.

Edward van de Meent (Feb 04 2026 at 00:18):

Unfortunately, file#Mathlib/Tactic/Ring/Basic shows that this is not how it is in current mathlib... I think the use of (types of) typed expressions (e.g. Q(Ring $R)) requires the theory to be available at compile time? This also goes for the norm_num tactic, I believe...

Jovan Gerbscheid (Feb 04 2026 at 02:43):

I'm making a PR to improve this

Jovan Gerbscheid (Feb 04 2026 at 03:48):

#34823

Jovan Gerbscheid (Feb 04 2026 at 03:50):

We should probably have a technical debt counter for the number of files outside Mathlib.Tactic/Mathlib.Lean that are meta-imported somewhere.

Jovan Gerbscheid (Feb 12 2026 at 12:41):

Jovan Gerbscheid said:

In batteries#1650, I experimented with letting NameMapExtension wrap its state in a Thunk

I have reopened this in batteries#1667, which now has a successful mathlib build, and has a speedup of -890.3G (-0.49%)

Jovan Gerbscheid (Feb 12 2026 at 12:49):

I do wonder how big of a performance improvement we could get by using a lazy discrimination tree in simp. When writing import Mathlib, 0.5 seconds are spent simply on building the simp discrimination tree.

We don't even need all of the nodes of the tree to be lazy, we could simply wrap all of the depth-1 nodes of the tree in a Thunk.
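A sketch of the data-structure shape being suggested (hypothetical; the real Lean.Meta.DiscrTree is an eager trie keyed by `Key`):

```lean
-- Discrimination tree whose subtrees are thunked: a child subtree is only
-- materialized the first time its root key is matched during lookup.
-- `Key` is assumed to be `Lean.Meta.DiscrTree.Key`.
inductive LazyTrie (α : Type) where
  | node (values : Array α) (children : Array (Key × Thunk (LazyTrie α)))
```

Thunking only the depth-1 children would already skip building the subtrees for head symbols that a given file never rewrites with.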

Jovan Gerbscheid (Feb 14 2026 at 08:07):

It feels a bit like an uphill battle with these performance improvements :sweat_smile:

build times over the last 10 days

Ruben Van de Velde (Feb 14 2026 at 08:26):

Can you plot build times per line of code?

Jovan Gerbscheid (Feb 14 2026 at 08:31):

It turns out we can plot the number of lines of code in the same plot!

afbeelding.png

Jovan Gerbscheid (Feb 14 2026 at 08:33):

Actually, the above shows the build instructions. The actual build times (task clock) are a lot less deterministic:

afbeelding.png


Last updated: Feb 28 2026 at 14:05 UTC