Zulip Chat Archive

Stream: mathlib4

Topic: typeclass synthesis benchmarks

Matthew Ballard (Feb 26 2026 at 14:12):

I've thought for a while that it would be nice to have some concrete benchmarks in CI for detecting when changes materially affect the performance of typeclass synthesis.

One benchmark that seems reasonable is to instantiate the parameters of a typeclass as generically as possible and time how long it takes to fail.

As an experiment, I tasked claude code to do this. At first it decided it was best if all the type arguments were metavariables, instead of free. This was not what I was expecting but the reasoning was the metavariables would specialize to any type carrying an instance.

Performance was materially different from, though the times were very small compared to, my intended setup so this was left as a separate test.

The code for generating each test was pretty good to me at quick inspection.

I then let it freelance on "human-readable" representations of the instance graph traversed. It made a nice flame graph in html which I found much more consumable than the cycle of clicking to unfold traces. The code for html and the consumption of the tracing output was more gross but that might also reflect the current situation somewhat, unsure because I don't try to use lean to generate html.

The code itself can be found at https://github.com/leanprover-community/mathlib4/compare/master...mattrobball:mathlib4_fork:synth-failure-benchmark

For some reason it thought storing the output json in .lake was a good idea. But as I said, I was pretty hands off.

Running it did return some problematic instances with a repro below. The full file is
synth_failure_benchmark_concrete.json

import Mathlib

open CategoryTheory

-- #1: CategoryTheory.GrothendieckTopology.HasSheafCompose
example {C : Type*} [Category C] (J : GrothendieckTopology C)
    {A B : Type*} [Category A] [Category B] (F : A ⥤ B) :
    J.HasSheafCompose F := inferInstance

-- #2: CategoryTheory.Injective
example {C : Type*} [Category C] (J : C) : Injective J := inferInstance

-- #3: CategoryTheory.Projective
example {C : Type*} [Category C] (P : C) : Projective P := inferInstance

-- #4: MeasurableSpace.CountablySeparated
example {α : Type*} [MeasurableSpace α] :
    MeasurableSpace.CountablySeparated α := inferInstance

I think tracking such information could be useful but I don't know enough about the inherent variability in the process to figure out meaningful cutoffs.

Overall, I think a bit more investigation is needed before PRing anything for CI.

Matthew Ballard (Feb 26 2026 at 14:20):

Bonus points if you can guess the problems with Injective and Projective off the top of your head :)

Last updated: Feb 28 2026 at 14:05 UTC