Zulip Chat Archive
Stream: lean4
Topic: Aesop statistics
Jannis Limperg (Feb 17 2024 at 11:43):
Aesop can now collect statistics about the performance of multiple Aesop runs. The collection is enabled by set_option aesop.collectStats true (possibly just for certain Aesop calls). Then, after the relevant modules have been rebuilt, a report can be generated with #aesop_stats. This command displays stats about all Aesop calls in the current module and (transitively) imported modules. E.g. to display stats about Mathlib:
import Mathlib
#aesop_stats
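For concreteness, here is a minimal sketch of collecting and reporting stats within a single file (the example goals are arbitrary and only serve to produce some Aesop calls; the option and command are the ones described above):
import Aesop

-- Collect stats for every Aesop call below. Alternatively, prefix a single
-- declaration with `set_option aesop.collectStats true in` to restrict
-- collection to that call.
set_option aesop.collectStats true

example (P : Prop) (h : P) : P := by aesop
example (n : Nat) : n + 0 = n := by aesop

-- Report stats for this module and all (transitively) imported modules.
#aesop_stats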
Unfortunately, it's currently not easily possible to set aesop.collectStats for the entirety of Mathlib, since aesop.collectStats is not a builtin option and so lean -D refuses it. But it would be easy to look at specific Aesop wrappers, such as aesop_cat, by setting the option in the macro.
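As a rough illustration of that approach (a hypothetical wrapper, not Mathlib's actual aesop_cat definition, which additionally configures rule sets), a tactic macro can set the option around the aesop call:
import Aesop

-- Hypothetical wrapper that enables stats collection for every call made
-- through it; a real wrapper like `aesop_cat` would also pass its usual
-- configuration and rule sets to `aesop`.
macro "aesop_stats_demo" : tactic =>
  `(tactic| set_option aesop.collectStats true in aesop)

example (n : Nat) : 0 + n = n := by aesop_stats_demo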
Here are stats for current-ish Mathlib. (Zulip won't let me inline them due to the limit on message length.)
Main takeaways:
- The average Aesop call is ~240ms, which is not terrible but also not great.
- simp dominates everything else by at least an order of magnitude.
- Some rules are currently unused but still take up a fair amount of time, e.g. Continuous.star never succeeds but fails 25 times, and each failure takes 174ms on average.
Jannis Limperg (Feb 17 2024 at 11:49):
(I'm aware that the subcategories add up to slightly more than the total time. :upside_down: I've spent a lot of time trying to understand why that is, and I still have no idea.)
Geoffrey Irving (Feb 17 2024 at 16:27):
If Aesop is using simp_all, does it effectively cache everything so that only new terms are simplified? Or is that impossible because each new term might simplify some other term?
Jannis Limperg (Feb 17 2024 at 17:08):
It's surprisingly hard:
- simp_all results depend on all hypotheses, so are hard to cache across goals.
- Anyway, FVarIds can change, so expressions involving hypotheses are hard to cache across goals.
I have some ideas for improvements:
- The simp_all pass can be split into a simp at * pass and a simp_all pass. The first is goal-independent and the second would hopefully have little to do in most cases. (See the sketch after this list.)
- For the simp at * pass, we should then be able to cache expressions which can't be simplified further. This should be goal-independent and should, in the context of Aesop, avoid a lot of work.
- simp could also cache results for fvar-free expressions in a separate cache, which could then be shared across goals. Not sure how much impact this would have.
- At some point I'd like to experiment with a more radical programme where Aesop eagerly normalises everything up to reducible transparency. This has an obvious up-front cost, but I suspect it might unlock various optimisations. For example, we can't currently use state-of-the-art indexing techniques from automated theorem proving because they are incompatible with computation. More prosaically, equality checks on pre-normalised expressions should be much cheaper than isDefEq. E.g. assumption spends an average of 1.5ms looking through the context (and usually finding nothing); that seems like a lot to me.
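To make the first idea concrete, here is a tactic-level sketch of the proposed split on an arbitrary example goal (inside Aesop the split would live in the normalisation phase, not in user-written tactics):
-- The first pass simplifies each hypothesis and the target on their own,
-- so its results are goal-independent and could be cached per expression.
-- The second pass lets hypotheses rewrite each other and the target.
example (n m : Nat) (h : 0 + n = m) (h' : m = 2) : n + 0 = 2 := by
  simp at *   -- `h : n = m`, goal `n = 2`; no cross-hypothesis rewriting
  simp_all    -- uses `h` and `h'` to close the remaining goal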
I can't really work on these things atm, but if anyone wants to help, I'd be very excited.