Zulip Chat Archive
Stream: general
Topic: Exploring Impact Evaluation Tools for Lean
madab (Aug 12 2025 at 18:48):
Hi everyone,
I recently attended the Impact Evaluator Research Retreat in Iceland and wrote down some thoughts on potential approaches for building tools that could help evaluate and guide work in Lean:
My aim is to start a discussion about what the Lean community would value most in this space, and what could be most impactful to pursue. I’m especially curious about:
- Which kinds of evaluation or measurement tools could make collaboration easier or more effective?
- What would help contributors focus on the highest-impact work?
As a small proof of concept, I recently experimented with combining the Lean dependency graph with git blame data (experiment here: lean-extract). I’d be happy to help shape and concretize projects in this direction if there’s interest.
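To make the idea concrete, here is a minimal sketch of that kind of combination. It is not the actual lean-extract code: the decls.json input and its format are placeholders for whatever exports the declaration locations and dependency edges, and it scores each author by how often the declarations they last touched (per git blame) are used elsewhere.

```python
# Minimal sketch (not the lean-extract implementation).
# Assumes a hypothetical decls.json of the form
#   {"Nat.add_comm": {"file": "Mathlib/Algebra/Group/Basic.lean",
#                     "line": 123, "deps": ["Nat.succ_add", ...]}, ...}
import json
import subprocess
from collections import Counter

def blame_author(repo: str, file: str, line: int) -> str:
    """Author who last touched a single line, via `git blame --line-porcelain`."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain",
         "-L", f"{line},{line}", "--", file],
        capture_output=True, text=True, check=True,
    ).stdout
    for field in out.splitlines():
        if field.startswith("author "):
            return field[len("author "):]
    return "unknown"

def author_impact(repo: str, decls_path: str) -> Counter:
    """Per author: total uses of the declarations whose declaring line they last touched."""
    decls = json.loads(open(decls_path).read())
    # Reverse the dependency graph: how often is each declaration used?
    used_by = Counter()
    for info in decls.values():
        for dep in info.get("deps", []):
            used_by[dep] += 1
    impact = Counter()
    for name, info in decls.items():
        impact[blame_author(repo, info["file"], info["line"])] += used_by[name]
    return impact

if __name__ == "__main__":
    for author, score in author_impact(".", "decls.json").most_common(20):
        print(f"{score:6d}  {author}")
```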
Looking forward to hearing your thoughts!
Wisdom
Bolton Bailey (Aug 23 2025 at 22:08):
My thoughts on your seven tools in section 7, and their status in the Lean/Mathlib ecosystem:
- Dependency graph explorer: Makes the structure of the project visible, down to the level of individual lemmas, and updates automatically as the project evolves.
  - LeanBlueprint is the typical tool for this.
- Critical gap detection: Flags “critical sorries”, missing proofs that sit on high-leverage paths in the dependency graph.
  - Mathlib doesn't allow sorried code, so there isn't much work that sits on potentially unstable foundations.
  - That being said, this shows chains of dependent GitHub PRs. (ty Michael)
- Progress dashboards: Show completion percentage for the whole project, with trends over time, built from live Lean data.
- Contribution attribution: Connects each result in the graph to its contributors, so recognition and historical tracking are automatic.
  - The stats page used to have a sort of leaderboard in the Lean 3 days, and there is the GitHub contributors page. But I think the diff-size approach overstates the impact of library-level rewrites or refactors.
- Bottleneck finder / Dynamic difficulty estimator / Foundational value scoring
  - I think there is less along these lines, though I have recently been trying to use LLMs to ask questions about the 1000+ theorems database to see if I can find theorems that are important in these ways.
One thing I would personally be interested in seeing might be a list of all theorems that the mathlib changelog tells me I added, sorted by number of times those theorems have been used by PRs that weren't mine.
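As a rough approximation of that query (using git history in place of the changelog; both the theorem detection and the "use" count are crude heuristics), something like:

```python
# Rough sketch: theorems an author appears to have added, ranked by how many
# commits from other authors touch them. Run against a Mathlib checkout.
import subprocess
from collections import Counter

ME = "Your Name"           # author name as it appears in git history (placeholder)
REPO = "path/to/mathlib4"  # placeholder path

def git(*args: str) -> str:
    return subprocess.run(["git", "-C", REPO, *args],
                          capture_output=True, text=True, check=True).stdout

def my_theorems() -> set[str]:
    """Theorem names introduced on added lines in commits authored by ME."""
    diff = git("log", f"--author={ME}", "-p", "--unified=0", "--format=")
    names = set()
    for line in diff.splitlines():
        if line.startswith("+theorem "):
            names.add(line.removeprefix("+theorem ").split()[0])
    return names

def uses_by_others(name: str) -> int:
    """Commits by other authors whose diff adds or removes occurrences of the name."""
    authors = git("log", f"-S{name}", "--format=%an").splitlines()
    return sum(1 for a in authors if a != ME)

if __name__ == "__main__":
    counts = Counter({n: uses_by_others(n) for n in my_theorems()})
    for name, n in counts.most_common(50):
        print(f"{n:5d}  {name}")
```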
Michael Rothgang (Aug 24 2025 at 17:18):
The dependency graph project (for "critical gaps") is https://leanprover-community.github.io/queueboard/dependency_dashboard.html.
Michael Rothgang (Aug 24 2025 at 17:20):
Let me emphasize that in my personal experience:
- I have a stack of open PRs ready for review, but most of the interesting work is on a branch that's not yet split up for review. So the dependency graph will only tell you so much; looking at the total diff might be better.
- Your impression that “this knowledge is implicit in experienced contributors' heads” seems spot on; certainly most critical gaps in the differential geometry library are in the relevant people's heads, but not written down anywhere. (The bottleneck, AFAICT, is interested contributors, not distributing the work.)
Last updated: Dec 20 2025 at 21:32 UTC