Zulip Chat Archive

Stream: general

Topic: visualizations

Daniel Selsam (Oct 09 2019 at 02:00):

Here is a visualization of the Mathlib instance graph, courtesy of @Sebastian Ullrich :
mathlib_instances_alt3.png
There are many visualization parameters that we haven't gotten to experiment with yet.

Mario Carneiro (Oct 09 2019 at 02:11):

that is cool for reasons entirely unrelated to the mathlib instance hierarchy

Rob Lewis (Oct 09 2019 at 08:51):

These never got posted to a public channel here, but Patrick and I were experimenting with some graphs last week.
mathlib import hierarchy, tactic and meta folders removed
same thing, different display
full mathlib import hierarchy, how Mathematica thinks it should be clustered
Patrick's visualization of that
an approximation to the "structure hierarchy", not the same as the instance graph
a different approximation to the same thing

Sebastian Ullrich (Oct 09 2019 at 08:56):

@Rob Lewis Attachments shared privately are not accessible from other channels

Rob Lewis (Oct 09 2019 at 09:02):

Oops, Zulip has better security than I expected. Fixed?

Sebastian Ullrich (Oct 09 2019 at 09:13):

@Rob Lewis Yes!

Sebastian Ullrich (Oct 09 2019 at 09:25):

What were your approximations for the structure hierarchy?

Rob Lewis (Oct 09 2019 at 09:31):

The first one is from @Patrick Massot . "meta-programming to get all classes and all instances. And then grep to guess which instances are extensions." The second is mine, it's all and only uses of the extends keyword.

Sebastian Ullrich (Oct 09 2019 at 10:44):

Interesting. So yours is naturally diamond-free. I believe the best-effort approximation would be to only include edges from instances of the form [A xs] : B ys where xs is a subset of ys. That excludes undesired edges like monad -> alternative from monad m -> alternative (option_t m)

Rob Lewis (Oct 09 2019 at 10:55):

Mine is an underapproximation, and Patrick's is an overapproximation because it has some edges like that. The problem is I don't know exactly what we're trying to approximate. That sounds like a reasonable heuristic.

Mario Carneiro (Oct 09 2019 at 11:01):

using extends isn't naturally diamond free unless you never use old structure commands, and we usually use old structure command when we are about to make a diamond...

Mario Carneiro (Oct 09 2019 at 11:03):

It's also obviously not diamond free

Rob Lewis (Oct 09 2019 at 11:03):

Indeed, you can see diamonds in both my and Patrick's graphs.

Sebastian Ullrich (Oct 09 2019 at 11:12):

Ah, right...

Patrick Massot (Oct 09 2019 at 12:34):

More details about my method:

run the make_all.sh script
change the master all.lean to all.lean
run lean all.lean 2> mathlib.yaml, get mathlib.yaml
run instance_graph.py, get instances.graphml

See https://www.yworks.com/yed-live/?file=https://gist.githubusercontent.com/PatrickMassot/1f0d0f74661d43489379b2c464a798e8/raw/instances

Patrick Massot (Oct 09 2019 at 15:58):

Mathematical components visualization at: https://math-comp.github.io/math-comp/htmldoc/libgraph.html

Sebastian Ullrich (Oct 11 2019 at 12:05):

Okay, here is my version of the class inheritance graph: mathlib-coercions.svg
With 320 edges, it has still a few more than the other versions. Fun fact: the highest number of distincts paths between two classes is 407 between discrete_linear_ordered_field and has_add. ~~Which are actually all in core, not mathlib.~~

Patrick Massot (Oct 11 2019 at 12:17):

Nice!

Mario Carneiro (Oct 11 2019 at 13:30):

wow, that's less than I expected

Mario Carneiro (Oct 11 2019 at 13:31):

but it explains why the exponential blowup problem hasn't hit us that hard

Sebastian Ullrich (Oct 11 2019 at 14:02):

On the other hand, this is just the class hierarchy. If we look at actual resolution problems, the number of steps (tried instances) goes much higher...

[
  {
    "type": "decidable (has_mem.mem.{u_1 u_1} (set.{u_1} α) (set.{u_1} (set.{u_1} α)) (set.has_mem.{u_1} (set.{u_1} α)) (set_of.{u_1} α (fun (a : α), (not (eq.{1} ennreal (has_coe_to_fun.coe.{(max (u_1+1) 1) (max (u_1+1) 1)} (measure_theory.simple_func.{u_1 0} α (measure_theory.measure_space.to_measurable_space.{u_1} α _inst_1) ennreal) (measure_theory.simple_func.has_coe_to_fun.{u_1 0} α ennreal (measure_theory.measure_space.to_measurable_space.{u_1} α _inst_1)) s a) (lattice.has_top.top.{0} ennreal (lattice.order_top.to_has_top.{0} ennreal (lattice.bounded_lattice.to_order_top.{0} ennreal (lattice.complete_lattice.to_bounded_lattice.{0} ennreal (lattice.complete_linear_order.to_complete_lattice.{0} ennreal ennreal.lattice.complete_linear_order))))))))) (filter.sets.{u_1} α (measure_theory.measure.a_e.{u_1} α (measure_theory.measure_space.to_measurable_space.{u_1} α _inst_1) (measure_theory.measure_space.μ.{u_1} α _inst_1))))",
    "max_depth": 10,
    "steps": 22567
  },
  {
    "type": "has_scalar.{0 u_1} real α",
    "max_depth": 12,
    "steps": 39150
  },
  {
    "type": "has_scalar.{0 u_1} real α",
    "max_depth": 12,
    "steps": 39155
  },
  {
    "type": "has_scalar.{0 u} real α",
    "max_depth": 12,
    "steps": 39155
  },
  {
    "type": "has_scalar.{0 u_1} real α",
    "max_depth": 12,
    "steps": 39158
  },
  {
    "type": "has_scalar.{0 u_2} real F",
    "max_depth": 12,
    "steps": 39158
  },
  {
    "type": "has_scalar.{0 u_4} real G",
    "max_depth": 12,
    "steps": 39158
  },
  {
    "type": "has_scalar.{0 u_1} real E",
    "max_depth": 12,
    "steps": 39164
  },
  {
    "type": "has_scalar.{0 u_1} real γ",
    "max_depth": 12,
    "steps": 39182
  },
  {
    "type": "has_scalar.{u_2 (max u u_1)} K (α -> γ)",
    "max_depth": 13,
    "steps": 59428
  }
]

Sebastian Ullrich (Oct 11 2019 at 14:03):

I didn't record source locations unfortunately, but perhaps you can guess

Floris van Doorn (Oct 11 2019 at 14:59):

Are these the highest numbers in mathlib, or a random sample with a high number of steps?

Sebastian Ullrich (Oct 11 2019 at 15:06):

These are the top 10 in stdlib + mathlib (I guess they are all in mathlib)

Kevin Buzzard (Oct 11 2019 at 17:25):

Can I post one or more of these on Twitter or on my blog? Or maybe Sebastian should as he has actually had something to do with them? I could steal them, host them locally, link to them and give the correct attribution, but of course people are not happy with this then I'm open to other suggestions. The reason I'm interested is that I got all our first year undergraduates to do a concept map of mathematics, which basically looks like the sort of thing which is being posted here, and I think it would be cool to say that "this is what Lean's concept map of mathematics currently looks like. I'm getting a lot of hits at the minute and I think that this is cool material which would have general appeal. But obviously I didn't have anything to do with any of it so there needs to be a discussion about how I do it, if indeed I do it at all.

Rob Lewis (Oct 11 2019 at 17:36):

Feel free to post any of mine.

Jeremy Avigad (Oct 11 2019 at 21:42):

@Sebastian Ullrich I have recently started worrying that our coding of the algebraic hierarchy is getting out of hand, e.g. compared to the mathematical components library, which is designed to find the relevant structure with essentially no search. There are certainly ways we could code the hierarchy to avoid the crazy number of steps between discrete-linear-ordered-field and has_add. Do you think it will be a problem if the algebraic hierarchy continues to grow, or do you think it will block efficient automation?

Daniel Selsam (Oct 11 2019 at 22:27):

@Jeremy Avigad the 407 number is the number of distinct paths between the two, not the length of one of the paths between the two. The former is large because of diamonds. Our new procedure solves this problem.

Kevin Buzzard (Oct 12 2019 at 07:59):

Feel free to post any of mine.

Thanks Rob. https://twitter.com/XenaProject/status/1182928366753718273?s=20

In class I got all our first year undergraduates to draw a concept map of mathematics. Several people on the Lean chat have been drawing graphs of Lean's maths library and I was struck by the similarity. Here's one of Rob Lewis' representations. http://wwwf.imperial.ac.uk/~buzzard/docs/Rob_Lewis_mathlib_concept_map.pdf

- The Xena Project (@XenaProject)

Jeremy Avigad (Oct 12 2019 at 13:18):

@Daniel Selsam I know, but even a chain of ten instances to unravel a "+" sounds like it could cause problems ofr elaboration and unification. But if you tell me not to worry about it, I won't.

Kevin Buzzard (Oct 12 2019 at 14:38):

@Daniel Selsam @Jeremy Avigad Working on the perfectoid project this morning I had a max class instance resolution error, and if I increased the instance search level it turned into a timeout error. After some time staring at trace outputs and seeing exactly what by apply_instance could do, I managed to discover that I had managed to construct a ring B for which the type class inference structure could construct an instance of comm_ring B but not an instance of ring B (this was where the timeout occurred). The ring B was a "normal mathematical ring" -- it was a quotient of a subring of a field -- but the debugging was troublesome and the traces were gigantic and terrifying. I fixed the problem by giving comm_ring.to_ring a local priority of 1000 (this was the smallest number that worked). Looking at the traces, it really hit home what a phenomenally difficult job Lean has trying to solve these problems, but it was equally absolutely clear that the vast majority of what it was trying was really crazy. I just wanted to put my hands on Lean's shoulders and say "we are doing abstract ring theory here, stop thinking about int.ring and all these other silly ideas". Here, the issue didn't seem so much to be the loops in the inference graph but simply that it was so darn big and it was hard to highlight the manifestly important part of it.

Mario Carneiro (Oct 12 2019 at 14:46):

I think the issue is still the loops in the inference graph, it's just that since stuff like int.ring appears at the leaves it's going to appear a significant fraction of the time

Mario Carneiro (Oct 12 2019 at 14:48):

Does it help to letI := comm_ring B in the proof / statement?

Mario Carneiro (Oct 12 2019 at 14:49):

@Daniel Selsam Is the new instance algorithm back-portable to lean 3, or does it depend on significantly different architecture?

Kevin Buzzard (Oct 12 2019 at 15:45):

It does not help. In tactic mode I managed to get myself in a situation where letI : comm_ring B := by apply_instance succeeded and then on the next line letI : ring B := by apply_instance failed.

Kevin Buzzard (Oct 12 2019 at 15:45):

Once I had isolated this I could then fix the problem by fudging the instance priority.

Mario Carneiro (Oct 12 2019 at 15:46):

hm, I thought local instances always had the highest priority

Kevin Buzzard (Oct 12 2019 at 15:46):

 let B := ideal.quotient A,
  letI : comm_ring B := by apply_instance, -- works fine
  letI : ring B := by apply_instance, -- max class instance reached!

(from a PM to Patrick and Johan earlier today)

Reid Barton (Oct 12 2019 at 15:46):

But first we have to figure out to apply comm_ring.to_ring, right? and that doesn't have highest priority

Mario Carneiro (Oct 12 2019 at 15:47):

that's true, but are there a lot of other options?

Kevin Buzzard (Oct 12 2019 at 15:47):

[this is in the middle of a complicated thing, A is a subring of a perfectoid field, I couldn't minimise (indeed that was part of the problem)]

Mario Carneiro (Oct 12 2019 at 15:47):

I guess it went to division ring and then to field and wandered off

Mario Carneiro (Oct 12 2019 at 15:48):

letI : ring B := comm_ring.to_ring should presumably work

Kevin Buzzard (Oct 12 2019 at 15:49):

local attribute [instance, priority 1000] comm_ring.to_ring beforehand fixed everything. 999 was not enough.

Mario Carneiro (Oct 12 2019 at 15:49):

aha

Mario Carneiro (Oct 12 2019 at 15:49):

that must be the priority of local instances

Mario Carneiro (Oct 12 2019 at 15:51):

no, it's default priority

Kevin Buzzard (Oct 12 2019 at 15:51):

comm_ring.to_ring definitely had smaller priority than that, because I checked! I think it was 100? Not at Lean right now

Reid Barton (Oct 12 2019 at 15:52):

yes, it's 100

Mario Carneiro (Oct 12 2019 at 15:57):

I'm looking at the code, and it seems like they should be 1000 by default

Mario Carneiro (Oct 12 2019 at 15:58):

oh, the core lean files set the default_priority lower when algebraic structures are defined

Kevin Buzzard (Oct 12 2019 at 16:00):

ha ha so we have a random too-low priority in the algebraic hierarchy and then wonder why Lean is always going in the wrong direction? We should move this chat to Jeremy's thread.

Mario Carneiro (Oct 12 2019 at 16:02):

no, the priority is low for a reason

Mario Carneiro (Oct 12 2019 at 16:03):

I'm not sure what it's competing against, but it's probably mathlib's fault for not setting priority on some instance

Daniel Selsam (Oct 12 2019 at 17:46):

Daniel Selsam Is the new instance algorithm back-portable to lean 3, or does it depend on significantly different architecture?

@Mario Carneiro It would be a nightmare, because the new procedure relies on suspending and resuming the temporary metavariable contexts, which are implemented in type_context.cpp in an imperative style with only push_scope and pop_scope. It would require a big refactor in C++.

Sebastian Ullrich (Oct 14 2019 at 08:52):

Can I post one or more of these on Twitter or on my blog?

@Kevin Buzzard Of course, feel free to share mine.

Daniel Selsam (Oct 14 2019 at 12:42):

Can I post one or more of these on Twitter or on my blog?

@Kevin Buzzard Of course, feel free to share the one I posted as well.

Kevin Buzzard (Oct 14 2019 at 15:31):

Thanks all! I will (when I have a minute)

Sebastian Ullrich (Oct 17 2019 at 11:50):

I took a closer look at a has_scalar A B example @Rob Lewis gave me that fails after 47000 steps. It turns out that perhaps the most problematic instance is [comm_ring A] [ring B] [algebra A B] : has_scalar A B. In Rob's specific example, [comm_ring A] quickly succeeds, but [ring B] fails (slowly). So Lean 3's inference backtracks, finds another instance for [comm_ring A], and tries again to find a [ring B] (which "obviously" fails again). In the end, the number of steps after which [ring B] fails (~400) is multiplied by the number of (not necessarily non-defeq) instances of [comm_ring A] (~100).

Sebastian Ullrich (Oct 17 2019 at 11:50):

Fortunately, @Daniel Selsam's algorithm should eat this example for breakfast

Johan Commelin (Oct 17 2019 at 11:51):

/me usually takes > 15 minutes for eating his breakfast :worried:

Rob Lewis (Oct 17 2019 at 12:00):

Thanks for looking! This kind of repeated search turns up regularly, most memorably in

int.cast_coe :
  Π {α : Type u_1} [_inst_1 : has_neg α] [_inst_2 : has_zero α] [_inst_3 : has_one α] [_inst_4 : has_add α],
    has_coe ℤ α

Before we moved the has_neg instance to the front, this would take ages to fail to find a coercion from nat to int.

Rearranging the instances here doesn't look like it will help much. But it's good to know the root cause.

Rob Lewis (Dec 04 2019 at 12:48):

For another fun kind of visualization: I just learned about the tool gitstats. https://leanprover-community.github.io/mathlib_stats/index.html

Johan Commelin (Dec 04 2019 at 12:54):

Yeah, that's a nice tool. But I hadn't yet had the courage to run it on mathlib. How long does it take to run?

Rob Lewis (Dec 04 2019 at 12:55):

Not long enough to care. More than 5 seconds, less than 30?

Johan Commelin (Dec 04 2019 at 12:56):

Ooh, ok. Your computer is faster than mine (-;

Patrick Massot (Dec 04 2019 at 13:14):

Looks perfect for a 17 minutes talk about mathlib.

Kevin Buzzard (Dec 04 2019 at 16:25):

rofl I have committed barely anything to mathlib but I'm #15 on the list of commits, because every PR I make is so bad that it always takes ten more commits to fix up all the problems everyone else points out! :rolling_on_the_floor_laughing:

Last updated: May 02 2025 at 03:31 UTC