Zulip Chat Archive

Stream: mathlib4

Topic: Task 26: Replace Type u by Type* wherever possible


Martin Dvořák (Apr 22 2025 at 08:48):

Claiming task 26.

Jovan Gerbscheid (Apr 22 2025 at 12:10):

I think the difference is that Type* introduces a fresh universe, while in Type _, the level can be inferred by unification (or be fresh if it isn't unified to be anything else).

But there might still be instances in Mathlib where this replacement is possible.

Notification Bot (Apr 22 2025 at 12:13):

A message was moved here from #mathlib4 > Call for help: technical/ organisational debt by Yaël Dillies.

Matthew Ballard (Apr 22 2025 at 12:48):

Re 26: I would be very very careful with this. We have been encountering significant still-unidentified slowdowns that depend on the choice of universe name.

Re 27: I would replace Type _

  • first by Type* if possible
  • second by Type whatever where whatever is the universe level it actually unifies to.

Ruben Van de Velde (Apr 22 2025 at 12:50):

I would not replace Type _ at all in cases where Type* doesn't work

Edward van de Meent (Apr 22 2025 at 12:51):

actually, i think that Type _ happens precisely because arguments have Type*, since you don't get to refer to the universe variable created?
for example:

def Foo {A : Type*} (s:Set A) : Type _ := s

Matthew Ballard (Apr 22 2025 at 13:01):

There has been at least one case where Type _ really meant Type 0 (oops). I don't like the obfuscation if it can reasonably be avoided but it is not high up my priority list.

Martin Dvořák (Apr 22 2025 at 13:02):

Matthew Ballard said:

Re 26: I would be very very careful with this. We have been encountering significant still-unidentified slowdowns that depend on the choice of universe name.

I believe readability is more important than compilation speed (having one more variable in the context increases the cognitive load for readers); however, we will benchmark the PR to check it isn't too bad.

Matthew Ballard (Apr 22 2025 at 13:04):

Please be aware of #12737 and request my review

Jovan Gerbscheid (Apr 22 2025 at 14:07):

My understanding is that #12737 tracks slowdowns caused by universe metavariables. But Type* (unlike Type _) introduces a fresh universe parameter, not a metavariable. So I think going from Type u to Type* isn't bad for performance

Martin Dvořák (Apr 22 2025 at 14:33):

WIP part 1 (Algebra without category theory): #24285

Junyan Xu (Apr 22 2025 at 14:54):

Is there any automation to replace (R : Type*) (S : Type*) (M : Type*) by (R S M : Type*) and {R : Type*} {A : Type*} by {R A : Type*}, etc.?

Martin Dvořák (Apr 22 2025 at 14:57):

I guess you could write a regex for the basic cases.

Yaël Dillies (Apr 22 2025 at 14:57):

Not that I know of, but I have many times wished there was!

Martin Dvořák (Apr 22 2025 at 15:00):

I suggest not to do it in this PR (because of the way diffs are visualized).

Yury G. Kudryashov (Apr 22 2025 at 15:15):

Also note that

  • category theory relies on a specific order of universe variables, so that Category.{u} C means "category with morphisms in Type u";
  • definitions and theorems about docs#ULift should use the same order of universe arguments as ULift does (this is not always the case at the moment), for similar reasons.

Matthew Ballard (Apr 22 2025 at 15:20):

Jovan Gerbscheid said:

My understanding is that #12737 tracks slowdowns caused by universe metavariables. But Type* (unlike Type _) introduces a fresh universe parameter, not a metavariable. So I think going from Type u to Type* isn't bad for performance

Yes that is a reasonable belief but see #PR reviews > #21099 use unification hints for generators normalization of levels includes sorting of names which, last I checked, behaves unexpectedly for names with more than one part or more generally outside of ASCII. If anyone has more time to track down the offending code in core and locate the proper changes, please do.

Eric Wieser (Apr 22 2025 at 15:33):

Generally if you're working in an area where the universes are an important part of the theory (like ULift and category theory), you should write

universe u v
variable {X : Type u} {Y : Type v}

rather than Type*

Jovan Gerbscheid (Apr 22 2025 at 15:34):

The changes that @Martin Dvořák is proposing also include removing explicit universe annotations (e.g. Category.{v} C -> Category C). This does introduce universe metavariables, so this may give a slowdown.

Martin Dvořák (Apr 22 2025 at 15:35):

Given the info I have just learnt, I will revert the changes related to categories.

Jovan Gerbscheid (Apr 22 2025 at 15:42):

I looked a bit further into the slowdowns from universe metavariables, and I think part of the problem is that isDefEq has two kinds of caches: transient and permanent (the permanent one is also temporary, but resets less often). Having (universe) metavariables means that the permanent cache cannot be used.

Matthew Ballard (Apr 22 2025 at 15:44):

I think I tried using docs#Lean.Expr.hasExprMVar instead at some point for the cache. It did help a bit

Jovan Gerbscheid (Apr 22 2025 at 15:47):

That seems reasonable but I'd also worry if that can introduce bugs in isDefEq.

Matthew Ballard (Apr 22 2025 at 15:47):

It was very much a hack which is why I didn’t pursue it at all

Martin Dvořák (Apr 22 2025 at 16:16):

PR #24285 is ready for review.

Martin Dvořák (Apr 22 2025 at 16:55):

Benchmark +25.08% in Mathlib.Algebra.Module.Presentation.Tensor ...
Is it too much? Should I revert the changes in this file?
PS: I have just noticed that the file is not imported by any other Mathlib file except the sink. Are the changes I made there valid, first of all?

Jovan Gerbscheid (Apr 22 2025 at 17:01):

Interesting. This should be caused by removing the explicit universe annotation in

variable (relations₁ : Relations.{w₁₀, w₁₁} A)

I wonder if we should introduce a notation

variable (relations₁ : Relations.{*, *} A)

In the same spirit as Type*

Jovan Gerbscheid (Apr 22 2025 at 17:03):

I think all your changes are valid. I think the only one where you actually changed the definition is Mathlib/Algebra/Quotient.lean, but I agree with that change.

Martin Dvořák (Apr 22 2025 at 17:05):

Ah yeah, this is the one I was worried about the most.

Matthew Ballard (Apr 22 2025 at 17:05):

@Adam Topaz had thought about a Category* version in the past.

I haven’t looked yet but this sounds like a smaller scale incident of what I described before.

Matthew Ballard (Apr 22 2025 at 17:07):

branch#mrb/merge_type_stars merges some Type*’s Please feel free to PR but not that there is currently a Python script to understand the chances. It should probably be removed in the end.

I will be unavailable for 24-48 hours.

Junyan Xu (Apr 22 2025 at 17:13):

Thanks! Another automation target is to remove declared universe variables that are no longer used.

Martin Dvořák (Apr 22 2025 at 17:15):

Absolutely! There are plenty.

Kyle Miller (Apr 22 2025 at 17:22):

Jovan Gerbscheid said:

So I think going from Type u to Type* isn't bad for performance

Matthew mentioned an issue (though I'm not sure about the ASCII part, since Type* only creates names of the form u_1, u_2, ...), but another is that Type* adds the new level parameter to the end of the level parameter list, which may result in a different order. Yury mentioned this in the context of API, but I'm mentioning this as something that could cause performance differences (though I don't know if this can cause significant problems).

(I like the idea of * as a fresh universe level parameter. If it makes it to core, I think it would need to be rethought — it's odd how Type* creates non-hygienic level variables — you can refer to u_1 for example.)

Jovan Gerbscheid (Apr 22 2025 at 17:26):

Yes, I think the implementation of introducing a fresh universe level on the fly should be redone. Although the issue of universe hygiene is more fundamental; that also exists when using Type _.

Kyle Miller (Apr 22 2025 at 17:29):

A couple options:

  • * could create an inaccessible name, and these names can be cleaned in a pass before the declaration is added to the environment.
  • * could create a level metavariable along with a record in the TermElab state that it is an error if this metavariable unifies with anything (there would be a pass at the end checking for this, flagging the place where the level was introduced along with what it unified to). This is nice since it works just like Type _ except for the fact it can throw an error when the uniqueness assumption isn't met.

Kyle Miller (Apr 22 2025 at 17:29):

Jovan Gerbscheid said:

that also exists when using Type _

What hygiene issue are you talking about?

Jovan Gerbscheid (Apr 22 2025 at 17:33):

example (a : Type _) : a = a := Eq.refl (a : Type u_1)

I think this is what you were referring to: the universe level u_1 is valid here.

Kyle Miller (Apr 22 2025 at 17:37):

Yeah, that's related, and it's using the same system as Type* too. (Type* works by creating a metavariable and immediately using levelMVarToParam on it, rather than waiting for it to be applied at a later elaboration checkpoint. For theorems, that's before elaborating the body.)

Jovan Gerbscheid (Apr 22 2025 at 17:39):

I think * should create a level parameter, not a metavariable, because we need the function Expr.hasMVar to return false for the better isDefEq cache to be used

Kyle Miller (Apr 22 2025 at 17:41):

With my second bulleted option (metavariables), this would also give a better error messages than this one without having to add checks everywhere in the language:

-- error is on the whole `theorem` line
theorem weird : True := by
  have := Type*
  trivial
/-
error: AddConstAsyncResult.commitConst: constant has level params [u_1] but expected []
-/

Compare to

theorem not_so_weird : True := by
  have := Type _
    --~~ error position
  trivial
/-
failed to infer universe levels in 'have' declaration type
  Type (?u.635 + 1)
-/

Jovan Gerbscheid (Apr 22 2025 at 17:46):

That doesn't seem right. If we make it a def, not_so_weird should compile correctly (in the hypothetical implementation of *), but using a metavariable will trigger this error.

Jovan Gerbscheid (Apr 22 2025 at 17:49):

I think it should have the same effect as the current Type*, appart from issues around naming and hygiene.

Jovan Gerbscheid (Apr 22 2025 at 17:52):

I noticed some interesting behaviour regarding when universe metavariables get abstracted and when not:

variable (a : Type _)

-- a : Type ?u.5
-- b : Type u_1
def test (b : Type _) : True := by sorry

-- a : Type u_2
-- b : Type u_1
def test' (b : Type _) : a = a := by sorry

-- b : Type u_1
theorem test'' (b : Type _) : True := by sorry

It seems that universe metavariables are not abstracted inside the body if they are introduced via variable and do not (at first) appear in the type, and if we are in a def.

The file that slowed down by 25% contains isPresentationCoreTensor, which satisfies all these conditions. You can see the offending metavariables with set_option pp.universes true

Jovan Gerbscheid (Apr 22 2025 at 17:55):

Also, the universe names are in a weird order when combining variable and normal arguments:

variable (a₁ a₂ : Type _)
-- a₁ : Type u_3
-- a₂ : Type ?u.11
-- b₁ : Type u_1
-- b₂ : Type u_2
def test (b₁ b₂ : Type _) : a₁ = a₁ := by sorry

Kevin Buzzard (Apr 22 2025 at 18:16):

Kyle Miller said:

Matthew mentioned an issue (though I'm not sure about the ASCII part, since Type* only creates names of the form u_1, u_2, ...), but another is that Type* adds the new level parameter to the end of the level parameter list, which may result in a different order. Yury mentioned this in the context of API, but I'm mentioning this as something that could cause performance differences (though I don't know if this can cause significant problems).

You know about the 29% speedup in the (unmerged!) #21129 caused by changing the alphabetical order of the universes, right?

We have not had the benefit of that speed-up literally thousands of times since then.

Kyle Miller (Apr 22 2025 at 18:34):

I didn't @Kevin Buzzard; and I guess that proves that changing universe variable orders could have caused a 29% slowdown (assuming we had a good order from the beginning).

Kevin Buzzard (Apr 22 2025 at 18:37):

Just to be clear, this wasn't "physical" universe order changing, this was literally renaming of the universes so that alphabetically they were in a different order. But if that can cause a 29% effect then I'm sure that reordering them can!

Kyle Miller (Apr 22 2025 at 18:42):

Jovan Gerbscheid said:

If we make it a def, not_so_weird should compile correctly

Declarations with universe levels that can't be inferred from the type are a very special case, and such definitions are harder to use (since you have to provide universe levels). I don't think I'd mind if you had to opt in with an explicit universe level.

Plus: Is there any case of a universe level that's used only in the body of a declaration that couldn't just be replaced by 0?

Kyle Miller (Apr 22 2025 at 18:52):

Jovan Gerbscheid said:

I noticed some interesting behaviour regarding when universe metavariables get abstracted and when not:

Additionally, for def it also depends on whether the type is a proposition or not. If it's not, everything appears to remain as level metavariables.

Jovan Gerbscheid (Apr 22 2025 at 19:43):

@Kyle Miller, using explicit universes also gives the error you said is weird:

universe u

-- error is on the whole `theorem` line
theorem weird : True := by
  have := Type u
  trivial

So I think the problem here is just that Lean's universe error messages have room for improvement. (Although this isn't a common scenario in practice anyways)

Kyle Miller (Apr 22 2025 at 19:50):

It looks like theorem is not removing the universe levels that don't appear in the header. I'd expect "unknown universe level 'u'", just like

variable (x : Nat)

theorem notWeird : True := by
  have := x -- unknown identifier 'x'
  trivial

(Ideally it would say a bit more about how these were variables unused in the header.)

Aaron Liu (Apr 22 2025 at 19:53):

Like this?

-- unused universe parameter 'u'
theorem goodError.{u} : True := by
  trivial

Kyle Miller (Apr 22 2025 at 19:56):

No, in the context of Jovan's example, one direct fix would cause it to say just "unknown universe level 'u'" but it could say something more informative like "universe level 'u' was not used in the theorem header, so it was not included" with more work.

Jovan Gerbscheid (Apr 23 2025 at 12:12):

What do we think of a set_option that makes lean throw an error if there are universe metavariables left after elaboration? It would be false by default, but turned on in Mathlib, similar to autoImplicit. This would force the replacement of Type _ by Type* when possible. It would be beneficial for performance, but also for readability, because it forces universes to be introduced explicitly (either with a named level, or with the new * syntax for simple cases).

Kyle Miller (Apr 24 2025 at 13:16):

Kevin Buzzard said:

The culprits are declarations such as docs#Algebra.Extension and docs#Algebra.Generators .

Oops, I hope I didn't cause a diversion in the discussion here. I meant to say "is there any case of a universe level parameter that's used only in the body of a definition or theorem that couldn't just be replaced by 0?" Structures definitely make use of this sometimes for a seemingly legitimate reason, as you point out. (And that certainly doesn't change the fact that structures doing this are harder to use!)

I was thinking about this at some point because one rule Lean could take with the "failed to infer universe levels" error is to set them to zero.

def f :=
  let xs : List PUnit := []
--         ~~~~~~~~~~
  true
/-
failed to infer universe levels in 'let' declaration type
  List.{?u.2513} PUnit.{?u.2513 + 1}
-/

Indeed, when this error happens, Lean sets the level metavariables to 0 so that it can make progress with the rest of the file.

set_option pp.universes true
#print f
/-
def f : Bool :=
let xs := List.nil.{0};
true
-/

I don't think this could solve any of the universe level slowness you bring up in any way, unfortunately.

Notification Bot (Apr 24 2025 at 13:38):

17 messages were moved from this topic to #mathlib4 > JacobiZariski is slow. by Kevin Buzzard.

Kevin Buzzard (Apr 24 2025 at 13:40):

Oops, I hope I didn't cause a diversion in the discussion here.

You did, but that's not a bad thing! Clogging up this discussion is though (and this is my fault not yours), so I moved the relevant posts to the "JacobiZariski is slow" thread.


Last updated: May 02 2025 at 03:31 UTC