Zulip Chat Archive

Stream: lean4

Topic: ridiculous instance names

Kevin Buzzard (May 20 2023 at 21:10):

The output of docs4#SemiNormedGroup is comical (wait for the help).

Are these gigantic instance names (which people occasionally notice for random reasons, e.g. what just happened to me) actually a problem? If so, do we need to petition the core Lean devs to change the default instance-naming algorithm? Or is it OK to just let these 2500+ character (yes, they are now at over 2500 characters) instance names continue to pile up in mathlib?

Kevin Buzzard (May 20 2023 at 21:10):

Kevin Buzzard (May 20 2023 at 21:16):

Oh, something that breaks is that you can never see the equations in the docs because you always get "One or more equations did not get rendered due to their size."

Kevin Buzzard (May 20 2023 at 21:18):

I can't find an issue for this. Did I miss it?

Henrik Böving (May 20 2023 at 21:26):

Is there an alterantive naming algorithm that you would suggest?

Kevin Buzzard (May 20 2023 at 21:27):

The Lean 3 one!

Kevin Buzzard (May 20 2023 at 21:27):

I mean, the community Lean 3 one, which was working fine for everyone (and which IIRC was modified from 3.4.1)

Jireh Loreaux (May 20 2023 at 22:27):

I think we had a discussion about this during one of the porting meetings about a month ago (maybe two). Gabriel seemed to be in favor of it, but I guess it wasn't entirely clear what exactly the algorithm would be / what the preferences are.

Moritz Doll (May 21 2023 at 03:11):

I thought we more or less settled on Bar.instFoo for instance Foo Bar

Moritz Doll (May 21 2023 at 03:14):

see !4#3759

Kevin Buzzard (May 21 2023 at 08:23):

So currently the policy is "instance : foo considered harmful; please change to instance instfoo : foo? Are we going to make this mandatory in mathlib, e.g. lint for it? Or are we going to open an issue and petition the devs for a core change?

Eric Wieser (May 21 2023 at 08:31):

I suspect we don't need a core change and can just change the meaning of instance in mathlib

Kevin Buzzard (May 21 2023 at 08:34):

Oh -- make the naming follow the Lean 3 convention automatically? That is clearly the coolest solution!

Kevin Buzzard (May 21 2023 at 08:36):

So would it be reasonable to open an issue against mathlib suggesting this? The only reason I'm pushing it is that right now there doesn't seem to be an issue anywhere, because it's not really causing any problems, just general hilarity, and there's an easy fix ("rename the instance manually"). But at some point it might blow up in our faces.

Damiano Testa (May 21 2023 at 08:39):

Besides giving reasonable names to instances, could it also be one of these hygiene things, where you are not allowed to use the autogenerated name and, if you want to explicitly use the instance, you must give it a name?

Damiano Testa (May 21 2023 at 08:41):

(or even have a tactic code action that automatically fills in the correct instance name.)

Max Nowak 🐉 (May 21 2023 at 08:41):

Or at the very least prefix autogenerated names with an underscore to emphasize that you shouldn’t be using it. But that way you still can use them, for debugging stuff etc.

Mario Carneiro (May 21 2023 at 09:03):

Eric Wieser said:

I suspect we don't need a core change and can just change the meaning of instance in mathlib

I strongly dislike options that require globally overriding core commands, as this has a variety of knock on effects like mathport not working properly, linters and code actions not detecting the command anymore, third party processing of lean AST data getting messed up, etc.

Mario Carneiro (May 21 2023 at 09:05):

it is an option but I would consider it as a last resort

Kevin Buzzard (May 21 2023 at 09:13):

Then what is your preferred solution?

Mario Carneiro (May 21 2023 at 09:22):

fixing the algorithm in core

Mario Carneiro (May 21 2023 at 09:23):

I don't think anyone thinks this is desirable, we can all see the characteristics of exponential growth

Mario Carneiro (May 21 2023 at 09:24):

we just haven't been sufficiently motivated to fix it yet, since It Works Fine :tm:

Yaël Dillies (May 21 2023 at 13:48):

Damiano Testa said:

Besides giving reasonable names to instances, could it also be one of these hygiene things, where you are not allowed to use the autogenerated name and, if you want to explicitly use the instance, you must give it a name?

I dislike that, because that means we now need to name all the instances even though the (Lean 3) autogenerated name is usually the correct one.

Anatole Dedecker (May 21 2023 at 14:14):

On a related note, are we supposed to #align instances names ?

Ruben Van de Velde (May 21 2023 at 14:20):

I usually align them if I add an explicit name

Gabriel Ebner (May 21 2023 at 18:43):

Mario Carneiro said:

I strongly dislike options that require globally overriding core commands, as this has a variety of knock on effects like mathport not working properly, linters and code actions not detecting the command anymore, third party processing of lean AST data getting messed up, etc.

I don't think any of these particular concerns would apply here. We can override the instance naming heuristic with a macro that rewrites instance : ... to instance sensibleName : ..., which would not change the AST. But it's still confusing if instance does something different if you import mathlib.

Kevin Buzzard (May 22 2023 at 05:45):

I would also be very happy to see the change made in core!

Scott Morrison (May 26 2023 at 01:14):

In the meantime if you want to hunt for long names you can use:

import Mathlib.Util.LongNames
#long_names

(or #long_names 1000 to only show names longer than 1000 characters).

Kyle Miller (Dec 05 2023 at 08:57):

I was experimenting with dealing with these enormous instance names tonight, by coming up with some heuristic that names instances based on explicit arguments and types.

Here's a list of all the instances currently in mathlib, using the current autogenerated names, and here's a list using the experimental heuristic.

Here's the naming function, and I used the following to create these text files:

import Mathlib

open Lean

elab "print_all_instances" : command => do
  let env ← getEnv
  let names := Meta.instanceExtension.getState env |>.instanceNames
  let names := names.foldl (fun ns n _ => ns.push n) #[]
  let names := names.heapSort (lt := fun n1 n2 => n1.toString.length > n2.toString.length)
  let mut out : String := ""
  for i in [0:names.size] do
    let n := names[i]!
    out := out ++ s!"{n.toString.length} {names[i]!}\n"
  IO.FS.writeFile "instances.txt" out

print_all_instances

I only had to modify seven mathlib files to get it to compile, either to update some references to autogenerated instances or to help to_additive additivize a name.

Named instances are mixed into both of these lists, which skews things a bit (and probably the worst names have already been renamed), but still in the new list over 98.5% have fewer than 80 characters, vs 92.2% in the old one.

I haven't really spent any time evaluating the results yet to see how to improve the heuristics, or whether to use a completely different approach.

Kyle Miller (Dec 05 2023 at 09:03):

I just checked the #3759 example, and with this the instance comes out as Seminorm.instConditionallyCompleteLatticeSeminorm rather than the slightly longer Seminorm.instConditionallyCompleteLatticeSeminormToSeminormedRingToSeminormedCommRingToNormedCommRingToAddGroupToSMulToZeroToNegZeroClassToSubNegZeroMonoidToSubtractionMonoidToDivisionAddCommMonoidToSMulZeroClassToZeroToCommMonoidWithZeroToCommGroupWithZeroToSemifieldToFieldToSMulWithZeroToMonoidWithZeroToSemiringToDivisionSemiringToMulActionWithZeroToAddCommMonoid.

Kyle Miller (Dec 05 2023 at 09:11):

There are sort of conflicting goals here:

The generated instance names should be globally unique
But, if you have two of the same instance, name collision can sometimes help you detect it
Short unreadable names that are guaranteed to be unique help with the first point
But, it can be helpful seeing instance names in terms to figure out what's going on

Sebastian came up with an idea of doing the simplest name possible and tagging it with the current module name to make sure it's globally unique, which would work, but in this experiment I wanted to see how far pruning the instance names could go, since the names can sometimes be valuable.

Mario Carneiro (Dec 05 2023 at 09:13):

putting the module name is another way to say "never refer to this directly"

Mario Carneiro (Dec 05 2023 at 09:15):

I'm not sold on treating global uniqueness as a requirement. It's fine if it is 95% unique because you can always rename it

Mario Carneiro (Dec 05 2023 at 09:15):

and you can automatically avoid collisions when the duplicate is in a dependency

Kyle Miller (Dec 05 2023 at 09:16):

Yeah, both the current name generator and the experimental one are doing that using mkUnusedBaseName

Sebastian Ullrich (Dec 05 2023 at 09:22):

Mario Carneiro said:

It's fine if it is 95% unique because you can always rename it

That is just not true if both instances are outside of your control e.g. because they are part of other packages

Mario Carneiro (Dec 05 2023 at 09:26):

I was thinking of qualifying that that the names should still be reasonably scoped to the library in question, but generally the usage of constants defined in the library will ensure this

Mario Carneiro (Dec 05 2023 at 09:27):

In general the answer to that is to petition the upstream library to name the offending instance

Mario Carneiro (Dec 05 2023 at 09:28):

TBH this is a problem with the global namespace in general, not limited to instances. If I make List.sort in my library because it's missing from std, and you do the same thing for the same reason, then now no one can use both our libraries at the same time

Sebastian Ullrich (Dec 05 2023 at 09:30):

Yes but let's stay at anonymous instances for this thread. If you don't name something, it is ridiculous that you may be forced to rename it later

Mario Carneiro (Dec 05 2023 at 09:31):

that doesn't sound ridiculous at all...? If you didn't name the thing you might need to name it later for a variety of reasons

Sebastian Ullrich (Dec 05 2023 at 09:32):

But the system giving it the wrong name should never be one of these reasons

Mario Carneiro (Dec 05 2023 at 09:32):

If this was an actually anonymous and unnameable instance we would not have this problem, we could just use private / hygienic names and call it a day

Mario Carneiro (Dec 05 2023 at 09:32):

but this is supposed to actually give things names a user would use

Mario Carneiro (Dec 05 2023 at 09:33):

and that means it comes at the cost of potential ambiguity, because any good (for humans) naming system has a reasonable amount of ambiguity

Sebastian Ullrich (Dec 05 2023 at 09:35):

Mario Carneiro said:

but this is supposed to actually give things names a user would use

No, that is your goal. As you correctly inferred above, my goal is to give it a name that can technically be accessed for debugging and referenced in doc-gen but makes it sufficiently obvious that references to it should never be committed.

Sebastian Ullrich (Dec 05 2023 at 09:36):

As Scott has already and successfully removed all references to anonymous instance names in mathlib (there could be new ones of course), it doesn't seem to me like people would object to that goal

Mario Carneiro (Dec 05 2023 at 09:36):

Okay, but in that case we will need to ban anonymous instances

Sebastian Ullrich (Dec 05 2023 at 09:37):

It's fine to refer to anonymous instances anonymously... like we're usually doing, via typeclass search

Mario Carneiro (Dec 05 2023 at 09:38):

because these instances need to be reasonably nameable and referenceable

Mario Carneiro (Dec 05 2023 at 09:39):

Ok but as a library design requirement we don't know whether users will need to refer to them by name (either in projects or on zulip)

Mario Carneiro (Dec 05 2023 at 09:41):

and they should have names that blend in with the rest of the library

Mario Carneiro (Dec 05 2023 at 09:41):

If instance doesn't do this, then I want a command that does

Kyle Miller (Dec 05 2023 at 09:42):

There are some examples in Mathlib/CategoryTheory/Limits/Shapes/WidePullbacks.lean of needing to add attributes to DecidableEq instances created via deriving handlers.

Sebastian Ullrich (Dec 05 2023 at 09:45):

If it's in the same project that seems okay, you can still name the instance. But lessening the need to refer to instance names explicitly in general seems like a sensible goal.

Mario Carneiro (Dec 05 2023 at 09:46):

Sure, but we can do both: discourage their use but also don't make them obnoxiously bad names

Sebastian Ullrich (Dec 05 2023 at 10:09):

That doesn't sound too far from my own stance! My proposal would have been Seminorm.«instConditionallyCompleteLattice@Mathlib.Analysis.Seminorm», is that sufficiently between "discouraging" and "obnoxiously bad" for your taste?

Mario Carneiro (Dec 05 2023 at 10:11):

it's pretty bad, I can't even type that on my keyboard in zulip

Mario Carneiro (Dec 05 2023 at 10:13):

I don't think the names themselves need to discourage use. The fact that they are written by robots will make them awkward enough already

Sebastian Ullrich (Dec 05 2023 at 10:13):

Would you have typed out Seminorm.instConditionallyCompleteLatticeSeminorm? Or copied it from docgen, found by searching for substrings or via navigation?

Mario Carneiro (Dec 05 2023 at 10:14):

It is very difficult to predict all the ways these names get around. I have had to write private names, hygienic names, pretty much any declaration comes up in the course of debugging or discussion eventually

Mario Carneiro (Dec 05 2023 at 10:15):

and you also don't control all the venues for writing text. I might be writing that on my phone while in line at the store

Mario Carneiro (Dec 05 2023 at 10:17):

Seminorm.instConditionallyCompleteLatticeSeminorm actually doesn't look that bad. Most of the characters in it are just because the human-written mathlib name itself is long

Sebastian Ullrich (Dec 05 2023 at 10:18):

It's still not something I would write blindly in a store but copy from a docgen tab, but workflows may differ

Mario Carneiro (Dec 05 2023 at 10:18):

for sure I will copy that if I have an option

Mario Carneiro (Dec 05 2023 at 10:19):

but workflows may differ

Mario Carneiro (Dec 05 2023 at 10:20):

oh, another weird place you might see a lean name is in latex documents / papers

Mario Carneiro (Dec 05 2023 at 10:21):

it's not even that weird to be referencing an instance in such a case if you are trying to align a "Theorem: foo is a group"

Mario Carneiro (Dec 05 2023 at 10:23):

or you are using the blueprint tool or similar and you want to link to the corresponding lean theorem, and then latex complains because of the « and you have a little latex debugging / configuration session

Mario Carneiro (Dec 05 2023 at 10:24):

now that I come to think about it, linkifiers are also a case where support for unusual characters is lacking. Let's test your example: docs#Seminorm.«instConditionallyCompleteLattice@Mathlib.Analysis.Seminorm»

Mario Carneiro (Dec 05 2023 at 10:25):

oof

Sebastian Ullrich (Dec 05 2023 at 10:26):

That's a problem we can have today with referring to syntax, we should fix that

Mario Carneiro (Dec 05 2023 at 10:27):

(we could also consider fixing that by rendering them in ascii)

Mario Carneiro (Dec 05 2023 at 10:28):

writing a regex for lean names is very un-fun

Mario Carneiro (Dec 05 2023 at 10:29):

it would probably help if lean used actual unicode character classes

Sebastian Ullrich (Dec 05 2023 at 10:32):

Then it seems like we're at a standstill again regarding this issue. It should be clear by now that "works in 95% of cases" is not how core operates.

Mario Carneiro (Dec 05 2023 at 10:32):

This is a naming heuristic, is it not?

Mario Carneiro (Dec 05 2023 at 10:33):

heuristic implying that 100% correctness is not a goal

Mario Carneiro (Dec 05 2023 at 10:34):

I would argue that you should just not have unnamed instances under these circumstances

Sebastian Ullrich (Dec 05 2023 at 10:34):

It would rather say it is a combination of naming and hygiene. Hygiene is supposed to work in 100% of cases

Mario Carneiro (Dec 05 2023 at 10:35):

it works in 99.9% of cases

Mario Carneiro (Dec 05 2023 at 10:36):

but for theorem names, hygienic names are always the devil

Mario Carneiro (Dec 05 2023 at 10:38):

initialize declarations are an example of hygienic names that I have genuinely never needed to call or write (although I have needed to debug them and locating them by name is hard)

Mario Carneiro (Dec 05 2023 at 10:39):

but deriving FromJson is a example of a bad hygienic named declaration, because it's not that weird to want to talk about that function

Mario Carneiro (Dec 05 2023 at 10:40):

and instances are at about the same level of badness if they are hygienic. you can avoid talking about them in most situations, but it's unreasonably difficult to deal with them when things aren't going to plan

Mario Carneiro (Dec 05 2023 at 10:49):

and for the record I'm 100% on board with Kyle's new naming algorithm, it's clearly giving better results

Mario Carneiro (Dec 05 2023 at 10:53):

Slightly facetious alternative suggestion: if the name is longer than 80 characters cut the rest off and replace it with a hash. No one except the compiler is reading that far anyway

Eric Wieser (Dec 05 2023 at 13:08):

Mario Carneiro said:

I'm not sold on treating global uniqueness as a requirement. It's fine if it is 95% unique because you can always rename it

Mario Carneiro said:

TBH this is a problem with the global namespace in general, not limited to instances. If I make List.sort in my library because it's missing from std, and you do the same thing for the same reason, then now no one can use both our libraries at the same time

As I now see Kyle pointed out above, sometimes non-uniqueness of instance names is a feature, just as it is with regular names; it reminds you that the instance you wrote already exists elsewhere, and you didn't need to write it in the first place.

Eric Wieser (Dec 05 2023 at 13:16):

Kyle Miller said:

I just checked the #3759 example, and with this the instance comes out as Seminorm.instConditionallyCompleteLatticeSeminorm rather than the slightly longer Seminorm.<snip>.

Was it a deliberate choice to keep the trailing Seminorm here?

Jireh Loreaux (Dec 05 2023 at 23:20):

@Sebastian Ullrich What about prefixing all instances names with the lake project in which they reside (e.g., Mathlib.instAddGroupInt). This should avoid the problem with downstream projects import two projects with overlapping instance names, but could allow a heuristic which works only 99% of the time within a project.

Kyle Miller (Dec 05 2023 at 23:31):

I was going to propose essentially that as a compromise too :-)

The version I was going to suggest though is instAddGroupInt._Mathlib (either with or without that _) to make sure that the instance is added into the correct namespace, in case that matters

Mario Carneiro (Dec 05 2023 at 23:34):

Maybe only do that for things in the root namespace?

Jireh Loreaux (Dec 05 2023 at 23:34):

But then you have to type it if you ever refer to it. With Mathlib. you could just open Mathlib.

Jireh Loreaux (Dec 05 2023 at 23:34):

Mario, that doesn't solve Sebastian's problem that instances in different libraries should be 100% unique.

Mario Carneiro (Dec 05 2023 at 23:35):

As long as the namespaces are independent you still have this property. If they aren't independent, you have other problems

Jireh Loreaux (Dec 05 2023 at 23:36):

Two projects could write instances for List

Kyle Miller (Dec 05 2023 at 23:37):

One reason I like this idea of tagging instances with the project name is that it gives you a better idea of where an instance comes from. It's small yet still a fairly unobtrusive affordance.

Kyle Miller (Dec 05 2023 at 23:37):

@Jireh Loreaux Do we want to refer to autogenerated instance names in code? I also like how this discourages you a little from doing it (but not so much that you won't if you need to)

Kyle Miller (Dec 05 2023 at 23:38):

I don't like prefixing with Mathlib because there's no relationship between a namespace and a module, so why should it presume that Mathlib (or whatever the project name is) is the prefix to use?

Jireh Loreaux (Dec 05 2023 at 23:38):

In general I wouldn't want to refer to it in code, but as we've addressed, it does happen.

Eric Wieser (Dec 05 2023 at 23:39):

Jireh Loreaux said:

Two projects could write instances for List

If two projects write instance of the same type for List, then you probably want to be really sure that the instances are actually equal; having a monadic list bind in project A that reverses the list but one in project B that doesn't would be very confusing.

We don't have this problem for functions, because definining a function with the same name but a different behavior is just an error.

Mario Carneiro (Dec 05 2023 at 23:40):

I am very :sob: about not having import Foo renaming or similar to just solve these issues once and for all

Eric Wieser (Dec 05 2023 at 23:41):

... though I can't help feel this is a non issue in the case being discussed here; if two projects really define the same instance on the same type with the same name, this is a probably a strong indication that the result really belong upstream anyway

Jireh Loreaux (Dec 05 2023 at 23:41):

Eric, the problem isn't identical instances (of course, non-defeq instances would be problematic), the problem is naming conflicts. Those could arise for genuinely different (but not of the same type) instances with a heuristic that is only 99% effective at hygiene (without prefixing with the project name)

Kyle Miller (Dec 05 2023 at 23:41):

Jireh Loreaux said:

In general I wouldn't want to refer to it in code, but as we've addressed, it does happen.

I meant more, is it something we want to do so much that we'd want to make sure we can do open Mathlib to make it easier?

Jireh Loreaux (Dec 05 2023 at 23:42):

Kyle, you're right. Probably not. So I am also fine with your suggestion.

Kyle Miller (Dec 05 2023 at 23:44):

Eric Wieser said:

Kyle Miller said:

I just checked the #3759 example, and with this the instance comes out as Seminorm.instConditionallyCompleteLatticeSeminorm rather than the slightly longer Seminorm.<snip>.

Was it a deliberate choice to keep the trailing Seminorm here?

No definitely not, thanks for pointing that out. Getting people to evaluate the names is why I was sharing examples :smile:

Kyle Miller (Dec 06 2023 at 04:38):

I tweaked the heuristics a bit more, and I have it adding suffixes using the module root which (I hope?) prevents inter-project instance conflicts.

Even with these _mathlib suffixes, now the percent of names that are less than 80 characters long has stayed about the same (it's gone from 98.5% to 98.6%). Here's a list if anyone wants to take a look. Note that auto-named instances from Lean core do not yet appear with a suffix here. Here's the new algorithm.

Kyle Miller (Dec 06 2023 at 04:41):

I think Kevin's example in his first message is currently in mathlib as the 2579 character ContinuousLinearMap.instNormedSpaceContinuousLinearMapToSemiringToDivisionSemiringToSemifieldToFieldToNormedFieldIdToNonAssocSemiringContinuousLinearMapToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToTopologicalSpaceToUniformSpaceToPseudoMetricSpaceToAddCommMonoidToAddCommGroupToModuleToModuleTopologicalSpaceToTopologicalAddGroupAddCommMonoidToContinuousAddToAddGroupToSeminormedAddGroupContinuousLinearMapToTopologicalSpaceTo...(2117 characters omitted)...

It's a little obscure how it comes to this, but with the heuristics in the naming experiment, this comes out as just ContinuousLinearMap.instNormedSpaceId_mathlib instead. The Id comes from a RingHom.id that appears inside the ContinuousLinearMap notation. The heuristic drops ContinuousLinearMap from the instance name itself since it already appears in the namespace name.

Mario Carneiro (Dec 06 2023 at 04:45):

The use of underscores reminds me: maybe we should use underscore separation instead of camel casing. I know that's not the usual advice but we can make an exception for instances, and it will help a lot with being able to mentally tokenize these identifiers

Mario Carneiro (Dec 06 2023 at 04:46):

although it's possible that with the new heuristic there are never enough tokens for it to matter

Mario Carneiro (Dec 06 2023 at 04:48):

I also have another proposal for how to avoid the _mathlib in many cases: if the declaration's namespace is a type (or other declaration), and that type was declared in the current project, then the suffix is not needed. This maintains the property that declarations from different projects are distinct, because they can't both have defined the type in the namespace

Sebastian Ullrich (Dec 06 2023 at 10:34):

This maintains the property that declarations from different projects are distinct, because they can't both have defined the type in the namespace

Interesting proposal, it makes sense. I would like to propose a variant though that both uses a more precise suffix and should make it unnecessary in virtually all cases (which we should measure): we suffix the whole module name unless any of the declarations used in the generated instance name are from the current module. With that we should have guaranteed uniqueness again except with fringe mangling collisions like declaring an instance of the same class for both N.X and M.X or both X Y and XY.
The reason I'm involving the full module name again is that while it's true that name collisions in the same project can be fixed easily, it is one of these language design papercuts that we really really want to avoid. Especially to beginners it must be absolutely puzzling to get a naming collision from something they haven't even named and can easily leave a sour aftertaste of the entire language.

Yaël Dillies (Dec 06 2023 at 10:39):

The aftertaste can probably be watered down significantly if the error message clearly tells that this is a naming collision and offers solutions to it, like

Try replacing
instance : Foo
with
instance instFoo : Foo

Sebastian Ullrich (Dec 06 2023 at 10:56):

Certainly after putting in the work to enable that but I'm not yet seeing a downside to my proposal that would fix the issue at the root compared to Mario's

Mario Carneiro (Dec 06 2023 at 10:56):

asking for it to be in the module and not the package will rule out a lot of mathlib instances

Sebastian Ullrich (Dec 06 2023 at 10:57):

That many? Neither the class nor the type nor other relevant parts are from the module?

Mario Carneiro (Dec 06 2023 at 10:58):

My impression is that Kevin's example linked above wouldn't meet the requirement

Mario Carneiro (Dec 06 2023 at 11:00):

it's not that weird to have a "diamond" file that shows that structure A has property B if you don't want either A or B to import the other

Johan Commelin (Dec 06 2023 at 11:00):

@Sebastian Ullrich what is the advantage of restricting to "module" instead of "package" in your proposed rule?

Sebastian Ullrich (Dec 06 2023 at 11:07):

If you don't do that, you lose the described level of uniqueness and there is no advantage compared to Mario's proposal

Sebastian Ullrich (Dec 06 2023 at 11:08):

Mario Carneiro said:

it's not that weird to have a "diamond" file that shows that structure A has property B if you don't want either A or B to import the other

It is a reasonable case, more than Kevin's example that apparently is a porting workaround. I would still prefer to see some data on this.

Mario Carneiro (Dec 06 2023 at 11:09):

maybe @Kyle Miller can run a test?

Mario Carneiro (Dec 06 2023 at 11:10):

kevin's case is also not that unusual though, the right generalization is that there is a certain non-instance definition which is intended to be local instance'd in certain files

Mario Carneiro (Dec 06 2023 at 11:11):

although I think it's not as problematic to have a module prefix on local instances

Joachim Breitner (Dec 06 2023 at 11:18):

Johan Commelin said:

Sebastian Ullrich what is the advantage of restricting to "module" instead of "package" in your proposed rule?

Is “package” even a concept that’s known to lean? Isn’t it all modules somewhere in a search path? I smell a layering violation here (but might be wrong)

Jireh Loreaux (Dec 06 2023 at 14:49):

Sebastian Ullrich said:

The reason I'm involving the full module name again is that while it's true that name collisions in the same project can be fixed easily, it is one of these language design papercuts that we really really want to avoid. Especially to beginners it must be absolutely puzzling to get a naming collision from something they haven't even named and can easily leave a sour aftertaste of the entire language.

In my opinion, long instance names are a much larger paper cut for new users, especially if the naming conflict comes with a good error message and the instructions to name the instance. For example, you see them all the time in doc gen, and they take up the entire search drop down. Of course, we could fix this by removing instances from doc gen search, but that seems like a silly hack.

(EDIT: the point being, I think using only the package instead of the full module is better because it results in short names.)

Kyle Miller (Dec 06 2023 at 16:17):

@Joachim Breitner I'm not sure what the right terminology is here, but it's between the full module name and the module root (the A in A.B.C). I'm not completely certain about this, but I've been under the impression that different Lake packages are not supposed to have intersecting sets of module roots.

Joachim Breitner (Dec 06 2023 at 17:27):

Yes, absolutely: But that’s a concept of lake (the build tool), as a wrapper around lean (the compiler/language). Lean itself (I believe) just sees modules, and it would be somewhat odd if lean the language has to know about how modules are grouped in packages.

It may seem silly to from a user’s point of view to have such abstraction boundaries between components that to the user – rightfully – are all just “Lean”. But such abstraction boundaries keep the complexity in check, and should not be given up easily.

Saying “use the first component of the name” might be ok here, because it pragmatically in the common case together with how people (and lake) typically structure lean compilation, so I’m not vetoing the idea, despite the abstraction leak smell.

(Or maybe we have already given up on this separation of concern; I think an optimization in how lean looks up modules in the search path assumes that the first component determines which search path entry to look at. So maybe the :ship: has :sailboat:ed.)

Sebastian Ullrich (Dec 06 2023 at 17:36):

Lean 4 has actually assumed that the first component of module names is unique in the search path for longer than it had any package manager. Among other reasons, it allows the system to tell the user where it expected the file for import A.B to be if it was able to find A but not B.

Kyle Miller (Dec 06 2023 at 17:37):

If this is a leak, it'll show up as "the user is surprised because two modules sharing a top-level module have instances with identical autogenerated names".

But, yeah, it turns out the module root seems to be known as a "package" internally (independent of Lake packages I guess), and when importing the first thing on the search path that provides a package owns that package. docs#Lean.SearchPath.findWithExt

Kyle Miller (Dec 06 2023 at 17:42):

I know the idea of an environment extension to track autogenerated definitions has been kicked around. I wonder if we could generalize that idea to one that tracks provenance of autogenerated features of definitions?

The simplest implementation is that the extension could associate a list of strings to a declaration, though for tooling it could be a list of some inductive type that records more interpretable information (like whether or not the declaration itself was autogenerated, and perhaps, if it was, what declaration it is associated with).

For example, instances with autogenerated names could be tagged with "instance command named this instance with an autogenerated name". Then if you import two modules that happen to have declarations with the same names, the error message can report provenance. If this information is structured enough, Yaël's suggestion from earlier could be implementable.

Kyle Miller (Dec 06 2023 at 18:33):

I just had an odd idea, and I'm not sure what I think about it yet. We could extend the lexical syntax for names so that they can end with the … ellipsis character. When this character is present, then Lean looks for a name with the given prefix.

The pretty printer could pretty print long constants with … truncation, supplying enough characters beyond the long constant cutoff so that the truncated name uniquely determines the constant.

In the documentation, long names can also be truncated. If they use this character, then the names from the documentation can still be copy/pasted.

Given this, then I think there is less concern about whether a full module name is tacked onto the end of autogenerated instance names. The pretty printer could even be aware of the format of the module name trailer and try to truncate it there, even if it's not a long name.

Kyle Miller (Dec 06 2023 at 18:41):

Mario Carneiro said:

maybe Kyle Miller can run a test?

I'm happy to run tests. Let me know if I got this right:

Your version is "if the declaration's namespace is itself a declaration that is declared in the current package, then the suffix is not needed"
Sebastian's version is "if any constant whose name appears in the autogenerated name is a declaration in the current module, then the suffix is not needed"

Would you be want with your version to be upgraded to Sebastian's, but with "module" -> "package"? I'm considering the current namespace as being one of the constants that appears in the autogenerated name.

Damiano Testa (Dec 06 2023 at 21:07):

[Did not mean to post anything]

Mac Malone (Dec 07 2023 at 00:12):

Kyle Miller said:

I'm not completely certain about this, but I've been under the impression that different Lake packages are not supposed to have intersecting sets of module roots.

This is not a safe assumption. Currently, Lean does enforce this and Lake does not work around it, but Lake does not enforce it and I hoping to eventually work around Lean here to support intersecting module roots.

Mac Malone (Dec 07 2023 at 00:15):

As an aside, It is actually relatively easy for Lake to work around this -- all it has to do is organize files in the build directory such that, as far as Lean is concerned, they share the same path. (Though just removing the check in Lean core is similarly easy.)

Mac Malone (Dec 07 2023 at 00:17):

I also think this is a very important medium-term feature. It is very likely that a number of packages from the same author / organization may wish to share the same module root but be distributed as separate packages. In fact, Mathlib even may want to do something like this itself in the future to allow downloading subsets of the library.

Kyle Miller (Dec 07 2023 at 00:24):

Let's say someone is using these overlay packages and there is a conflict between autogenerated names between two such packages. Whose "fault" is that? (In the sense of who would someone in this position feel should have gotten things right.) Is it Lean for not making unique enough names, or is it the package authors for not testing that their overlay packages can be simultaneously imported?

Jireh Loreaux (Dec 07 2023 at 06:31):

Mario Carneiro said:

I am very :sob: about not having import Foo renaming or similar to just solve these issues once and for all

Why can't we have this exactly? Or is it just hard to implement and we haven't managed to do it yet?

Mario Carneiro (Dec 07 2023 at 06:58):

it's baked pretty deeply into the design of lean

Mario Carneiro (Dec 07 2023 at 06:58):

as for whether it's worth doing anyway? That's above my pay grade, take it up with the FRO

Mac Malone (Dec 07 2023 at 10:49):

From my past expereince toying with this, one major problem was that C symbols are derived directly from the Lean name (with just some mangling to make them valid C identifiers). Thus, even with a perfect Lean-level rename, the resulting modules wouldstill not link due to a symbol clash.

Scott Morrison (Dec 10 2023 at 22:45):

I think it would be good to have an issue asking about this. It is indeed very difficult to change at this point, but even more substantial changes to how import works are on the medium term roadmap (i.e. "the module system") so it would be good to keep this point "live".

Kyle Miller (Apr 03 2024 at 02:10):

Update on shorter instance names (lean4#3089): The algorithm has been refined a bit more, and we're thinking it's about ready to merge, but it would be good to get some feedback first.

You can compare the old generated names with the new generated names. This data is across all instances, whether or not they have been explicitly named.

It adds project-based suffixes if the instance only refers to declarations from outside the current project (a "project" is the root module). For example, in mathlib, if the instance is not about anything contained in a module starting with Mathlib, then it adds a _mathlib suffix. This seems to be enough to protect against cross-project collisions.

Here's a scatterplot showing old generated length vs new generated length:

image.png

Note that the new algorithm incorporates binders, which explains why some instance names get longer, for example Fintype.instDecidableExistsOfDecidablePred has this OfDecidablePred clause from a binder.

With some slightly older data that used a longer suffixing scheme, here's a histogram showing ratio of new generated length to old generated length (median ratio of new length to old length is 73%, and with this longer suffixing scheme it's 75%):

image.png

There are 36884 instances across core, std, and mathlib. With the old algorithm, it would generate 19957 unique names (that's assuming no one overrode the generated name) and the new one would generate 19707 unique names. 1.2% worse on uniqueness to get shorter instance names seems like a reasonable trade.

Kyle Miller (Apr 03 2024 at 02:15):

The 2.5 kB instance name from issue lean4#2343 is now ContinuousLinearMap.instNormedSpaceOfSMulCommClass

Kyle Miller (Apr 03 2024 at 02:18):

The longest instance name with this new algorithm is still sizable, but it's only 202 characters long:

AlgebraicGeometry.StructureSheaf.instIsLocalizedModuleαCommRingPrimeComplAsIdealStalkCommRingCatPushforwardObjTopMapAlgebraMapValOpensTopologicalSpaceTopStructureSheafToLinearMapToPushforwardStalkAlgHom

That's explicitly named as docs#AlgebraicGeometry.StructureSheaf.isLocalizedModule_toPushforwardStalkAlgHom

Kyle Miller (Apr 03 2024 at 02:25):

The main heuristic for the naming algorithm is that it uses explicit arguments and type arguments, and it only ever incorporates a particular sub-expression into a name at most once.

Type arguments make it so that, for example, x = y gets turned into EqInt if x y : Int.

Yaël Dillies (Apr 03 2024 at 05:51):

(I must say it's quite funny to me that we even use scatter-plots to compare naming algorithms)

Kevin Buzzard (Apr 03 2024 at 08:26):

There are about 16000 instance names and you mention that the longest is 200 characters (which is still a bit long in my mind) but the 100th longest is CategoryTheory.MonoidalCategory.fullBraidedSubcategory.instFullFullSubcategoryFullBraidedSubcategoryInclusion which is at the top end of being reasonable, and if we decide that anything longer than this is unsatisfactory (e.g. it's still polluting docs search) then we can start renaming the longer ones. So this looks good to me. It wasn't the 200 character names which were awful, it was the 2000 character names, so I'm definitely much happier!

Mario Carneiro (Apr 03 2024 at 08:27):

Do we know how this interacts with doc-gen's "could not render due to size" error?

Mario Carneiro (Apr 03 2024 at 08:30):

Also, have we considered using underscores for separation instead of camel case? Naming convention notwithstanding, having actual separators in these massive names would significantly improve the scannability of the name itself, especially when multi-word things like FullBraidedSubcategory are getting mashed together

Mario Carneiro (Apr 03 2024 at 08:31):

(of course that's not going to help with character count metrics)

Yaël Dillies (Apr 03 2024 at 08:59):

Yes, my support for the new naming convention is to the point that we can recognise subatoms within names because they are _-separated (before, a _ could be separating subatoms within the new name, or subatoms within the names' atoms). So the camel casing of declarations which are never going to be an atom in another declaration's name is bad to me.

Kyle Miller (Apr 03 2024 at 08:59):

Also, have we considered using underscores for separation instead of camel case?

I had thought about doing it for Prop to try to stick with the mathlib naming convention. I didn't do it because (1) it adds a bit more complexity, (2) it makes the names longer, and (3) human readability isn't a primary goal (though it's nice to have).

Regarding (1), it's not very much extra complexity. Design-wise, there's the small detail of whether to use __mathlib for the suffix when using _'s.

Mario Carneiro (Apr 03 2024 at 11:42):

_mathlib sounds better, if possible. _Mathlib would have the benefit of being different from the usual naming convention such that there won't be any theoretical collisions

Mario Carneiro (Apr 03 2024 at 11:45):

(By the way, I think the naming convention should have a special rule for instances regardless, so I don't think it should be considered as a constraint.)

Kyle Miller (Apr 03 2024 at 22:03):

@Mario Carneiro @Yaël Dillies Are you both suggesting that instance naming convention should be snake case? Like inst_fintype_subtype_of_decidableEq rather than instFintypeSubtypeOfDecidableEq? (Edit: said 'camel', meant 'snake')

By the way, the plan is that once we change the name auto-generator, we don't want to touch it again for a very long time. We're looking to make sure everyone is happy enough with whatever convention it uses, but also it would be nice to get something in sooner rather than later (maybe there's still a chance to slip it into this release? but there can't be very much more development for that to happen).

Matt Diamond (Apr 03 2024 at 22:17):

@Kyle Miller did you mean snake case? the first thing you wrote was snake case and the second was camel case

Kim Morrison (Apr 03 2024 at 22:26):

(release of v4.8.0 looks unlikely before next week at this point, and even next week may be difficult because the entire FRO is travelling)

Yaël Dillies (Apr 04 2024 at 05:51):

Yes, inst_fintype_subtype_of_decidableEq is much more readable than instFintypeSubtypeOfDecidableEq, which could be interpreted in many wrong ways, eg inst + FintypeSubtype + OfDecidable + Eq

Eric Wieser (Apr 04 2024 at 08:07):

Maybe inst_Fintype_Subtype... to make it clear that instances have special naming rules?

Eric Wieser (Apr 04 2024 at 08:08):

A nice property of the current instFintypeSubtype scheme is that it contains the original typeclass name without any case folding

Mario Carneiro (Apr 04 2024 at 08:42):

Eric Wieser said:

Maybe inst_Fintype_Subtype... to make it clear that instances have special naming rules?

while I'm not opposed to this naming convention, I'm not sure I like the idea of using a different naming convention literally just to be different. We already use way more naming conventions simultaneously than most languages

Number Eighteen (Apr 04 2024 at 13:30):

I will always lament the fact that kebab case cannot be used in Lean. I used to find it weird and for weirdos, but then I realized how much better things look, are to type, and most importantly to read; especially long names.

Mauricio Collares (Apr 04 2024 at 13:53):

(To save people some searching: "kebab case" is the name used in JavaScript communities for lisp-style-variables-separated-by-dashes)

Eric Wieser (Apr 04 2024 at 14:08):

def «-you-just-have-to-put-points-on-the-end-» : Nat := 37

Kyle Miller (Apr 05 2024 at 21:35):

Here's what we're thinking: this PR is shortening the names in the same style as the current name generator, and changing the naming conventions is out of scope. Maybe changing naming conventions can be on the table at some later point, as part of a more general discussion (rather than as being part of this particular feature).

Kyle Miller (Apr 05 2024 at 21:35):

The question now is: Would everyone be happy with this new name generator being merged for the next release as-is or with very minor modifications?

Straw poll options: :+1: means "yeah!" :thumbs_down: means "no, but I'm not against merging" and :stop_sign: means "we should not proceed". (I've added all three to make it easy. My actual vote is :thumbs_up: :-) )

Last updated: May 02 2025 at 03:31 UTC