Zulip Chat Archive

Stream: lean4

Topic: Pre-RFC: lean/lake module resolution


Mario Carneiro (Sep 14 2023 at 09:22):

I want to restart this conversation now that we have some more data on how projects are laid out and what kind of things people are doing with lakefiles. Currently, lean resolves files in a "search path", meaning that if /home/me is in the LEAN_SRC_PATH and you write import A.B, it will look in /home/me/A/B.lean for the file. Lake populates this path using information from the lakefile.

  • However, most projects maintain the pattern that the first component of the import path is the project name (including Mathlib, contrary to the lean 3 behavior)
  • This is soft-enforced by lake through its default package setup

I would like to propose the following pattern:

  • The LEAN_SRC_PATH maps paths to module prefixes, for example /home/me/A/ -> A or mathlib4/src/ -> Mathlib, such that Mathlib.* maps to mathlib4/src/*.lean
  • Lake populates this path using information in the lakefile
  • The module prefix can be empty (recovering the current behavior), but this is discouraged
  • The file with exactly the module prefix (e.g. Mathlib) is also looked up in the folder with a TBD special name like mathlib4/src/lib.lean

Advantages:

  • Because this enforces a top level hierarchical structure for modules, we can do fewer file lookups for a given module. If we see Mathlib.Foo we don't have to look in std/ because it declares a disjoint module prefix.
  • It moves the folder roots one level down, meaning that they collide less, solving the problem from lean4#2456 that there can be multiple libraries with the same root, for example Mathlib and MathlibExtras both use mathlib4/ as the root for the source path which means that when doing a reverse lookup (we have a file on disk and want to know what lib it is in) we cannot easily do this without a full recursive traversal of all libraries with mathlib4/ or a parent as the root.
  • It means we can move Mathlib.lean out of the root and rename Mathlib/ to src/ for a more standard project structure
  • This kind of structure is more similar to rust (but compare haskell/java, which would put the sources in src/Mathlib/). Random C# example suggests they also have the ability to map file paths to module prefixes

Eric Wieser (Sep 14 2023 at 09:25):

Is it currently legal for two different git repos to provide non-overlapping files in a Mathlib folder?

Mario Carneiro (Sep 14 2023 at 09:28):

yes...

Mario Carneiro (Sep 14 2023 at 09:29):

I was thinking of mentioning a "level 2" stricter version of this, which would require that all module prefixes in the LEAN_SRC_PATH are nonoverlapping, which would exclude both empty module prefixes and also multiple packages with the same "name for resolution purposes"

Mario Carneiro (Sep 14 2023 at 09:30):

Current usage is conformant with that stricter approach, although the lean 3 resolution very much was not

Eric Wieser (Sep 14 2023 at 09:30):

It's been a while since I've looked at it, but as far as I'm aware the current design is consistent with how PYTHON_PATH works

Mario Carneiro (Sep 14 2023 at 09:36):

yes, from looking around it appears that python projects are structured like current lean 4 projects, except that the root file for a submodule is A/__init__.py instead of A.py

Eric Wieser (Sep 14 2023 at 09:39):

Mario Carneiro said:

  • It means we can move Mathlib.lean out of the root and rename Mathlib/ to src/ for a more standard project structure

I don't understand this claim; where does Mathlib.lean live in this picture?

Eric Wieser (Sep 14 2023 at 09:40):

Or to be clearer: let's imagine I create a new Mathlib/Mathlib.lean (import Mathlib.Mathlib) with the current project structure (that's a blank file). This doesn't clash with Mathlib.lean (import Mathlib) in the current implementation, but I don't see how you avoid the clash with your proposal

Mario Carneiro (Sep 14 2023 at 09:44):

Mathlib.lean would live at src/lib.lean

Mario Carneiro (Sep 14 2023 at 09:46):

we could also call it default.lean like in the old days

Thomas Murrills (Sep 14 2023 at 09:46):

Just to clarify for this stream: renaming mathlib4/Mathlib/ to mathlib4/src/ is technically a separate decision, right? Are all the other specific advantages still available for Mathlib if it isn’t renamed (i.e. we just map mathlib4/Mathlib to Mathlib)?

Mario Carneiro (Sep 14 2023 at 09:47):

yes

Eric Wieser (Sep 14 2023 at 09:54):

Mario Carneiro said:

we could also call it default.lean like in the old days

This seems like it would be confusing unless we turned back on default.lean elsewhere too (which is certainly a separate discussion)

Mario Carneiro (Sep 14 2023 at 09:55):

yeah, lib.lean was meant to only apply at the top level

Eric Wieser (Sep 14 2023 at 09:55):

Sorry for somehow missing that you already explained lib.lean in the original post

Eric Wieser (Sep 14 2023 at 09:57):

Should we split the part of this thread that is about re-introducing default? (edit: done)

Notification Bot (Sep 14 2023 at 09:59):

14 messages were moved from this topic to #lean4 > Reintroducing default.lean by Eric Wieser.

Eric Wieser (Sep 14 2023 at 10:03):

Eric Wieser said:

It's been a while since I've looked at it, but as far as I'm aware the current design is consistent with how PYTHON_PATH works

Perhaps it's worth noting that the PYTHON_PATH approach is usually to just have a single site-packages entry that contains all the user packages, which negatives any real benefit you'd get from "fewer file lookups"

Mario Carneiro (Sep 14 2023 at 10:04):

but python doesn't have project specific options, I think

Mario Carneiro (Sep 14 2023 at 10:05):

or does it? I saw something about setup.py recursion issues

Eric Wieser (Sep 14 2023 at 10:05):

Project specific option in python are just installation instructions

Eric Wieser (Sep 14 2023 at 10:05):

Once it's in site-packages they're forgotten

Eric Wieser (Sep 14 2023 at 10:06):

So indeed, that approach doesn't attempt to deal with things like set_option pp.unicode.fun true being set for a whole package

Joachim Breitner (Sep 14 2023 at 10:07):

I was thinking of mentioning a "level 2" stricter version of this

The ecosystem is still young, so whatever useful discipline we can impose and enforce now we should, before people relying on its absence. So if we gain something from not having overlapping module prefixes, and we want developers to rely on it, maybe just make it so? Or are there relevant use-cases that this would forbid?

Mario Carneiro (Sep 14 2023 at 10:08):

Even if you wanted a lean 3 style mathlib which used the top level namespace for its stuff, you could still do that by having a bunch of mappings like mathlib/src/Algebra -> Algebra, mathlib/src/Tactic -> Tactic, etc

Mario Carneiro (Sep 14 2023 at 10:11):

One case that might be hard to handle if we required nonoverlapping paths would be if mathlib wanted to keep Mathlib.Extras in a mathlib4/extras/ folder while having the rest of Mathlib.* in mathlib4/src

Mario Carneiro (Sep 14 2023 at 10:12):

currently we're calling it MathlibExtras so this isn't a problem

Mario Carneiro (Sep 14 2023 at 10:13):

we could have a rule that if one root name is a prefix of another one then the longer one takes precedence for lookup

Joachim Breitner (Sep 14 2023 at 10:13):

A middle ground may be to allow overlapping … jinx :-)

Mario Carneiro (Sep 14 2023 at 10:14):

that would also work for the lean 3 mathlib, it could just claim ., the "default supplier of packages"

Mario Carneiro (Sep 14 2023 at 10:15):

you wouldn't be allowed to have two mathlibs though

Joachim Breitner (Sep 14 2023 at 10:15):

This may have some nice applications, but overall I wonder if mixing the “package prefix” namespace and the actual module namespace is advisable.
What would happen if you require the module prefix to be a simple name (no .?)
A benefit would be that if I see an import Foo.Bar.Baz, I know immediately that somewhere must be a file Bar/Baz.lean, and that Foo is the library name.

Mario Carneiro (Sep 14 2023 at 10:16):

That's what rust does, more or less

Mario Carneiro (Sep 14 2023 at 10:18):

although the package names used for external references in rust are actually set per-package, i.e. mathlib says it wants Std.* to mean stuff from the std package, although it defaults to the name of the package itself

Mario Carneiro (Sep 14 2023 at 10:18):

this is useful if two of your deps decided to have the same name

Sebastian Ullrich (Sep 14 2023 at 10:27):

I believe that's a problem anyway as e.g. hygiene depends on globally unique module names

Joachim Breitner (Sep 14 2023 at 10:28):

So you are saying I can import two libraries that both think of themselves as Foo as Foo1 and Foo2? How would the imports within these libraries look like?

Mario Carneiro (Sep 14 2023 at 10:34):

Inside Foo1 you would have imports to Self.A or Foo.A or however Foo1 is referring to itself, and similarly in Foo2, and in Bar that imports both it would refer to Foo1.A and Foo2.A

Joachim Breitner (Sep 14 2023 at 10:35):

So if I jump to the definition and open a file in Foo1, then that file will be processed with a different search path – got it.

Mario Carneiro (Sep 14 2023 at 10:37):

but sebastian has a reasonable point that this would mess with global names which involve mixing in the "absolute path" of the module

Mario Carneiro (Sep 14 2023 at 10:37):

you would still have to get pretty unlucky to hit a conflict, unless the two packages are actually copies of each other

Mario Carneiro (Sep 14 2023 at 10:38):

although in that case you will also have regular names clashing

Sebastian Ullrich (Sep 14 2023 at 10:39):

Quick general notes before I board a flight: as mentioned the current scheme was inspired by Python, so "more standard" is arguable. And "fewer file lookups" would first have to be proven relevant in practice. So that mostly seems to leave lean4#2456 as motivation

Sebastian Ullrich (Sep 14 2023 at 10:39):

Having said that, I'm not fundamentally opposed to the change (yet)

Mario Carneiro (Sep 14 2023 at 10:40):

I did see some people using src/ even in the python world

Mario Carneiro (Sep 14 2023 at 10:41):

they would just use what I called the java/haskell convention with src/Mathlib/actual/stuff.lean

Mario Carneiro (Sep 14 2023 at 10:42):

vscode at least makes folders that only contain other folders rather lightweight, so I'm not that opposed to this structure either and it also solves the lean4#2456 problem

Mario Carneiro (Sep 14 2023 at 10:45):

the java convention usually has project names with 2 to 4 components so it's actually more like src/com/mypkg/actual/stuff.java which is a bit annoyingly deep for my taste

Mario Carneiro (Sep 14 2023 at 10:47):

the source mapping convention described here lets you still have long package names if you want without forcing a deep file structure too

Mario Carneiro (Sep 14 2023 at 10:48):

but I kind of agree with Joachim Breitner that we don't really need multi-component package names, I would be happy if they were just strings

Mario Carneiro (Sep 14 2023 at 10:48):

this came up as an issue in lake new foo.bar parsing

Mario Carneiro (Sep 14 2023 at 10:51):

Sebastian Ullrich said:

And "fewer file lookups" would first have to be proven relevant in practice.

Note that it's not just about the performance impact of fewer file lookups for the computer: it's also a file lookup for the human if files can come from anywhere like this

Mario Carneiro (Sep 14 2023 at 10:52):

If mathlib added a file mathlib4/Std/Data/AssocList.lean it would work, and it would be very confusing and undesirable

Mario Carneiro (Sep 14 2023 at 10:53):

The worst part is that it's almost a reasonable thing to do, people might decide to write their for_mathlib folders by literally putting things in mathlib

Mario Carneiro (Sep 14 2023 at 10:54):

in fact, now that I come to think about it this might even break lake, since it allows you to change the file that an upstream file depends on

Yaël Dillies (Sep 14 2023 at 10:55):

Mario Carneiro said:

that would also work for the lean 3 mathlib, it could just claim ., the "default supplier of packages"

So import X would import the file and import X. would import all files in the folder? I like that.

Mario Carneiro (Sep 14 2023 at 10:59):

I... wasn't suggesting that, but it's not terrible

Mario Carneiro (Sep 14 2023 at 10:59):

probably import X.* is a clearer way to write that

Sebastian Ullrich (Sep 14 2023 at 11:00):

Mario Carneiro said:

If mathlib added a file mathlib4/Std/Data/AssocList.lean it would work, and it would be very confusing and undesirable

It wouldn't, the lookup (findWithExt) is still based on packages. So the first Std/ in the search path wins. I think there may have been an intention to eventually check for unique package roots in the build system but I suppose we never got to that.

Yaël Dillies (Sep 14 2023 at 11:01):

Ah okay :sweat_smile: Regardless, I think import X. or import X.* is a good syntax idea to support.

Mario Carneiro (Sep 14 2023 at 11:16):

Sebastian Ullrich said:

Mario Carneiro said:

If mathlib added a file mathlib4/Std/Data/AssocList.lean it would work, and it would be very confusing and undesirable

It wouldn't, the lookup (findWithExt) is still based on packages. So the first Std/ in the search path wins. I think there may have been an intention to eventually check for unique package roots in the build system but I suppose we never got to that.

Oh, I guess I should add that as an extra bullet point under:

This is soft-enforced by lake through its default package setup

lean itself has a notion of packages which are one component, no more no less

Mario Carneiro (Sep 14 2023 at 11:18):

in which case it seems like the argument for this method is even stronger, we should just look in the corresponding package for the remainder of the module name after the package

Mac Malone (Sep 15 2023 at 08:43):

Mario Carneiro said:

I was thinking of mentioning a "level 2" stricter version of this, which would require that all module prefixes in the LEAN_SRC_PATH are nonoverlapping, which would exclude both empty module prefixes and also multiple packages with the same "name for resolution purposes"

This would be a problem. Lean/Lake need overlapping paths in order to develop them. Since Lean and Lake are already hard-coded at the bottom of the list, building a new version of Lake requires its source a higher level and overlapping/overriding (in order for interactive features to work properly).

Mario Carneiro (Sep 15 2023 at 08:46):

I don't follow

Mario Carneiro (Sep 15 2023 at 08:46):

lean and lake are not in overlapping paths

Mac Malone (Sep 15 2023 at 08:47):

Mario Carneiro said:

  • Because this enforces a top level hierarchical structure for modules, we can do fewer file lookups for a given module. If we see Mathlib.Foo we don't have to look in std/ because it declares a disjoint module prefix.

It is worth noting this is already enforced. Once Lean has found a module root in a path, it will already only look in that directory. (Edit: Sebastian already covered this.) I was actually hoping to remove this so module naming conventions like Data.Foo or Data.Bar would be legal in separate Foo and Bar packages (like Haskell).

Mario Carneiro (Sep 15 2023 at 08:47):

  • src/Init -> Init
  • src/Lean -> Lean
  • src/lake/Lake -> Lake

Mac Malone (Sep 15 2023 at 08:54):

@Mario Carneiro Maybe I was confused on what you meant by overlapping. I meant that when building Lake, in LEAN_SRC_PATH there is a .elan/toolchains/src/lake/Lake that points to the toolchain source and then there is a my-lean4/src/lake/Lake that points to to the source in the current development workspace (and similar for LEAN_PATH). That later one is important in order not go-to def or import outside of your development build.

Mario Carneiro (Sep 15 2023 at 08:55):

why is overlapping resolution needed there?

Mario Carneiro (Sep 15 2023 at 08:56):

I don't see why you wouldn't always want it to go to one or the other (I guess you always want the dev source to win)

Mac Malone (Sep 15 2023 at 08:56):

Yes, I thought by overlapping you meant that you could not have two paths the contain the same modules.

Mario Carneiro (Sep 15 2023 at 08:56):

well I did not say that but I assumed it to be the case

Mario Carneiro (Sep 15 2023 at 08:57):

more specifically, it wouldn't be an error but one would be completely shadowed by the other

Mario Carneiro (Sep 15 2023 at 08:57):

there wouldn't be any point putting two in the path

Mac Malone (Sep 15 2023 at 08:57):

Ah, well then, that is already how it works.

Mac Malone (Sep 15 2023 at 09:01):

Mario Carneiro said:

  • The file with exactly the module prefix (e.g. Mathlib) is also looked up in the folder with a TBD special name like mathlib4/src/lib.lean

On a matter of preference, I vote against this. I much prefer the the root file matching the folder name. Ruby does this and I love it. Rust also explicitly added support for the current pattern over the previous special name approach, indicating broad community support for the current pattern.

Mario Carneiro (Sep 15 2023 at 09:01):

no, rust still has lib.rs

Mac Malone (Sep 15 2023 at 09:01):

I really do not like Python's __init__.py modules. I also confuses me to remember that that does something special.

Mario Carneiro (Sep 15 2023 at 09:01):

I think you are mixing this up with the other thread

Mario Carneiro (Sep 15 2023 at 09:01):

this is just about the root module

Mac Malone (Sep 15 2023 at 09:02):

Oh, in that case, it would be even more confusing (because of the inconsistency).

Mario Carneiro (Sep 15 2023 at 09:02):

I think it would solve some of the problems discussed earlier about Foo.lean and FooExe.lean

Mac Malone (Sep 15 2023 at 09:03):

And, as far as I now understand it, Rust only does that because a package can only have one library, which is not true for Lean/Lake.

Mario Carneiro (Sep 15 2023 at 09:03):

no, it does it for exes too

Mario Carneiro (Sep 15 2023 at 09:03):

this would be an extra config arg, lib.lean is just the default

Mac Malone (Sep 15 2023 at 09:04):

Yeah, I would not prefer that default. So, my vote would be against it. I would support the option for such a package structure, though.

Mac Malone (Sep 15 2023 at 09:07):

Again, I like the current naming convention. It mirrors Ruby's exactly, and I have always loved Ruby's aesthetics.

Mario Carneiro (Sep 15 2023 at 09:10):

most of the examples I can find put sources in a app/ or lib/ directory in ruby

Mac Malone (Sep 15 2023 at 09:10):

Yes, but the module hierarchy under that follows exactly Lean's.

Mac Malone (Sep 15 2023 at 09:11):

Essentially, Ruby defaults to srcDir := "lib" (for libraries)

Mac Malone (Sep 15 2023 at 09:12):

I would be happy with a source directory default, but my understanding Sebastian and/or Leo decided against that.

Mario Carneiro (Sep 15 2023 at 09:13):

hence the discussion

Mac Malone (Sep 15 2023 at 09:15):

Yep! I was just dropping in my vote.

Mario Carneiro (Sep 15 2023 at 09:16):

I can't tell whether it is typical for ruby projects to have multiple folders in lib/

Mac Malone (Sep 15 2023 at 09:23):

@Mario Carneiro Generally, not. That would be essentially a multi-library package (which exists, but are not common).

Mac Malone (Sep 15 2023 at 09:24):

It would be as common as having multiple folders in lib would be if srcDir := "lib" was the default in Lean/Lake (i.e., generally only for multi-library packages).


Last updated: Dec 20 2023 at 11:08 UTC