Zulip Chat Archive
Stream: lean4
Topic: Pre-RFC: lean/lake module resolution
Mario Carneiro (Sep 14 2023 at 09:22):
I want to restart this conversation now that we have some more data on how projects are laid out and what kind of things people are doing with lakefiles. Currently, lean resolves files in a "search path", meaning that if /home/me
is in the LEAN_SRC_PATH
and you write import A.B
, it will look in /home/me/A/B.lean
for the file. Lake populates this path using information from the lakefile.
- However, most projects maintain the pattern that the first component of the import path is the project name (including
Mathlib
, contrary to the lean 3 behavior) - This is soft-enforced by lake through its default package setup
I would like to propose the following pattern:
- The
LEAN_SRC_PATH
maps paths to module prefixes, for example/home/me/A/ -> A
ormathlib4/src/ -> Mathlib
, such thatMathlib.*
maps tomathlib4/src/*.lean
- Lake populates this path using information in the lakefile
- The module prefix can be empty (recovering the current behavior), but this is discouraged
- The file with exactly the module prefix (e.g.
Mathlib
) is also looked up in the folder with a TBD special name likemathlib4/src/lib.lean
Advantages:
- Because this enforces a top level hierarchical structure for modules, we can do fewer file lookups for a given module. If we see
Mathlib.Foo
we don't have to look instd/
because it declares a disjoint module prefix. - It moves the folder roots one level down, meaning that they collide less, solving the problem from lean4#2456 that there can be multiple libraries with the same root, for example
Mathlib
andMathlibExtras
both usemathlib4/
as the root for the source path which means that when doing a reverse lookup (we have a file on disk and want to know what lib it is in) we cannot easily do this without a full recursive traversal of all libraries withmathlib4/
or a parent as the root. - It means we can move
Mathlib.lean
out of the root and renameMathlib/
tosrc/
for a more standard project structure - This kind of structure is more similar to rust (but compare haskell/java, which would put the sources in
src/Mathlib/
). Random C# example suggests they also have the ability to map file paths to module prefixes
Eric Wieser (Sep 14 2023 at 09:25):
Is it currently legal for two different git repos to provide non-overlapping files in a Mathlib
folder?
Mario Carneiro (Sep 14 2023 at 09:28):
yes...
Mario Carneiro (Sep 14 2023 at 09:29):
I was thinking of mentioning a "level 2" stricter version of this, which would require that all module prefixes in the LEAN_SRC_PATH
are nonoverlapping, which would exclude both empty module prefixes and also multiple packages with the same "name for resolution purposes"
Mario Carneiro (Sep 14 2023 at 09:30):
Current usage is conformant with that stricter approach, although the lean 3 resolution very much was not
Eric Wieser (Sep 14 2023 at 09:30):
It's been a while since I've looked at it, but as far as I'm aware the current design is consistent with how PYTHON_PATH
works
Mario Carneiro (Sep 14 2023 at 09:36):
yes, from looking around it appears that python projects are structured like current lean 4 projects, except that the root file for a submodule is A/__init__.py
instead of A.py
Eric Wieser (Sep 14 2023 at 09:39):
Mario Carneiro said:
- It means we can move
Mathlib.lean
out of the root and renameMathlib/
tosrc/
for a more standard project structure
I don't understand this claim; where does Mathlib.lean
live in this picture?
Eric Wieser (Sep 14 2023 at 09:40):
Or to be clearer: let's imagine I create a new Mathlib/Mathlib.lean
(import Mathlib.Mathlib
) with the current project structure (that's a blank file). This doesn't clash with Mathlib.lean
(import Mathlib
) in the current implementation, but I don't see how you avoid the clash with your proposal
Mario Carneiro (Sep 14 2023 at 09:44):
Mathlib.lean
would live at src/lib.lean
Mario Carneiro (Sep 14 2023 at 09:46):
we could also call it default.lean
like in the old days
Thomas Murrills (Sep 14 2023 at 09:46):
Just to clarify for this stream: renaming mathlib4/Mathlib/
to mathlib4/src/
is technically a separate decision, right? Are all the other specific advantages still available for Mathlib if it isn’t renamed (i.e. we just map mathlib4/Mathlib
to Mathlib
)?
Mario Carneiro (Sep 14 2023 at 09:47):
yes
Eric Wieser (Sep 14 2023 at 09:54):
Mario Carneiro said:
we could also call it
default.lean
like in the old days
This seems like it would be confusing unless we turned back on default.lean
elsewhere too (which is certainly a separate discussion)
Mario Carneiro (Sep 14 2023 at 09:55):
yeah, lib.lean
was meant to only apply at the top level
Eric Wieser (Sep 14 2023 at 09:55):
Sorry for somehow missing that you already explained lib.lean
in the original post
Eric Wieser (Sep 14 2023 at 09:57):
Should we split the part of this thread that is about re-introducing default
? (edit: done)
Notification Bot (Sep 14 2023 at 09:59):
14 messages were moved from this topic to #lean4 > Reintroducing default.lean by Eric Wieser.
Eric Wieser (Sep 14 2023 at 10:03):
Eric Wieser said:
It's been a while since I've looked at it, but as far as I'm aware the current design is consistent with how
PYTHON_PATH
works
Perhaps it's worth noting that the PYTHON_PATH
approach is usually to just have a single site-packages
entry that contains all the user packages, which negatives any real benefit you'd get from "fewer file lookups"
Mario Carneiro (Sep 14 2023 at 10:04):
but python doesn't have project specific options, I think
Mario Carneiro (Sep 14 2023 at 10:05):
or does it? I saw something about setup.py
recursion issues
Eric Wieser (Sep 14 2023 at 10:05):
Project specific option in python are just installation instructions
Eric Wieser (Sep 14 2023 at 10:05):
Once it's in site-packages they're forgotten
Eric Wieser (Sep 14 2023 at 10:06):
So indeed, that approach doesn't attempt to deal with things like set_option pp.unicode.fun true
being set for a whole package
Joachim Breitner (Sep 14 2023 at 10:07):
I was thinking of mentioning a "level 2" stricter version of this
The ecosystem is still young, so whatever useful discipline we can impose and enforce now we should, before people relying on its absence. So if we gain something from not having overlapping module prefixes, and we want developers to rely on it, maybe just make it so? Or are there relevant use-cases that this would forbid?
Mario Carneiro (Sep 14 2023 at 10:08):
Even if you wanted a lean 3 style mathlib which used the top level namespace for its stuff, you could still do that by having a bunch of mappings like mathlib/src/Algebra -> Algebra, mathlib/src/Tactic -> Tactic, etc
Mario Carneiro (Sep 14 2023 at 10:11):
One case that might be hard to handle if we required nonoverlapping paths would be if mathlib wanted to keep Mathlib.Extras
in a mathlib4/extras/
folder while having the rest of Mathlib.*
in mathlib4/src
Mario Carneiro (Sep 14 2023 at 10:12):
currently we're calling it MathlibExtras
so this isn't a problem
Mario Carneiro (Sep 14 2023 at 10:13):
we could have a rule that if one root name is a prefix of another one then the longer one takes precedence for lookup
Joachim Breitner (Sep 14 2023 at 10:13):
A middle ground may be to allow overlapping … jinx :-)
Mario Carneiro (Sep 14 2023 at 10:14):
that would also work for the lean 3 mathlib, it could just claim .
, the "default supplier of packages"
Mario Carneiro (Sep 14 2023 at 10:15):
you wouldn't be allowed to have two mathlibs though
Joachim Breitner (Sep 14 2023 at 10:15):
This may have some nice applications, but overall I wonder if mixing the “package prefix” namespace and the actual module namespace is advisable.
What would happen if you require the module prefix to be a simple name (no .
?)
A benefit would be that if I see an import Foo.Bar.Baz
, I know immediately that somewhere must be a file Bar/Baz.lean
, and that Foo
is the library name.
Mario Carneiro (Sep 14 2023 at 10:16):
That's what rust does, more or less
Mario Carneiro (Sep 14 2023 at 10:18):
although the package names used for external references in rust are actually set per-package, i.e. mathlib says it wants Std.*
to mean stuff from the std
package, although it defaults to the name of the package itself
Mario Carneiro (Sep 14 2023 at 10:18):
this is useful if two of your deps decided to have the same name
Sebastian Ullrich (Sep 14 2023 at 10:27):
I believe that's a problem anyway as e.g. hygiene depends on globally unique module names
Joachim Breitner (Sep 14 2023 at 10:28):
So you are saying I can import two libraries that both think of themselves as Foo
as Foo1
and Foo2
? How would the imports
within these libraries look like?
Mario Carneiro (Sep 14 2023 at 10:34):
Inside Foo1
you would have imports to Self.A
or Foo.A
or however Foo1
is referring to itself, and similarly in Foo2
, and in Bar
that imports both it would refer to Foo1.A
and Foo2.A
Joachim Breitner (Sep 14 2023 at 10:35):
So if I jump to the definition and open a file in Foo1
, then that file will be processed with a different search path – got it.
Mario Carneiro (Sep 14 2023 at 10:37):
but sebastian has a reasonable point that this would mess with global names which involve mixing in the "absolute path" of the module
Mario Carneiro (Sep 14 2023 at 10:37):
you would still have to get pretty unlucky to hit a conflict, unless the two packages are actually copies of each other
Mario Carneiro (Sep 14 2023 at 10:38):
although in that case you will also have regular names clashing
Sebastian Ullrich (Sep 14 2023 at 10:39):
Quick general notes before I board a flight: as mentioned the current scheme was inspired by Python, so "more standard" is arguable. And "fewer file lookups" would first have to be proven relevant in practice. So that mostly seems to leave lean4#2456 as motivation
Sebastian Ullrich (Sep 14 2023 at 10:39):
Having said that, I'm not fundamentally opposed to the change (yet)
Mario Carneiro (Sep 14 2023 at 10:40):
I did see some people using src/
even in the python world
Mario Carneiro (Sep 14 2023 at 10:41):
they would just use what I called the java/haskell convention with src/Mathlib/actual/stuff.lean
Mario Carneiro (Sep 14 2023 at 10:42):
vscode at least makes folders that only contain other folders rather lightweight, so I'm not that opposed to this structure either and it also solves the lean4#2456 problem
Mario Carneiro (Sep 14 2023 at 10:45):
the java convention usually has project names with 2 to 4 components so it's actually more like src/com/mypkg/actual/stuff.java
which is a bit annoyingly deep for my taste
Mario Carneiro (Sep 14 2023 at 10:47):
the source mapping convention described here lets you still have long package names if you want without forcing a deep file structure too
Mario Carneiro (Sep 14 2023 at 10:48):
but I kind of agree with Joachim Breitner that we don't really need multi-component package names, I would be happy if they were just strings
Mario Carneiro (Sep 14 2023 at 10:48):
this came up as an issue in lake new foo.bar
parsing
Mario Carneiro (Sep 14 2023 at 10:51):
Sebastian Ullrich said:
And "fewer file lookups" would first have to be proven relevant in practice.
Note that it's not just about the performance impact of fewer file lookups for the computer: it's also a file lookup for the human if files can come from anywhere like this
Mario Carneiro (Sep 14 2023 at 10:52):
If mathlib added a file mathlib4/Std/Data/AssocList.lean
it would work, and it would be very confusing and undesirable
Mario Carneiro (Sep 14 2023 at 10:53):
The worst part is that it's almost a reasonable thing to do, people might decide to write their for_mathlib
folders by literally putting things in mathlib
Mario Carneiro (Sep 14 2023 at 10:54):
in fact, now that I come to think about it this might even break lake, since it allows you to change the file that an upstream file depends on
Yaël Dillies (Sep 14 2023 at 10:55):
Mario Carneiro said:
that would also work for the lean 3 mathlib, it could just claim
.
, the "default supplier of packages"
So import X
would import the file and import X.
would import all files in the folder? I like that.
Mario Carneiro (Sep 14 2023 at 10:59):
I... wasn't suggesting that, but it's not terrible
Mario Carneiro (Sep 14 2023 at 10:59):
probably import X.*
is a clearer way to write that
Sebastian Ullrich (Sep 14 2023 at 11:00):
Mario Carneiro said:
If mathlib added a file
mathlib4/Std/Data/AssocList.lean
it would work, and it would be very confusing and undesirable
It wouldn't, the lookup (findWithExt
) is still based on packages. So the first Std/
in the search path wins. I think there may have been an intention to eventually check for unique package roots in the build system but I suppose we never got to that.
Yaël Dillies (Sep 14 2023 at 11:01):
Ah okay :sweat_smile: Regardless, I think import X.
or import X.*
is a good syntax idea to support.
Mario Carneiro (Sep 14 2023 at 11:16):
Sebastian Ullrich said:
Mario Carneiro said:
If mathlib added a file
mathlib4/Std/Data/AssocList.lean
it would work, and it would be very confusing and undesirableIt wouldn't, the lookup (
findWithExt
) is still based on packages. So the firstStd/
in the search path wins. I think there may have been an intention to eventually check for unique package roots in the build system but I suppose we never got to that.
Oh, I guess I should add that as an extra bullet point under:
This is soft-enforced by lake through its default package setup
lean itself has a notion of packages which are one component, no more no less
Mario Carneiro (Sep 14 2023 at 11:18):
in which case it seems like the argument for this method is even stronger, we should just look in the corresponding package for the remainder of the module name after the package
Mac Malone (Sep 15 2023 at 08:43):
Mario Carneiro said:
I was thinking of mentioning a "level 2" stricter version of this, which would require that all module prefixes in the
LEAN_SRC_PATH
are nonoverlapping, which would exclude both empty module prefixes and also multiple packages with the same "name for resolution purposes"
This would be a problem. Lean/Lake need overlapping paths in order to develop them. Since Lean and Lake are already hard-coded at the bottom of the list, building a new version of Lake requires its source a higher level and overlapping/overriding (in order for interactive features to work properly).
Mario Carneiro (Sep 15 2023 at 08:46):
I don't follow
Mario Carneiro (Sep 15 2023 at 08:46):
lean and lake are not in overlapping paths
Mac Malone (Sep 15 2023 at 08:47):
Mario Carneiro said:
- Because this enforces a top level hierarchical structure for modules, we can do fewer file lookups for a given module. If we see
Mathlib.Foo
we don't have to look instd/
because it declares a disjoint module prefix.
It is worth noting this is already enforced. Once Lean has found a module root in a path, it will already only look in that directory. (Edit: Sebastian already covered this.) I was actually hoping to remove this so module naming conventions like Data.Foo
or Data.Bar
would be legal in separate Foo
and Bar
packages (like Haskell).
Mario Carneiro (Sep 15 2023 at 08:47):
src/Init -> Init
src/Lean -> Lean
src/lake/Lake -> Lake
Mac Malone (Sep 15 2023 at 08:54):
@Mario Carneiro Maybe I was confused on what you meant by overlapping. I meant that when building Lake, in LEAN_SRC_PATH
there is a .elan/toolchains/src/lake/Lake
that points to the toolchain source and then there is a my-lean4/src/lake/Lake
that points to to the source in the current development workspace (and similar for LEAN_PATH
). That later one is important in order not go-to def or import outside of your development build.
Mario Carneiro (Sep 15 2023 at 08:55):
why is overlapping resolution needed there?
Mario Carneiro (Sep 15 2023 at 08:56):
I don't see why you wouldn't always want it to go to one or the other (I guess you always want the dev source to win)
Mac Malone (Sep 15 2023 at 08:56):
Yes, I thought by overlapping you meant that you could not have two paths the contain the same modules.
Mario Carneiro (Sep 15 2023 at 08:56):
well I did not say that but I assumed it to be the case
Mario Carneiro (Sep 15 2023 at 08:57):
more specifically, it wouldn't be an error but one would be completely shadowed by the other
Mario Carneiro (Sep 15 2023 at 08:57):
there wouldn't be any point putting two in the path
Mac Malone (Sep 15 2023 at 08:57):
Ah, well then, that is already how it works.
Mac Malone (Sep 15 2023 at 09:01):
Mario Carneiro said:
- The file with exactly the module prefix (e.g.
Mathlib
) is also looked up in the folder with a TBD special name likemathlib4/src/lib.lean
On a matter of preference, I vote against this. I much prefer the the root file matching the folder name. Ruby does this and I love it. Rust also explicitly added support for the current pattern over the previous special name approach, indicating broad community support for the current pattern.
Mario Carneiro (Sep 15 2023 at 09:01):
no, rust still has lib.rs
Mac Malone (Sep 15 2023 at 09:01):
I really do not like Python's __init__.py
modules. I also confuses me to remember that that does something special.
Mario Carneiro (Sep 15 2023 at 09:01):
I think you are mixing this up with the other thread
Mario Carneiro (Sep 15 2023 at 09:01):
this is just about the root module
Mac Malone (Sep 15 2023 at 09:02):
Oh, in that case, it would be even more confusing (because of the inconsistency).
Mario Carneiro (Sep 15 2023 at 09:02):
I think it would solve some of the problems discussed earlier about Foo.lean
and FooExe.lean
Mac Malone (Sep 15 2023 at 09:03):
And, as far as I now understand it, Rust only does that because a package can only have one library, which is not true for Lean/Lake.
Mario Carneiro (Sep 15 2023 at 09:03):
no, it does it for exes too
Mario Carneiro (Sep 15 2023 at 09:03):
this would be an extra config arg, lib.lean
is just the default
Mac Malone (Sep 15 2023 at 09:04):
Yeah, I would not prefer that default. So, my vote would be against it. I would support the option for such a package structure, though.
Mac Malone (Sep 15 2023 at 09:07):
Again, I like the current naming convention. It mirrors Ruby's exactly, and I have always loved Ruby's aesthetics.
Mario Carneiro (Sep 15 2023 at 09:10):
most of the examples I can find put sources in a app/
or lib/
directory in ruby
Mac Malone (Sep 15 2023 at 09:10):
Yes, but the module hierarchy under that follows exactly Lean's.
Mac Malone (Sep 15 2023 at 09:11):
Essentially, Ruby defaults to srcDir := "lib"
(for libraries)
Mac Malone (Sep 15 2023 at 09:12):
I would be happy with a source directory default, but my understanding Sebastian and/or Leo decided against that.
Mario Carneiro (Sep 15 2023 at 09:13):
hence the discussion
Mac Malone (Sep 15 2023 at 09:15):
Yep! I was just dropping in my vote.
Mario Carneiro (Sep 15 2023 at 09:16):
I can't tell whether it is typical for ruby projects to have multiple folders in lib/
Mac Malone (Sep 15 2023 at 09:23):
@Mario Carneiro Generally, not. That would be essentially a multi-library package (which exists, but are not common).
Mac Malone (Sep 15 2023 at 09:24):
It would be as common as having multiple folders in lib
would be if srcDir := "lib"
was the default in Lean/Lake (i.e., generally only for multi-library packages).
Last updated: Dec 20 2023 at 11:08 UTC