Zulip Chat Archive

Stream: lean4

Topic: Mathlib 4


Mario Carneiro (May 08 2021 at 21:43):

I'd like to start the repo for mathlib 4. The idea is that for the present it will serve a role similar to library_dev from the old days: it will collect some basic theorems about nat, list and a few other things that I think everyone who is using lean 4 is rediscovering. The intent will be to later replace this with the ported version when the mathlib porting effort is ready, but in the meanwhile there are still a lot of things that can be done with only a basic library, where the cost of duplicated work is worth having these lemmas available for the current small experiments.

Additionally, as I've mentioned before, there is a second mathlib spinoff project, that I'm calling the mathlib prelude. This will collect all the tactics used by mathlib proper such as ring, linarith and the myriad small tactics. Due to lean 4's architecture, it is necessary for these tactics to exist in a separate compilation unit so that they can be available in their compiled form in mathlib. (You can use a tactic immediately after its definition but this uses the interpreter and so doesn't benefit from the tremendous work done on the lean 4 compiler.)

Which brings me to the issue: should these two be separate repositories or not? I think that having all of mathlib in one monorepo reduces the maintenance cost of adding features to lean significantly, and I think most would agree that making even a comparatively simple change in lean core today is higher friction than adding to mathlib, since at the very least a new version of lean containing the modification has to be released and mathlib updated for the new version, both things that mathlib changes don't have to worry about.

A natural solution would be to have mathlib and the prelude in two folders of one repo. But I don't know if lean 4 (or more accurately leanpkg) can currently handle packages that exist in subfolders of a repo.

What do people think about the plan and/or the timing? I don't think there is any reason to delay further, once the logistical issues are resolved.

Guilherme Silva (May 08 2021 at 21:56):

I think that it is a good idea to have a mathlib. If you create it, I will use it.

Kevin Buzzard (May 08 2021 at 22:10):

Why can't you just have a directory prelude in a mathlib 4 repo? I don't understand your question about leanpkg. Why does prelude have to be a package? If it becomes constant later you could split it off maybe?

I have been porting parts of mathlib to Lean 4. It's fun. My main worry is that lean 3 keeps moving. I started on some famous file (maybe data.list.basic?) yesterday and discovered that it was last edited about a week ago.

Mario Carneiro (May 08 2021 at 22:14):

Why can't you just have a directory prelude in a mathlib 4 repo?

That's also a theoretical possibility, although I'm quite sure that lean 4 doesn't currently support a project setup like this. But we can quite possibly make it work if we use a custom build process, since lean itself can be driven at the per-file level and we just need to call the compiler at the right time and import it when compiling other mathlib files. In the end we'll surely need our own build process so this isn't totally outlandish.

Mario Carneiro (May 08 2021 at 22:16):

I have been porting parts of mathlib to Lean 4. It's fun. My main worry is that lean 3 keeps moving. I started on some famous file (maybe data.list.basic?) yesterday and discovered that it was last edited about a week ago.

I have no intention of being completionist or trying to keep it up to date with mathlib. That will wait until the port is ready, at which point we can just make sure that any divergence between the mathlib 3 and mathlib 4 libraries in those files is merged.

Mario Carneiro (May 08 2021 at 22:18):

The main goal at this point is to have the things that lean 4 users need, not feature parity with mathlib 3

David Renshaw (May 08 2021 at 23:57):

A natural solution would be to have mathlib and the prelude in two folders of one repo. But I don't know if lean 4 (or more accurately leanpkg) can currently handle packages that exist in subfolders of a repo.

Keeping things in a single repository makes sense to me. In my understanding, leanpkg is at least somewhat modeled after Rust's cargo, and cargo supports that kind of pattern very nicely (with cargo workspaces and some other features).

Gabriel Ebner (May 09 2021 at 07:10):

This is a great idea, and the timing is good. We've been waiting for this to happen.

I'd definitely prefer a monorepo (in particular if we need to move stuff from mathlib to prelude and back to satisfy the staging requirements of Lean 4). I fully expect some issues to arise with this setup, but I believe the whole point of mathlib4 is to find and iron out these bugs.

Sebastian Ullrich (May 09 2021 at 07:16):

What leanpkg can do:

  • depend on a package in a subdir using { path = "prelude" }

What leanpkg cannot do yet:

  • automatically compile a dependency into a shared library and pass it in --plugin
  • share a common lock file between related packages, cargo workspaces-like
    • We don't even have lock files yet. We should.
  • depend on packages in a subdirectory of a different repository (apparently cargo simply searches the whole repo for a Cargo.toml...?)
    • only relevant if someone wants to depend on mathlib prelude without mathlib

In summary, leanpkg is in need of some serious love in general (contributions welcome), but it has (almost) equal support for both repo approaches

Gabriel Ebner (May 09 2021 at 07:32):

Ah, this works much better than I'd expected: https://github.com/gebner/lean4-split-leanpkg At least, leanpkg build works in every subdirectory. And you can also use it as a dependency.

Gabriel Ebner (May 09 2021 at 07:33):

We don't even have lock files yet. We should.

leanpkg has only lock files (the revisions of the dependencies are fixed after all)

Sebastian Ullrich (May 09 2021 at 07:37):

Ah, I didn't remember whether rev was optional. An actual reproducibility issue is lean_version = "leanprover/lean4:nightly" though.

Gabriel Ebner (May 09 2021 at 07:37):

Right, elan should just reject that.

Eric Wieser (May 09 2021 at 07:42):

What heuristic would elan use? Reject "nightly" specifically? Reject all branches?

Eric Wieser (May 09 2021 at 07:43):

Even tags are a reproducibility issue, as they can in principle be moved

Mario Carneiro (May 09 2021 at 07:54):

Okay, I've created https://github.com/leanprover-community/mathlib4 . I tried to set something up like @Gabriel Ebner 's demo but I'm not sure what to do about the "main" files, since mathlib is a library, not an application

Gabriel Ebner (May 09 2021 at 07:54):

@Eric Wieser I think nightly is even more special, since it isn't a tag or branch. Instead it refers to the latest tag in the nightlies repository.

Gabriel Ebner (May 09 2021 at 07:56):

@Mario Carneiro I think prelude/leanpkg.toml is missing.

Gabriel Ebner (May 09 2021 at 07:57):

Mario Carneiro said:

I'm not sure what to do about the "main" files, since mathlib is a library, not an application

I think the main file needs to import all of the other ones, otherwise they won't be compiled.

Mario Carneiro (May 09 2021 at 07:58):

Why can't they just be compiled separately e.g. via make

Mario Carneiro (May 09 2021 at 08:05):

Right now the hierarchy looks like this:

prelude/
  Tactic/
    Basic.lean
  Main.lean
  leanpkg.toml
src/
  Algebra/
    etc.
leanpkg.toml

Inside prelude/Main.lean I tried to import Tactic.Basic but it doesn't like it. (unknown package 'Tactic') There are no path settings in prelude/leanpkg.toml

Gabriel Ebner (May 09 2021 at 08:20):

I've just pushed a fix. You're not going to like it.

Gabriel Ebner (May 09 2021 at 08:20):

Apparently, Lean 4 doesn't like the src directory. It wants a Mathlib directory and a Mathlib.lean file (you could also use OMGPleaseNo/ and OMGPleaseNo.lean).

Mario Carneiro (May 09 2021 at 08:21):

Lean's standard library has a makefile that does this:

stdlib:
# Use `+` to use the Make jobserver with `leanmake` for parallelized builds
    +"${LEAN_BIN}/leanmake" lib PKG=Init $(LEANMAKE_OPTS)
    +"${LEAN_BIN}/leanmake" lib PKG=Std $(LEANMAKE_OPTS)
    +"${LEAN_BIN}/leanmake" lib PKG=Lean $(LEANMAKE_OPTS)
    +"${LEAN_BIN}/leanmake" bin PKG=Leanpkg BIN_NAME=leanpkg $(LEANMAKE_OPTS) LINK_OPTS="${CMAKE_EXE_LINKER_FLAGS}"

The corresponding directory structure is

src/
  Init/
    stuff...
  Std/
    stuff...
  Init.lean
  Std.lean

and you can refer to things as Init.* and Std.*, with no extra prefix. Can we make that work?

Mario Carneiro (May 09 2021 at 08:22):

It is not clear to me whether Init is used compiled when building Std... probably it's just using stage 0

Mario Carneiro (May 09 2021 at 08:26):

your fix isn't working for me

Mario Carneiro (May 09 2021 at 08:26):

I still get "unknown package 'Tactic'" when running leanmake in the prelude directory

Gabriel Ebner (May 09 2021 at 08:27):

It worked for me. But that was probably due to old olean files. :-/

Mario Carneiro (May 09 2021 at 08:29):

It works if I use MathlibPrelude.Tactic.Basic inside MathlibPrelude.lean

Mario Carneiro (May 09 2021 at 08:30):

that also works inside the editor for Logic/Basic.lean but leanmake at the root fails with build/temp/Mathlib/Logic/Basic.depend: No such file or directory

Gabriel Ebner (May 09 2021 at 08:31):

Can you try git clean -fdx? It works for me, now.

Sebastian Ullrich (May 09 2021 at 08:33):

The src/ directory in the Lean repo is just a historical artifact

Mario Carneiro (May 09 2021 at 08:37):

I pulled the latest and used git clean -fdx, and leanmake works inside prelude but not at /

Gabriel Ebner (May 09 2021 at 08:38):

Ok, this is weird. I can't reproduce this at all.

Mario Carneiro (May 09 2021 at 08:39):

$ leanmake
unknown package 'MathlibPrelude'
/home/mario/.elan/toolchains/leanprover-lean4-nightly/bin/../share/lean/lean.mk:98: build/temp/Mathlib/Logic/Basic.depend: No such file or directory
make: *** [/home/mario/.elan/toolchains/leanprover-lean4-nightly/bin/../share/lean/lean.mk:50: build/temp/Mathlib/Logic/Basic.depend] Error 1

Sebastian Ullrich (May 09 2021 at 08:40):

leanmake doesn't know about dependencies, that's leanpkg's job. There usually is no reason to ever use leanmake directly.

Mario Carneiro (May 09 2021 at 08:40):

and what's the mumbo jumbo

Mario Carneiro (May 09 2021 at 08:41):

leanpkg build looks like it worked

Mario Carneiro (May 09 2021 at 08:42):

and it doesn't require going into prelude first either

Mario Carneiro (May 09 2021 at 08:44):

okay, so since leanpkg is... opinionated about the names of files and folders, is there a way to make this work?

Mathlib/
  Logic/
    Basic.lean
Mathlib.lean
Prelude/
  Tactic/
    Basic.lean
Prelude.lean
leanpkg.toml

Patrick Massot (May 09 2021 at 08:46):

Maybe you're taking from the wrong end. Sebastian and https://github.com/leanprover/lean4/issues/397 clearly state that help on leanpkg is very much welcome.

Mario Carneiro (May 09 2021 at 08:48):

Okay, if this behavior is up for modification then I can certainly try to make the appropriate code changes, if someone doesn't get there first. In that case we can just plan the layout we want and make leanpkg work with it

Patrick Massot (May 09 2021 at 08:49):

I'll let @Sebastian Ullrich confirm of course, but that's my understanding.

Mario Carneiro (May 09 2021 at 08:49):

But my impression, at least partially, is that this behavior is deliberate and viewed as better than the alternatives. Personally I liked the flexibility of the lean 3 leanpkg.path file: just point to the project roots and imports are relative to that

Patrick Massot (May 09 2021 at 08:51):

Tooling is really important and non-trivial, but it makes sense that Leo and Sebastian have no time to work on that. It would be really to have something flexible enough so that it could also do everything leanproject does, maybe through a leanpkg plugin or something.

Mario Carneiro (May 09 2021 at 08:51):

Sure, at this point I'm just trying to solicit design criteria

Mario Carneiro (May 09 2021 at 08:53):

The existence of leanproject in some ways indicates that leanpkg was not fulfilling its goal, or at least that it was abandoned and the new crowd didn't want to work with the old code base

Mario Carneiro (May 09 2021 at 08:55):

Are we going to reinvent leanproject for mathlib 4? I don't think it's necessarily a bad idea - mathlib has to handle several things beyond what leanpkg itself is even aware of, like downloading built oleans, plus things like docgen might end up in it as well (I don't actually know if this is currently the case).

Patrick Massot (May 09 2021 at 08:55):

I think Lean 3 was not very convenient for this kind of programming, but this may very well be incompetence from me.

Mario Carneiro (May 09 2021 at 08:56):

Since we know that users will have a copy of lean, it's quite possible that the best scripting language to use for build tooling is lean 4

Patrick Massot (May 09 2021 at 08:56):

I also wrote it under the misconceptions that everybody on earth could easily get a sane python environment. At least once per month we see on Zulip how wrong I was.

Patrick Massot (May 09 2021 at 08:58):

About doing it in Lean 4, I guess the main roadblock is available libraries. leanproject talks to git (not from simply executing the command line git client), downloads files, decompresses files. How far are we from doing that in a Lean 4 project (without implementing the zip compression and decompression algorithm)?

Patrick Massot (May 09 2021 at 08:59):

Is it already possible to simply link Lean code and a C library and call C functions from Lean?

Mario Carneiro (May 09 2021 at 08:59):

you can get by with the equivalent of shell scripting for a lot of that

Mario Carneiro (May 09 2021 at 08:59):

oh that's a good point

Patrick Massot (May 09 2021 at 08:59):

Isn't this assuming the existence of a sane shell/POSIX environment?

Mario Carneiro (May 09 2021 at 09:00):

However I think that a program that needs to link with a C library is at risk of environment issues

Mario Carneiro (May 09 2021 at 09:00):

a shell script is too

Patrick Massot (May 09 2021 at 09:00):

Can't we statically link to make sure everything is in the executable?

Patrick Massot (May 09 2021 at 09:01):

I never knew much about C compilation and forgot most of it.

Mario Carneiro (May 09 2021 at 09:01):

Yes, if we statically link and distribute the binary that's probably the most reliable method

Patrick Massot (May 09 2021 at 09:01):

I would hope that embarking git and a compression/decompression lib wouldn't be too huge.

Patrick Massot (May 09 2021 at 09:02):

That doesn't answer my question: is this currently possible in Lean 4 (from user land)?

Mario Carneiro (May 09 2021 at 09:03):

I'm not sure. I think so, we just need a good demo of it

Mario Carneiro (May 09 2021 at 09:04):

You can use @[extern] to inline bits of C code, that can call an external function; then during the compilation stage you just link in the libraries you need to supply the function

Sebastian Ullrich (May 09 2021 at 09:07):

Yeah, with the help of some inline C code you should be able to fill out most FFI holes (strings, structures as long as they are behind a pointer, ...)

Sebastian Ullrich (May 09 2021 at 09:10):

Mario Carneiro said:

okay, so since leanpkg is... opinionated about the names of files and folders, is there a way to make this work?

Mathlib/
  Logic/
    Basic.lean
Mathlib.lean
Prelude/
  Tactic/
    Basic.lean
Prelude.lean
leanpkg.toml

How do you declare what parts should be compiled and loaded as plugins for the other files? Our assumption was that packages are the most intuitive boundary for that (like with Rust's procedural macros, with the important difference that in Lean you can just interpret macros inside the same package), and I believe that each package should have its own leanpkg.toml (like in Rust).

Patrick Massot (May 09 2021 at 09:12):

Mario Carneiro said:

we [just] need a good demo of it

This sounds very true to me.

Mario Carneiro (May 09 2021 at 09:39):

If this were rust, the hierarchy would look more like this:

src/
  Logic/
    Basic.lean
  Lib.lean
  leanpkg.toml
prelude/
  Tactic/
    Basic.lean
  Lib.lean
  leanpkg.toml

Here I'm imagining that leanpkg is either defaulting to a file named Lib.lean, or (preferably) this can be set inside the toml file. Similarly, the mathlib toml file uses a path dependency like ../prelude to point to the other file.

Mario Carneiro (May 09 2021 at 09:39):

In particular, the names src/ and prelude/ don't have to be anything in particular

Gabriel Ebner (May 09 2021 at 09:41):

mathlib4#1

Mario Carneiro (May 09 2021 at 09:42):

Actually it should also be a possibility to put the mathlib toml file at the root, where the file has a src = "./src" key to find the actual root for the sources

Gabriel Ebner (May 09 2021 at 09:43):

That PR does a bit of Makefile hackery to get rid of the separate project requirement for plugins.

Mario Carneiro (May 09 2021 at 09:43):

I see a reference to stage0, does this mean that mathlib can be it's own prelude? :D

Gabriel Ebner (May 09 2021 at 09:43):

Yes!

Gabriel Ebner (May 09 2021 at 09:46):

It is a bit wasteful since it compiles that plugin's dependencies twice. In principle that's solvable, but it requires copy&pasting the makefile.

Mario Carneiro (May 09 2021 at 09:46):

how does that work exactly? Do we still need to keep the files separate

Gabriel Ebner (May 09 2021 at 09:46):

At that point it might be easier/faster to just rewrite leanmake in lean.

Gabriel Ebner (May 09 2021 at 09:47):

No need to keep the files separate. It operates in two stages:

  1. stage0 compiles Mathlib/Plugin.lean (and its dependencies) and links it into Mathlib_Plugin.so
  2. Then all of mathlib is compiled with the --plugin=Mathlib_Plugin.so argument.

Mario Carneiro (May 09 2021 at 09:47):

So the only rule is that any dependency of Plugin.lean can't use tactics from the plugin

Gabriel Ebner (May 09 2021 at 09:48):

I think you can still use the tactics, they'll just be interpreted, right?

Mario Carneiro (May 09 2021 at 09:48):

or maybe they still can, but interpreted?

Mario Carneiro (May 09 2021 at 09:49):

certainly you can't use a tactic before it's defined though, unlike lean

Mario Carneiro (May 09 2021 at 09:49):

(we would need to keep a copy of the source tree if we wanted that, which sounds like way too much effort)

Sebastian Ullrich (May 09 2021 at 09:50):

The intended layout was

prelude/
  MathlibPrelude.lean
  MathlibPrelude/
  leanpkg.toml
Mathlib.lean
Mathlib/
leanpkg.toml

Gabriel Ebner (May 09 2021 at 09:50):

Tactics in proofs would be fine (stage0 could set a -Dskip_all_proofs=true flag or something).

Gabriel Ebner (May 09 2021 at 09:51):

But I believe plugins can also add other features (like linters) that can take effect before they are defined.

Gabriel Ebner (May 09 2021 at 09:52):

Sebastian Ullrich said:

The intended layout was

prelude/
  MathlibPrelude/

This is pretty cumbersome, since you need to write import MathlibPrelude.Tactic.Core instead of import Mathlib.Tactic.Core (which is already too long). You also need to remember where it happened to be defined.

Patrick Massot (May 09 2021 at 09:52):

Didn't Leo told us we don't want to go to stage0 hell?

Mario Carneiro (May 09 2021 at 09:53):

Also MathlibPrelude / Mathlib is the name of the project, it's silly to have all sources in a file with the name of the project because you are already presumably in such a folder

Gabriel Ebner (May 09 2021 at 09:54):

@Patrick Massot The stage0 thing in core is something different: they include the compiled C files in the git repository. We'd just be compiling (a small part of) mathlib twice.

Mario Carneiro (May 09 2021 at 09:55):

I would prefer if the folder hierarchy just skips that top level, and then src/Foo/Bar.lean exists at ProjectName.Foo.Bar, where ProjectName is in the toml file

Mario Carneiro (May 09 2021 at 09:55):

or you can set the source folder to . and just put it at Foo/Bar.lean

Mario Carneiro (May 09 2021 at 09:58):

Looking at the lean source layout, I wonder whether we can also do something similar, where the mathlib project actually consists of multiple lean "packages" along some kind of topical lines. Perhaps even the current top level folders. As long as these projects can reference each other that would seem to be a way to get rid of that top level qualifier

Mario Carneiro (May 09 2021 at 09:59):

(referring to the fact that there are four top level lean packages defined in the lean source tree - Init, Lean, Src, Leanpkg)

Patrick Massot (May 09 2021 at 10:01):

Gabriel, I remember a discussion during LT2021 when someone suggested we could have something like stage0 to handle precisely this issue in mathlib, and Leo laughed and told us we definitely don't want to go in this direction. But I don't remember any more detail.

Gabriel Ebner (May 09 2021 at 10:06):

I'm pretty sure that he referred to using macros before they're defined. E.g. we might want to implement a derive EquivFunctor handler (for Scott's transfer tactic). Since it is so useful, we might want to use derive EquivFunctor when implementing it. One crazy proposal to make that work is to commit "expanded" Lean files to the mathlib repository, where all occurrences of derive EquivFunctor have been (automatically) expanded. I believe this is what Leo was laughing at.

Gabriel Ebner (May 09 2021 at 10:07):

I think I should just remove the stage0 name.

Mario Carneiro (May 09 2021 at 10:13):

We don't really need to compile anything twice here, right? We can just avoid compiling any file that is a dependency of Plugin.lean during the second pass

Mario Carneiro (May 09 2021 at 10:13):

which should happen automatically since the build files already exist

Mario Carneiro (May 09 2021 at 10:15):

Gabriel Ebner said:

I think I should just remove the stage0 name.

It's more of a stage 1

Gabriel Ebner (May 09 2021 at 10:17):

Mario Carneiro said:

We don't really need to compile anything twice here, right? We can just avoid compiling any file that is a dependency of Plugin.lean during the second pass

This is unfortunately hard, we'd need to add the Mathlib_Plugin.sodependency to those files only. This is hard to do in make.

Mario Carneiro (May 09 2021 at 10:29):

By the way, feel free to merge the PR yourself when you think it is ready. I'm not planning for us to have any review standards in the mathlib4 repo higher than your average wiki during this experimentation phase, at least until we start putting CI credentials or something in it

Gabriel Ebner (May 09 2021 at 10:30):

Done. It's so nice to work on a project where CI takes only 16s.

Sebastian Ullrich (May 09 2021 at 10:50):

I would have hoped that we could come up with a less hackish approach when we have the chance to reimagine Lean package management :frowning: . Something principled that we can put in leanpkg, eventually without depending on make, so that other projects can use the same approach.

Gabriel Ebner (May 09 2021 at 10:52):

To be fair, that Makefile hack is more of a proof of concept showing that:

  1. We only need one Mathlib directory.
  2. We don't need to manually split mathlib into stages.
  3. This doesn't require large changes to Lean.

Gabriel Ebner (May 09 2021 at 10:54):

eventually without depending on make

Indeed, my next step would be to rewrite the Makefile into a build.lean file. (No need to spawn a subprocess for lean --deps anymore!) If that works out well, we can move it into Lean/leanpkg.

Sebastian Ullrich (May 09 2021 at 11:03):

I still think that separate (sub)packages are the most natural solution to compiling different parts of a project in different modes. How these packages are laid out on disk is a related but separate question. Also consider third parties that want to use mathlib tactics without mathlib.

Gabriel Ebner (May 09 2021 at 11:18):

How these packages are laid out on disk is a related but separate question.

Ultimately, I don't think anybody on the mathlib side cares about the packaging. The core issue is the disk layout:

  1. Simple directory structure (no mathlib/prelude/MathlibPrelude).
  2. Uniform imports (no figuring out whether its MathlibPrelude.LinearAlgebra.Basic or Mathlib.LinearAlgebra.Basic or LinearAlgebra.Basic).
  3. All LinearAlgebra files (e.g.) should be in the same subdirectory.

Gabriel Ebner (May 09 2021 at 12:09):

Sebastian Ullrich said:

Also consider third parties that want to use mathlib tactics without mathlib.

Then you can just import mathlib; all the interesting tactics in mathlib depend on types and lemmas from mathlib anyhow. I have seen one good argument why you wouldn't want to do that in Lean 3: Joe Hendrix said that the nat.cast coercion introduced a performance footgun in a bitvector library. Note that 1) this will most likely no longer be an issue in Lean 4, and 2) such an instance would already be in the prelude, so you don't gain anything from the split.

Sebastian Ullrich (May 09 2021 at 12:25):

Do you have a feeling for how big the prelude will be? I get the point that it will probably be too big to put in a separate package. But I still wonder about whether the double compilation will (ever) be an issue, whether people will unknowingly increase the prelude, what the ramifications for separate compilation, if that can be applied at all to mathlib, are, etc...

Sebastian Ullrich (May 09 2021 at 12:26):

I'm also assuming that even disregarding the prelude, there are too many (cyclic) interconnections between topics to e.g. put (most of) LinearAlgebra in a separate package.

Gabriel Ebner (May 09 2021 at 12:31):

But I still wonder about whether the double compilation will (ever) be an issue

I think I was not clear enough about this. The double compilation is a hack because anything more clever is too hard to do with a Makefile. After rewriting the Makefile in Lean, it should be easy enough to compile the plugin+dependencies only once, and the rest with --plugin=.... Or maybe the double compilation is actually useful (so that we can use automation in Mathlib.Logic.Base, I don't know yet).

Sebastian Ullrich (May 09 2021 at 12:33):

So it would be something like "compile these modules and their dependencies into a shared library, then compile all remaining modules using the library"?

Gabriel Ebner (May 09 2021 at 12:35):

Yes. AFAICT, there are two possible reasons why we want plugins:

  • linarith, etc. should be as fast as possible.
  • It would be cool if #lint worked everywhere.

Gabriel Ebner (May 09 2021 at 12:40):

there are too many (cyclic) interconnections between topics to e.g. put (most of) LinearAlgebra in a separate package.

Pretty much. The mathlib import graph is a big hairy ball.

Gabriel Ebner (May 09 2021 at 12:41):

(Luckily it can still be combed because mathlib has holes.)

Daniel Fabian (May 09 2021 at 12:41):

are some / most of the issues solved or become non-issue in the presence of a nix-based per-file build or in case we had per-file JIT and therefore would only rely on interpretation within a file but not across files?

Gabriel Ebner (May 09 2021 at 12:43):

I don't see how nix per-file builds would change anything. JIT compilation would probably make a difference (no idea about the warm-up costs). Another very brutal solution would be to produce one shared object per lean file, and then pass all of them as --plugin arguments.

Daniel Fabian (May 09 2021 at 12:45):

well, the way how the lean compiler itself is built using nix works such, that the main module and its transitive dependencies are built and become available as imports for downstream dependencies.

Daniel Fabian (May 09 2021 at 12:46):

That's pretty much the same a JIT, no? At least insofar, as the thing you're importing is an already-built binary IIUC

Gabriel Ebner (May 09 2021 at 12:53):

the way how the lean compiler itself is built using nix works such, that the main module and its transitive dependencies are built and become available as imports for downstream dependencies.

It's not how the compiler is built itself (this is a different mechanism). The nix build files included with Lean do support plugins though, but you can only compile a single package into a plugin.

You can also use nix to execute the two-staged approach that is now (temporarily) used in mathlib4. Unfortunately the nix cache can't help you here.

Gabriel Ebner (May 09 2021 at 12:56):

That's pretty much the same a JIT, no? At least insofar, as the thing you're importing is an already-built binary IIUC

As the name implies, just-in-time compilation refers to compilation while the program is running. If you're loading the binaries from disk, then it's ahead-of-time compilation.

Daniel Fabian (May 09 2021 at 13:08):

what I was referring to was the question when a tactic defined is turned into compiled code and when is it interpreted. IIUC, inside of a file, the tactic would always be interpreted (it is available in the line straight after being defined), whereas when it's imported, then it ideally is already compiled.

So that subsequent files compile potentially much faster. This is important for something like mathlib that is its own consumer. And my understanding was, that the nix build AOT compiles as port of the import, effectively.

It is in that sense, that JIT would have a similar benefit. Depending of course, when exactly you JIT... One proposal was to JIT as part of the import as opposed to after each definition.

Sebastian Ullrich (May 09 2021 at 13:12):

No, there is no per-file native compilation in the Nix build either

Daniel Fabian (May 09 2021 at 13:13):

thanks for the clarification.

Daniel Fabian (May 09 2021 at 13:13):

so then, JIT would potentially really be a good thing.

Mac (May 09 2021 at 15:08):

Gabriel Ebner said:

Sebastian Ullrich said:

Also consider third parties that want to use mathlib tactics without mathlib.

Then you can just import mathlib; all the interesting tactics in mathlib depend on types and lemmas from mathlib anyhow. I have seen one good argument why you wouldn't want to do that in Lean 3: Joe Hendrix said that the nat.cast coercion introduced a performance footgun in a bitvector library. Note that 1) this will most likely no longer be an issue in Lean 4, and 2) such an instance would already be in the prelude, so you don't gain anything from the split.

Personally, I find "Then you can just import mathlib" to be a horrible solution. That introduces an absolutely massive dependency which severely pollutes the global namespace. This is one of the main reasons I heavily avoided mathlib in my own projects in Lean 3. If I want to toy around with proofs in some aspect of mathematics or build a formalization for some subsection that is different then mathlib's then I have to deal with mathlib having eaten all the reasonable names for concepts in the global namespace. This would be less of a problem if mathlib stuck its content inside its own namespace or if Lean did not use transitive flat imports, but neither of those things seem likely to change this iteration, so I still consider this to be a major problem with mathlib.

Patrick Massot (May 09 2021 at 15:14):

Mac, why do you use mathlib at all? Is it only because it gives you the opportunity to be rude?

Mac (May 09 2021 at 15:37):

I had no intentions of being rude, and I apologize if my comment came off that way. I was simply giving my thoughts on the matter and pointing out a potential concern with the solution of "Then just import mathlib" for a monolithic mathlib. It seemed to fit well with the discussion, seeing as there was a lot of back-and-forth about what package/repo split there should be (if any).

I used descriptions like "horrible" to emphasize the strength of my feelings/opinions on the matter. I believe this to be appropriate, though I guess I can see how it could appear rude.

Mac (May 09 2021 at 15:56):

Gabriel Ebner said:

How these packages are laid out on disk is a related but separate question.

Ultimately, I don't think anybody on the mathlib side cares about the packaging. The core issue is the disk layout:

  1. Simple directory structure (no mathlib/prelude/MathlibPrelude).
  2. Uniform imports (no figuring out whether its MathlibPrelude.LinearAlgebra.Basic or Mathlib.LinearAlgebra.Basic or LinearAlgebra.Basic).
  3. All LinearAlgebra files (e.g.) should be in the same subdirectory.

To resolve the MathlibPrelude/Mathlib problem, you could have dummy files like Mathlib.LinearAlgebra.Basic that simply import MathlibPrelude.LinearAlgebra.Basic, thus making the imports appear uniform to consumers.

Alternatively, could this work as a potential directory structure:

Mathlib.lean
Mathlib/
  Prelude/
  Prelude.lean
  leanpkg.toml
leanpkg.toml

That is, is it possible to get leanpkg to work with subpackages within the current package's directory structure? This may be a good question for @Sebastian Ullrich ,

Eric Wieser (May 09 2021 at 21:30):

The fact that mathlib takes all the names in the global namespace shouldn't be a problem as long as your code doesn't operate in the global namespace, right? Or are there still pain points there? Expecting mathlib to stay out of the global namespace so that you can use it instead seems somewhat hypocritical, although admittedly mathlib's use of it is easy to construe as selfish.

Mario Carneiro (May 09 2021 at 21:54):

Looking at the results of leanpkg build on mathlib4 right now, I notice that it has to call leanc on every file. This is likely to be wasted work for a large fraction of mathlib files, which have no computable definitions, or at least nothing that we care to have compiled. Maybe there should be a way for lean files to signal that they are in "mathematician mode" and want to opt out of compilation

Mac (May 09 2021 at 21:59):

Eric Wieser said:

The fact that mathlib takes all the names in the global namespace shouldn't be a problem as long as your code doesn't operate in the global namespace, right?

Unfortunately, that is not how Lean works. Local and scoped notation that clashes with global notation is considered ambiguous and global names clash with namespace names if the namespace has been open'd. See the example here: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Lean.20Hide.20Root.20Names.2FMacros/near/234738423 (that thread in general was about my issues with global name clashes).

Mario Carneiro (May 09 2021 at 22:28):

Note that mathlib is generally very conservative about global notation for exactly this reason, and uses local and scoped ("localized") notations since those don't have ambiguity issues

Mario Carneiro (May 09 2021 at 22:29):

It is possible that lean 4 has a different resolution mechanism that means that this strategy doesn't work anymore

Daniel Fabian (May 10 2021 at 01:14):

@Mario Carneiro Not quite sure if this is hijacking the thread, so please do tell so, if you think my discussion should be a separate topic.

How do you plan to structure mathlib 4? Specifically, how can we go about structuring it into packages that limit the dependency you need to take on? E.g. you want to build some crypto verification. In this case you most likely mostly care about some fairly basic algebra, but not analysis. Chances are you'd not even need the reals, etc.

Mario Carneiro (May 10 2021 at 01:18):

Well, mathlib has historically been reasonably good at importing only the things that are needed, so you can depend on a file and only the dependencies of that particular file need to be compiled. In the last year or so I think I haven't been exerting enough pressure to make sure that it stays this way so I think it might be more of a tangled web now, but in theory if someone needed to curate a particular subset for some purpose they could split up more files and decrease transitive dependencies to ensure that there isn't too much baggage getting pulled in

Mario Carneiro (May 10 2021 at 01:19):

For low level things like nat, list, and even rat and real to some extent, you should be able to get the basic stuff without pulling in half of the library

Mario Carneiro (May 10 2021 at 01:20):

However, I don't think that this can be accomplished with package level splits. That's way too coarse for mathlib

Daniel Fabian (May 10 2021 at 01:24):

so the proposed strategy is to rely on mathlib as a package and then depend on specific modules?

Mario Carneiro (May 10 2021 at 01:31):

yes

Daniel Fabian (May 10 2021 at 01:45):

should we have 10-20 for the lack of a better word "interface files" for mathlib, then? I.e. main entry points like algebra or analysis and compile each one into a package? These would still sit in the monorepo but would allow for us to distribute them more easily and wouldn't require a full mathlib dependency if you work in a limited domain?

Mario Carneiro (May 10 2021 at 01:50):

The problem with "main entry points" is that those files tend to be the source of a lot of spurious dependencies, so we try to avoid using them inside mathlib itself, and from outside mathlib it all comes down to your individual requirements. It is possible to import entire folders but that's generally not a good idea if you want to have a small dependency footprint

Scott Morrison (May 10 2021 at 04:47):

I think stripping down imports and making parts of mathlib more orthogonal is a great discussion to have about mathlib3, but should not be considered as part of porting mathlib to Lean 4.

Daniel Fabian (May 10 2021 at 08:14):

Fair enough, I'll open the discussion there, then

Mario Carneiro (May 10 2021 at 08:33):

I just ported init.data.nat.lemmas in the mathlib4 repo. :tada: However, I left an easter egg in it:

protected lemma min_le_left (a b : ) : Nat.min a b  a := by
  simp [Nat.min]; by_cases a  b <;> simp [h]
  case pos => exact Nat.le_refl _
  case neg => exact _ -- <- is this a bug? I'm not getting an error here

There are no errors thrown up when processing this definition, and in fact the full build completes successfully. If you don't put a tactic there it clearly shows there is a remaining goal, and if you put sorry then it will say the proof uses sorry, so it seems that the hole is actually a part of the proof. I don't think simp is finishing the goal because you need an application of Nat.le_of_not_le to complete the proof and that's not a simp lemma. I'm not sure how to do the trust zero dance in lean 4 though.

Daniel Fabian (May 10 2021 at 09:01):

print the term?

Daniel Fabian (May 10 2021 at 09:02):

the kernel doesn't eat meta variables, so it must have done something

Mario Carneiro (May 10 2021 at 09:23):

The term puts all the interesting stuff in private lemmas that I can't even #print

Mario Carneiro (May 10 2021 at 09:24):

protected theorem Nat.min_le_left :  (a b : ), Nat.min a b  a :=
fun (a b : ) =>
  Or.casesOn (Decidable.em (a  b))
    (fun (h : a  b) (h_1 : Decidable.em (a  b) = Or.inl h) =>
      Eq.ndrec
        (Eq.mpr
          (congrFun
            (congrArg LE.le
              (Eq.trans (iteCongr (eqTrue h) (fun (a_1 : True) => Eq.refl a) fun (a : ¬True) => Eq.refl b)
                (if_pos (ofEqTrue (Eq.refl True)))))
            a)
          (Nat.le_refl a))
        (Eq.symm h_1))
    (fun (h : ¬a  b) (h_1 : Decidable.em (a  b) = Or.inr h) =>
      Eq.ndrec
        (Eq.mpr
          (congrFun
            (congrArg LE.le
              (Eq.trans (iteCongr (eqFalse h) (fun (a_1 : False) => Eq.refl a) fun (a : ¬False) => Eq.refl b)
                (if_neg (ofEqTrue (eqTrueOfDecide (Eq.refl true))))))
            a)
          sorry)
        (Eq.symm h_1))
    (Eq.refl (Decidable.em (a  b)))

Mario Carneiro (May 10 2021 at 09:25):

Oh no I guess not, there is a sorry right there

Mario Carneiro (May 10 2021 at 09:25):

still it's surprising that this made it past the kernel without a peep

Daniel Fabian (May 10 2021 at 09:29):

the kernel eats sorry quite happily, I know that.

Daniel Fabian (May 10 2021 at 09:30):

but you should really see it in axioms for this one... don't you?

Mario Carneiro (May 10 2021 at 09:30):

yes

Mario Carneiro (May 10 2021 at 09:31):

It's not really a soundness issue, but the usual "theorem uses 'sorry'" message is missing

Daniel Fabian (May 10 2021 at 09:33):

still not great... please open a github issue for this.

Daniel Fabian (May 10 2021 at 09:33):

that feels like a fairly clear bug to me

Gabriel Ebner (May 10 2021 at 09:35):

I've also run into this but couldn't minimize it. It's a pretty frustrating bug because 1) you don't know you messed up and 2) you don't know how to fix it (because you can't see the goal state).

Mario Carneiro (May 10 2021 at 09:44):

minimized:

theorem test (o : x  y) (f : x  y) : y := by
  cases o with
  | inl h => exact f h
  | inr h => exact _

Mario Carneiro (May 10 2021 at 09:45):

it does the same thing with induction and match tactics, but not term mode match

Kevin Buzzard (May 10 2021 at 09:53):

Oh so you have finished bikeshedding about directory structure? Do you want submissions? Do you want any of https://github.com/kbuzzard/mathlib4_experiments ?

Kevin Buzzard (May 10 2021 at 09:54):

It doesn't compile right now because I was in the middle of nodup when I got called for dinner...

Mario Carneiro (May 10 2021 at 09:54):

All your experiments are belong to us

Kevin Buzzard (May 10 2021 at 09:55):

I nearly have a definition of finset

Kevin Buzzard (May 10 2021 at 09:59):

Stuff in CoreExt is stuff which is in core Lean 3 but which I (or SReichelt) couldn't find in core Lean 4. I work on this stuff on Thursdays on the discord, just to teach myself Lean 4. If there are any particular mathlib files which you want ported then let me know. Right now my goal was to port direct sums (as a random high-up point in the algebra heirarchy) to see if the elaboration issues which Eric was having could be reproduced in Lean 4, but this was just a random goal.

Adam Topaz (May 10 2021 at 18:09):

@Kevin Buzzard Just a quick reminder about this thread:
https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/finiteness/near/206179081

Kevin Buzzard (May 10 2021 at 18:11):

Johannes Hoelzl strongly advocated against doing any refactoring at the same time as porting.

Johan Commelin (May 10 2021 at 18:12):

How much money do we need to buy him back from Apple? :see_no_evil:

Kevin Buzzard (May 10 2021 at 18:18):

Mario Carneiro (May 10 2021 at 20:03):

I implemented the block structuring tactic we talked about a few weeks ago:

example : (0 < 1  2 < 3)  3 < 4 := by
  split
  - split
    - decide
    - decide
  - decide

It's even better than lean 3 blocks because the blocks can be processed independently, so if one branch is incomplete you can also work on the other branch

Kevin Buzzard (May 10 2021 at 20:05):

Well I implemented split ;-)

Mario Carneiro (May 10 2021 at 20:06):

actually this isn't too much harder than split. It's just a small variant on focus with a shorter name

Sebastian Ullrich (May 10 2021 at 20:20):

Mario Carneiro said:

I implemented the block structuring tactic we talked about a few weeks ago:

Nice work! Permission to upstream it?

Mario Carneiro (May 10 2021 at 20:23):

Absolutely

Patrick Massot (May 10 2021 at 20:25):

What about not recompiling a finished block when working on later blocks? (dreaming is still allowed, right?)

Mario Carneiro (May 10 2021 at 20:26):

The lean 4 strategy is to make everything fast enough that this won't bother you :grinning:

Patrick Massot (May 10 2021 at 20:26):

Sounds pretty good!

Mario Carneiro (May 10 2021 at 20:27):

It's seriously impressive to see how fast it can get through Data.Nat.Basic

Mario Carneiro (May 10 2021 at 20:31):

Actually I'm not sure where caching is handled. It might need to be in the C++?

Patrick Massot (May 10 2021 at 20:32):

Whatever the speed, we'll always manage to come up with slow tactics. We'll simply become more ambitious and brutal.

Mario Carneiro (May 10 2021 at 20:32):

Yes, that's a law of nature

Jannis Limperg (May 10 2021 at 20:47):

I'm working on really slow automation already. :)

Kevin Buzzard (May 10 2021 at 20:47):

Chris Hughes for his MSc thesis implemented a tactic which in the worst case scenario performs worse than any finite tower of exponentials

Daniel Fabian (May 10 2021 at 20:58):

@Kevin Buzzard do you know how that compares to ackermann? IIRC ackermann is also worse than that.

Mario Carneiro (May 10 2021 at 21:29):

Ackermann is faster growing. power tower is something like A(n, 4) compared to A(n, n) which is the ackermann function

Mario Carneiro (May 10 2021 at 21:30):

I don't know any algorithm that needs ackermann running time though

Kevin Buzzard (May 10 2021 at 21:43):

@Chris Hughes do you know the actual worst case running time of that word problem algorithm?

Chris Hughes (May 10 2021 at 21:48):

I think it is just power tower and it never gets worse than that. I don't recall where I read that though.

Daniel Fabian (May 10 2021 at 21:50):

@Mario Carneiro neither do I ;-). But I did come across the inverse ackermann as a complexity once. Which was quite funny. Also they casually mentioned that it's quite fair to consider it constant time, because for any input that fits into the universe you'd be limited to 4

Kevin Buzzard (May 10 2021 at 21:57):

Right -- my son has certainly shown me algorithms with inverse Ackermann running time -- I thought these were normal in your world.

Joe Hendrix (May 10 2021 at 22:11):

@Mario Carneiro I've only skimmed it myself, but found this paper interesting. It provides a survey of different complexity classes and problems more complex than elementary. It looks like Petri-nets and vector addition containment problems naturally give rise to problems with an Ackermann running time.

Marc Huisinga (May 10 2021 at 22:13):

Kevin Buzzard said:

Right -- my son has certainly shown me algorithms with inverse Ackermann running time -- I thought these were normal in your world.

Union-find is fairly common

Mac (May 10 2021 at 23:41):

Mario Carneiro said:

I implemented the block structuring tactic we talked about a few weeks ago:

example : (0 < 1  2 < 3)  3 < 4 := by
  split
  - split
    - decide
    - decide
  - decide

It's even better than lean 3 blocks because the blocks can be processed independently, so if one branch is incomplete you can also work on the other branch

Any way to perform both splits at once? For example, the following:

example : (0 < 1  2 < 3)  3 < 4 := by
  split 3
  - decide
  - decide
  - decide

Johan Commelin (May 11 2021 at 04:04):

@Mac What should split 3 do with a ∧ b ∧ c ∧ d ?

Daniel Fabian (May 11 2021 at 04:16):

@Kevin Buzzard, @Marc Huisinga I was talking of union find, yes. You get the inverse ackermann by path compression or path halving, otherwise it's log. But that's also the only one I'm aware of. So I wouldn't say it's exactly common ;-).

Mac (May 11 2021 at 04:46):

Johan Commelin said:

Mac What should split 3 do with a ∧ b ∧ c ∧ d ?

The idea was that split n would preform n-1 splits, creating n goals. So split 3 on a ∧ b ∧ c ∧ d would produce goals fora, b, and c ∧ d (as is right-associative).

Mac (May 11 2021 at 04:49):

split 4 on a ∧ b ∧ c ∧ d would produce 4 goals (a, b, c, and d).

Johan Commelin (May 11 2021 at 04:51):

I think rcases is better for such things, because you can give it the exact pattern along which you want to split.

Mac (May 11 2021 at 04:53):

Johan Commelin I mean, by that logic, is there really a need for split at all then? I think the idea here is to avoid the need to provide such patterns.

Johan Commelin (May 11 2021 at 05:33):

What does split 4 do to (a ∧ b ∧ c) ∧ (d ∧ (e ∧ f) ∧ g)?

Johan Commelin (May 11 2021 at 05:34):

But I guess simp only [and_assoc] and then split n would give useful behaviour.

Johan Commelin (May 11 2021 at 05:34):

But if the parens are all over the place, I think rcases is a lot easier to wrap my head around. (At least for me.)

Mac (May 11 2021 at 07:57):

Johan Commelin said:

But if the parens are all over the place, I think rcases is a lot easier to wrap my head around. (At least for me.)

That's fair. However, for the normal form (where parens are not all over the place), I certainly think split n would be a useful tool.

François G. Dorais (May 11 2021 at 11:46):

I think it's time for a Mathlib 4 topic stream? (@Mario Carneiro ?) This thread is getting confusing.

François G. Dorais (May 11 2021 at 11:47):

@Johan Commelin The split tactic applies to any inductive with one constructor. And is one of those but the others aren't associative, in general.

Chris Hughes (May 11 2021 at 12:22):

Maybe split_and is a better tactic, so you don't have to count how many ands you have and you don't split other inductives.

Scott Morrison (May 11 2021 at 13:04):

I have just created a separate stream titled "mathlib4". Please subscribe yourselves if interested!

Scott Morrison (May 11 2021 at 13:08):

I propose not moving any historical topics from this stream to that, but that we try to have future discussions about porting efforts over there.

Mario Carneiro (May 11 2021 at 15:28):

Eh, why is it a non-default stream? That will make it almost private for folks who missed the notification

Kevin Buzzard (May 11 2021 at 16:57):

oh I wondered why it hadn't shown up. Why not kill it and create it again?

Mario Carneiro (May 11 2021 at 17:00):

I don't think zulip lets you do that. I've signed up everyone from the lean 4 stream, hopefully I didn't just ping everyone

Sebastian Ullrich (May 11 2021 at 17:22):

I'm afraid you did, including all of our course students :laughing:

Scott Morrison (May 11 2021 at 22:46):

It's now a default stream for new users.


Last updated: Dec 20 2023 at 11:08 UTC