Zulip Chat Archive

Stream: lean4

Topic: mathport tests


Scott Morrison (Nov 02 2021 at 05:36):

@Daniel Selsam, in fd2a137 you added the targets

test-leanbin:
    cd Test/ImportLean && rm -rf build && lake build

test-mathbin:
    cd Test/ImportMathbin && rm -rf build && lake build

to the Makefile. While these work (for test-mathbin, I needed to change Test/ImportMathbin to Test/ImportMathlib: I'll PR this shortly), they do something that I think is a bit strange: they check out new copies of mathport and mathlib, rather than using the existing ones.

I think this may be a problem with lake: it's ignoring the src fields of the transitive dependencies. Does that seem correct to you? If so I'll make a lake issue.

Mac (Nov 02 2021 at 07:53):

@Scott Morrison This is expected behavior for Lake. Lake hoists deep dependencies into the top-level lean_packages directory. Note that this is standard behavior for package managers (e.g, Yarn/NPM do the same thing). This means if you build packages from two different root directories (e.g., Test/ImportLean and Test/ImportMathbin) you will get 2 different lean_packages directories each with their own copies.

Mac (Nov 02 2021 at 07:54):

I am also considering doing the same for build directory (i.e., hoisting all dependency builds into the top-level build directory). This would also not be unique to Lake (e.g., Nix does it as well with its results directory).

Sebastian Ullrich (Nov 02 2021 at 08:00):

Yes, using a single build directory is common (though Nix' input-addressed store is different enough that it's not a good comparison I think). It might be painful in the case of Lean when rebuilding e.g. mathlib as a dependency, though I suppose a local dependency to mathlib is not that common.

Sebastian Ullrich (Nov 02 2021 at 08:02):

I'm not sure I understand the source copy issue yet: is it a semantic difference or "merely" a redundant directory copy? Either way, I'm relatively sure Cargo doesn't do that.

Mac (Nov 02 2021 at 08:11):

Sebastian Ullrich said:

I'm not sure I understand the source copy issue yet: is it a semantic difference or "merely" a redundant directory copy? Either way, I'm relatively sure Cargo doesn't do that.

The asserted problem (I believe) is that, in the script above, lake is being run from first inTest/ImportLean and then in Test/Importbin. Thus, it creates a lean_packages directory in each directory with a copy of mathport and mathlib as both packages depend on them instead of a single top-level lean_packages directory with said dependencies.

Sebastian Ullrich (Nov 02 2021 at 08:13):

So there should be no semantic difference to not copying at least

Mac (Nov 02 2021 at 08:15):

Sebastian Ullrich I'm sorry, but I don't know what you mean by "no semantic difference to not copying"?

Sebastian Ullrich (Nov 02 2021 at 08:18):

I mean, an alternative implementation that does not copy the source directory but builds into the target directory straight from the original source is perfectly imaginable, no? But the build result should be the same in both implementations.

Mac (Nov 02 2021 at 08:22):

@Sebastian Ullrich this is about remote dependencies, not local dependencies. There is no source directory until the dependency is resolved. The question is whether that dependency should be resolved into the lean_packages directory of the package itself or its dependent package (i.e., the top-level package). As far as I am aware, most package managers resolve it into the packages folder of the top-level package (for de-duplication reasons). I know Yarn and NPM do this, and I think Cargo does too?

Sebastian Ullrich (Nov 02 2021 at 08:23):

I see, I misunderstood Scott's src reference then

Sebastian Ullrich (Nov 02 2021 at 08:25):

Cargo has a user-local cache of remote Git repos for even more deduplication

Mac (Nov 02 2021 at 08:26):

@Sebastian Ullrich how does that work? What if two packages depend on the same Git repo but different tags/branches?

Sebastian Ullrich (Nov 02 2021 at 08:28):

It looks like it creates a separate <crate>-<commit> directory in the cache for each checkout

Mac (Nov 02 2021 at 08:29):

ah, so the cache works at a commit level not a repo level

Mac (Nov 02 2021 at 08:29):

we could do that too as well

Mac (Nov 02 2021 at 08:30):

(i.e., stick the cache in some specific subdirectory of the toolchain)

Sebastian Ullrich (Nov 02 2021 at 08:30):

Both, I think (clone vs checkout). There's a bit more information in the docs for this external tool: https://github.com/matthiaskrgr/cargo-cache#show-the-largest-items-in-the-cargo-home

Sebastian Ullrich (Nov 02 2021 at 08:32):

And ~/.cargo/bin is where globally (well, user-locally) installed executables go

Sebastian Ullrich (Nov 02 2021 at 08:34):

Which of course is not at all according to the XDG Base Directory Specification :)

Daniel Selsam (Nov 02 2021 at 13:20):

Scott Morrison said:

I needed to change Test/ImportMathbin to Test/ImportMathlib: I'll PR this shortly)

Thanks. I had only tested it locally for Lean. I hit this issue on my remote machine yesterday when trying to re-run the full pipeline, but I forgot to push the fix.

Daniel Selsam (Nov 02 2021 at 13:24):

Scott Morrison said:

they do something that I think is a bit strange: they check out new copies of mathport and mathlib, rather than using the existing ones.

FWIW I intended this to be the lightest-weight-possible way of using mathbin, so a new package could just depend on it directly on its own. Lake-caching aside, we could change the behavior by removing https://github.com/leanprover/mathport/blob/master/Lib4/leanbin/lakefile.lean#L11 and making it the importing-package's responsibility to add a dependency to their mathport/mathlib.

Mac (Nov 02 2021 at 14:06):

Daniel Selsam said:

we could change the behavior by removing https://github.com/leanprover/mathport/blob/master/Lib4/leanbin/lakefile.lean#L11 and making it the importing-package's responsibility to add a dependency to their mathport/mathlib.

How would that help?

Daniel Selsam (Nov 02 2021 at 16:38):

It might not help, but it would allow the user to make mathport a local dependency instead of a git dependency (without introducing a constraint on the placement of the Lib4 directory).

Mac (Nov 02 2021 at 17:31):

@Daniel Selsam I still don't understand how it being a local dependency would be better. Maybe this would be another good topic for our discussion.

Mac (Nov 06 2021 at 01:23):

@Daniel Selsam, while fixing the build in Lake I think it figured out what you meant by having mathport be a local dependency for Leanbin/Mathbin, etc. I had previously misunderstand what you were saying. It would be a great idea for Leanbin/Mathbin to just refer to the top-level mathport as local dependency.

For example, the dependency could be specified like so:

 {
    name := `mathport
    src := Source.path (FilePath.mk ".." / "..")
 }

A user could still depend on just one of the libraries via the following:

 {
    name := `leanbin
    src := Source.git "https://github.com/leanprover/mathport.git" "master"
    dir := FilePath.mk "Lib4" / "leanbin"
 }

Daniel Selsam (Nov 08 2021 at 15:25):

@Mac Now I am confused :) Since https://github.com/leanprover-community/mathlib4/commit/b0c3952f590d4b2e301d2ffe13bb815856fff1e5 we can get rid of the mathport dependency entirely. I thought the plan was to have a repo for mathbin with a lakefile that downloads the .oleans from a GitHub CI, and that can be depended on trivially by a downstream project. If so, it seems it would be good if that downstream project didn't need to add a not-directly-used mathlib4 dependency just to avoid import errors.

Mac (Nov 08 2021 at 19:59):

@Daniel Selsam ah, if the idea to have a separate repo for mathbin, then sure that works as well (and is probably much better)

Daniel Selsam (Nov 08 2021 at 20:04):

Curiously, I thought this was your idea :) Either way, here is a hacky PoC for leanbin:

Scott Morrison (Nov 09 2021 at 09:22):

Okay, now that lean --tlean doesn't use >64gb, I've got continuous integration for mathport working, and uploading some .tar.gz files. These are currently at https://github.com/semorrison/mathport/releases. This is generated by the CI branch of mathport.

Next steps, I think:

  1. move mathport from dselsam to leanprover-community
  2. tell one or more of the existing hosted runners for mathlib CI that they should also do CI for mathport
  3. currently CI posts the artifacts to a release just called nightly. Presumably this should be nightly-YYYY-MM-DD
  4. @Daniel Selsam's leanbin repository should pick up the artifacts generated by CI.

(1. is necessary for 2.)

Scott Morrison (Nov 09 2021 at 09:23):

Further on point 3. Should CI run nightly? Or on every push to mathport? Or both?

Gabriel Ebner (Nov 09 2021 at 09:44):

AFAICT it pulls the latest mathlib revision, so it makes sense to run it at least nightly. We can always switch to a shorter intervals as the port picks up speed.

Gabriel Ebner (Nov 09 2021 at 09:46):

I think nightly-YYYY-MM-DD (or even nightly-YYYY-MM-DD-HH) is a good idea. We want to be able to look back at historic releases.

Daniel Selsam (Nov 09 2021 at 14:48):

Scott Morrison said:

  1. move mathport from dselsam to leanprover-community

The main mathport is https://github.com/leanprover/mathport (the dselsam one is just a one-off fork from yesterday to experiment with releases). I don't have auth to move it to leanprover-community but I give my blessing to whoever does.

Daniel Selsam (Nov 09 2021 at 15:15):

Scott Morrison said:

  1. Daniel Selsam's leanbin repository should pick up the artifacts generated by CI.

This part should be easy. Lake may be adding relevant features for this soon, but for now we can just tweak https://github.com/dselsam/leanbin/blob/master/lakefile.lean#L4-L5


Last updated: Dec 20 2023 at 11:08 UTC