Zulip Chat Archive

Stream: lean4

Topic: leanpkg++ idea [RFC]


Mac (May 19 2021 at 19:40):

One of the big planned steps forward for Lean is a fully feature package mean (a leanpkg++). Currently, the idea seems to be to model Rust's Cargo, hence the current similar TOML manifests.

However, unlike Rust, Lean is both a bootstrapped and interpretable language. So, I propose we instead follow the model of other interpreted languages (like Ruby) and dispense with the TOML manifests and have a project's build/packaging scripts written in Lean (like Ruby's Rakefile/Gemfile).

For example, a simple build script based off the manifest currently produced by leanpkg init could look like this:

import Leanpkg
open Leanpkg

def manifest : Manifest := {
  name := "Leanstrap"
  version := "0.1"
  leanVersion := "leanprover/lean4:nightly-2021-05-18",
  dependencies := [
    {
      name := "mathlib4",
      src := Source.git
        (url := "https://github.com/leanprover-community/mathlib4.git")
        (rev := "e5cd871f3d59d344521f01821a3e48b744c756d9")
        (branch := "master")
    }
  ]
}

#eval build manifest

For synergy with the leanpkg command, this could be done more intelligently by tagging various defs with attributes that leanpkg picks up on and uses. For example, say we had the following Leanpkg.lean:

@[manifest]
def manifest : Manifest := {
    name := "Leanstrap"
    version := "0.1"
    leanVersion := "leanprover/lean4:nightly-2021-05-18",
  }

The leanpkg command could read this file and build the project using the tagged @[manifest] definition as the manifest. Ideally, it would also support things like custom build, configure, and test hooks, and even support for running custom commands.

What do people think of this idea?

Daniel Fabian (May 19 2021 at 19:54):

reminds me an awful lot of FAKE F# make. It works there really well there, but you need quite a lot of library for it to be delightful. PSake (PowerShell make) is not a great example of the same idea.

Joe Hendrix (May 19 2021 at 20:05):

I think the main risk of using a build script rather than a file is essentially tied to the "expression problem".

There are potentially many things want to leverage in a build system beyond just build the software. Even for building, you want information about dependencies so you can compute dependencies up front, resolve version constraints, create "freeze" files, etc. There's also non-build tasks like building documentation, running tests, etc, creating descriptions and license information for a package library such as crates.io.

There is no reason all this couldn't be done with a build script, but I think a declarative model that describes data rather than code is more easily extensible to support these other uses of build information.

Joe Hendrix (May 19 2021 at 20:07):

I should add I'm currently looking into how to modify cargo itself to build non-rust packages. Making this robust will be a lot of work, but would have the long term benefit of not requiring continued maintenance of a package system and would make it easier to depend on Rust packages in Lean (and vice-versa).

Mario Carneiro (May 19 2021 at 20:11):

I'm a fan of this approach. Not all packages are simple to build, and even cargo supports this kind of scripting with build.rs. I'm sympathetic to the idea of having a declarative manifest like a toml file, but I would like to know more specifically what the value proposition is, and in particular whether it would suffice to have a manifest object in the lean file like the one pictured, which leanpkg can print out as TOML or JSON if desired by some external tool.

Daniel Fabian (May 19 2021 at 20:11):

that's what FAKE does, they use F# types to describe data and then F# code to implement it. It work there... but the moment you need to call a shell, you really need a decent base class library or it gets painful.

Mario Carneiro (May 19 2021 at 20:11):

I think whatever library we need to make this work should exist anyway, so I don't consider that a major issue

Mac (May 19 2021 at 20:19):

Joe Hendrix said:

There are potentially many things want to leverage in a build system beyond just build the software. Even for building, you want information about dependencies so you can compute dependencies up front, resolve version constraints, create "freeze" files, etc. There's also non-build tasks like building documentation, running tests, etc, creating descriptions and license information for a package library such as crates.io.

Daniel Fabian said:

reminds me an awful lot of FAKE F# make. It works there really well there, but you need quite a lot of library for it to be delightful.

I think these both two points tie together quite nicely, in order for this approach to work smoothly your will need a robust library and packaging tool (leanpkg). The library needs to provide ways to neatly express a lot of declarative things (like a manifest, or more modularly, dependencies, version constraints etc.) and leanpkg needs to be able to read the script and present this information in manner that is easily consumed by humans and other tools. With the current tiny leanpkg this could be something like a print-manifest command that prints the Lean manifest as a TOML file (which I think is quite feasible). As more hooks and such are added, this will of course need to grow to become more complex.

However, I think in the long run this offers a lot flexible, and, if done well, can as @Daniel Fabian put it "be delightful". and I think Lean's extensive metaprogramming facilities make this a feasible goal. I would also certainly advise following existing examples of this (like Ruby's rake and F*'s fake) along with features from state-of-art package managers (like cargo and npm).

Mac (May 19 2021 at 20:19):

Mario Carneiro said:

and in particular whether it would suffice to have a manifest object in the lean file like the one pictured, which leanpkg can print out as TOML or JSON if desired by some external tool.

It appears you beat me to the typing!

Mario Carneiro (May 19 2021 at 20:20):

There is no reason all this couldn't be done with a build script, but I think a declarative model that describes data rather than code is more easily extensible to support these other uses of build information.

What if the Manifest type contained a bunch of thunks for getting the dependencies, running the build, enumerating the tests and so on, and different invocations of leanpkg will compute the manifest, run the thunks relevant to the current operation, and then perform the operation?

Daniel Fabian (May 19 2021 at 20:45):

In general, I think, what the languages like F# or lean have that can make this potentially work, is a strong type system. I.e. you could use a library, leanpackage to provide all the necessary functionality, and types, such that a build script has a minimal amount of structure to it. In case of FAKE they are tasks with dependencies. In the lean case it would probably be that plus some package metadata, etc. Then leanpackage would execute the script and be allowed to assume something about the structure. Almost like having an interface for the build scripts.

What definitely does work really well in the FAKE case is the library of helpers built for FAKE is quite impressive, from zip over ftp to nuget, etc. everything is around.

It's a radically different approach than trying to adopt cargo to lean. @Joe Hendrix how does that resonate with you?

Daniel Fabian (May 19 2021 at 20:46):

msbuild, the c# project file format is technically also a build script, just that it's in xml and painful.

Joe Hendrix (May 19 2021 at 20:55):

@Daniel Fabian I'm still a bit partial to the Cargo approach still but that's mostly because I like the idea of reusing the work invested in another package manager or build system.

If leanpkg can just take this and print out a Cargo.toml file (or some other format), then this sounds more like a EDSL for building TOML/JSON files for other build systems. I'm not opposed to it, but hope the files are small enough for this not be necessary.

Daniel Fabian (May 19 2021 at 20:56):

Fair enough

Daniel Fabian (May 19 2021 at 20:57):

Also for the record fake is really a build system and not a package manager.

Joe Hendrix (May 19 2021 at 21:00):

Incidentally, I just mentioned the idea of adding Lean support to cargo to the cargo developers, and gotten some push back but also some buy in to the concept.
I don't have a strong technical preference , but am exploring cargo integration for a couple questions:

  1. I think integration with another package management tool may make it easier for people to try out Lean.
  2. I think this could reduce the maintenance involved so folks in the Lean community can focus on other problems.

Daniel Fabian (May 19 2021 at 21:04):

I wonder if we need to separate our building from versioning. Whilst they are related, they are not the same. And cargo is really about the latter afaict whereas fake is about the former.

Ryan Lahfa (May 19 2021 at 21:08):

FWIW, Zig language has a very interesting build system, e.g. https://github.com/ziglang/zig/blob/master/build.zig where you can write in Zig itself with some build API which works pretty fine for more complex projects

Ryan Lahfa (May 19 2021 at 21:10):

BTW, is there a good reason to not consider to just use nix as a backend and have whatever nice frontend behind leanpkg so that the advanced / power user might switch to the full expressiveness of nix whenever needed?

Andrew Ashworth (May 19 2021 at 21:10):

Nix doesn't work on windows

Ryan Lahfa (May 19 2021 at 21:10):

Good point

Andrew Ashworth (May 19 2021 at 21:11):

if you want an industrial strength cross-language reproducible build system, the next best is Bazel

Ryan Lahfa (May 19 2021 at 21:11):

Isn't Bazel being phased out at Google?

Andrew Ashworth (May 19 2021 at 21:11):

Re: separating build from versioning - when the build system is separate from the versioning system you get dependency management using git submodule and curl https://www.github.com/thezipfile.zip.

Andrew Ashworth (May 19 2021 at 21:12):

Bazel is being phased out? I didn't hear that

Ryan Lahfa (May 19 2021 at 21:13):

Andrew Ashworth said:

Bazel is being phased out? I didn't hear that

I surinterpreted https://github.com/kubernetes/kubernetes/issues/88553

Andrew Ashworth (May 19 2021 at 21:14):

Ah, well, reading that, it looks like just k8s is moving, because they have to maintain two build systems in parallel, and the go package builder is finally supporting remote build and build artifact caching natively

Iocta (May 19 2021 at 21:56):

Python's package-distribution problems are mostly caused by having a dynamic, interpreted setup.py rather than a static file format with manifest files specifying fixed-output deps. Cargo and Poetry is the right model, not custom dynamic files.

Iocta (May 19 2021 at 21:59):

Nix can use those formats via naersk and poetry2nix. I tried to get nix to handle lean manifests a while ago and I think it mostly worked, but I don't remember the details atm.

Daniel Fabian (May 19 2021 at 23:13):

Ryan Lahfa said:

FWIW, Zig language has a very interesting build system, e.g. https://github.com/ziglang/zig/blob/master/build.zig where you can write in Zig itself with some build API which works pretty fine for more complex projects

yeah, at a first glance, that's more or less what FAKE does: https://github.com/fsharp/FAKE/blob/release/next/build.fsx

Mac (May 20 2021 at 01:10):

Iocta said:

Python's package-distribution problems are mostly caused by having a dynamic, interpreted setup.py rather than a static file format with manifest files specifying fixed-output deps. Cargo and Poetry is the right model, not custom dynamic files.

I think Lean is expressive enough that you can support both though. For most simple configurations there can be a one-to-one matching with a static file format, but for more complex use cases (where, for example, custom run tasks are desired) the more powerful features can be used as well.

Daniel Fabian (May 21 2021 at 07:56):

After some feasibility investigations and conversations with some other open-source package managers, we'd like to give the FAKE-style package manager a chance (see: https://github.com/fsharp/FAKE). Here's some of the reasoning:

  • built in lean: this makes outside contributions quite a lot easier
  • small core: at the core, we only need notions of manifest, task and dependency.
  • community friendly: everything outside of the core is extremely hackable, and the small core is very stable.
  • extremely expressive: given that the building DSL is embedded in Lean as opposed to being external, you can always fall back to the full expressivity of lean. So you should never get stuck with the language not being expressive enough for your project.
  • it's a fine starting point for production use of lean 4: Since this is programming a programming project, it should be fairly pain-free to code up, since we won't find missing tactics or similar blocking progress. And also, since lean 4 itself is written in lean 4 on the programming side, I think, we are in a good place to start building software.

I'd like for this project to be mostly community-driven. The way how it works for FAKE, and it work really well in that community, is that beyond the core, there are tons of helpers for various things you might want to do. They are things like:

  • execute the compiler
  • build documentation
  • create a nuget package
  • upload to nuget
  • download from nuget
  • create a zip file
  • upload to ftp
  • and many more

The helpers are fairly volatile and very actively developed and it's easy to accept contributions so the build system can evolve quickly.

Patrick Massot (May 21 2021 at 09:39):

Thank you very much for working on this!

Sebastian Ullrich (May 21 2021 at 09:44):

@Daniel Fabian I'm curious, could you describe how FAKE, and/or your plans for a Lean version of it, figure into the build system dimensions presented in https://www.cambridge.org/core/journals/journal-of-functional-programming/article/build-systems-a-la-carte-theory-and-practice/097CE52C750E69BD16B78C318754C7A4, esp. tables 1 & 2?

Sebastian Ullrich (May 21 2021 at 09:44):

(The paper is very well worth a read in general for anyone interested in the topic)

Daniel Fabian (May 21 2021 at 09:47):

@Sebastian Ullrich I know that paper, it's really nice. Basically the idea is to have just the bare minimum built-in to be able to execute code in dependency order. A task or target is a pure side-effect (MetaM Unit) or something.

Then you have a notion of dependencies. Finally you run one target and the system will eval in dependency order.

Daniel Fabian (May 21 2021 at 09:47):

#r "paket:
nuget Fake.IO.FileSystem
nuget Fake.Core.Target //"
#load "./.fake/build.fsx/intellisense.fsx"

open Fake.Core
open Fake.IO

// Properties
let buildDir = "./build/"

// Targets
Target.create "Clean" (fun _ ->
    Shell.cleanDir buildDir
)

Target.create "Default" (fun _ ->
    Trace.trace "Hello World from FAKE"
)

// Dependencies
open Fake.Core.TargetOperators

"Clean"
    ==> "Default"

// start build
Target.runOrDefault "Default"

Daniel Fabian (May 21 2021 at 09:48):

this would be an example. Your Target.create creates the pure side-effects. Then the "Clean" ==> "Default" is a really simple DSL for defining the dependencies between targets. And finally you runOrDefault the final target.

Sebastian Ullrich (May 21 2021 at 09:54):

Then how will we prevent rebuilds? Is it every task's own responsibility to do so?

Daniel Fabian (May 21 2021 at 09:56):

It's the helper's responsibility. In FAKE, you might just call msbuild which has a notion of not rebuilding everything every time. We might have a nix task to get it off the ground quickly, at a later point in time, we could have one that calls the compiler directly.

Daniel Fabian (May 21 2021 at 09:58):

btw, as you saw in this particular script, they explicitly cleanDir so as to rebuild everything.

Daniel Fabian (May 21 2021 at 09:58):

If you want it incremental, we can have a different helper for that kind of workflow.

Sebastian Ullrich (May 21 2021 at 10:03):

Ok, then I got the wrong idea :) . I assumed building the actual .lean files would be its prime and initial task.

David Renshaw (May 21 2021 at 12:32):

I'm looking at the examples at https://fake.build/, and they are triggering painful memories of me trying and failing to learn how to use tools like sbt and grunt in the past. (I get the impression that FAKE has a similar philosophy to those tools.)

I'm worried that if we had a FAKE-like manifest DSL then every project would end up developing its own little dialect of the DSL, and that developers would be burdened with the need to learn these dialects.

Johan Commelin (May 21 2021 at 12:39):

I think there was some plan to have a toml -> manifest tool, so that 95% of the packages would just use a .toml file.

Scott Morrison (May 21 2021 at 12:42):

sbt was (is?) a nightmare. So flexible that every project could do its own little interpretative dance about what a build process should do.

Gabriel Ebner (May 21 2021 at 12:57):

In my experience, sbt is completely ok as long you as you don't do anything fancy. IMHO its biggest shortcoming is that it tries to use magic to make things look simpler instead of using normal Scala code. (E.g you have top-level settings in addition to settings(), you have the whole key horror instead of using function arguments, etc. And since it's all custom-made you can't just use the REPL or println to debug it.)

I believe mill is the modern replacement. https://com-lihaoyi.github.io/mill/mill/

Daniel Fabian (May 21 2021 at 13:18):

From my experience, we didn't need to do anything beyond what the helpers could do for 95% of tasks and we would have 2-3 project specific functions.

Jannis Limperg (May 21 2021 at 14:23):

I would caution against NIH. My impression is that all these language-specific build systems inevitably blossom into a major time sink and possibly a pile of hacks. By a quick count, FAKE has some 60kloc, Cabal (for Haskell) 40kloc, sbt 70kloc (including tests). Not saying it can't be done, but if an off-the-shelf build system and package manager would do, what would we lose?

Daniel Fabian (May 21 2021 at 14:40):

We wouldn't. In fact we'd love that, but so far we haven't found one. I'd love nix for instance, but giving up on windows entirely feels like a bad trade off. Do you have an off the shelf one in mind?

Jannis Limperg (May 21 2021 at 15:20):

I don't have a great overview. I would have considered, off the top of my hat, make, Bazel and Shake. For package management, Opam. But I can see downsides to any of these options (mostly that they would rope other languages into the toolchain).

Gabriel Ebner (May 21 2021 at 15:21):

OMG, isn't opam the PSPACE-complete package manager? That doesn't really scream well designed to me.

Gabriel Ebner (May 21 2021 at 15:23):

I'd second looking at shake. It's only 6kLOC, which should be easy enough to RIIL.

Sebastian Ullrich (May 21 2021 at 15:25):

  • make: what we do right now, almost as bad as Nix on Windows
  • Bazel: no support for dynamic dependencies, so not usable for building individual .lean files

Jannis Limperg (May 21 2021 at 15:25):

Gabriel Ebner said:

OMG, isn't opam the PSPACE-complete package manager? That doesn't really scream well designed to me.

Afaik dependency resolution is a hard problem if you do it in sufficient generality. In any case, I've never had problems with Opam's performance (quite the opposite). Coq uses it as well.

Sebastian Ullrich (May 21 2021 at 15:26):

Shake is also the main focus of the a la carte paper :)

Daniel Fabian (May 21 2021 at 16:26):

@Jannis Limperg and @David Renshaw I wouldn't worry about 60kloc that much, to be honest. The core and the DSL are small. It pretty much is the notion of tasks, dependencies and pretty-printing.

The 60kloc would be coming mostly from the various helpers I'd consider them more of a sign that the library is very hackable and it's easy to upstream rather than that it is complicated.

It did have a level of features, that most of the stuff one might typically want to do was available as helpers and you didn't go and implement them.

And as for using off-the-shelf systems there is also the requirement that it ideally doesn't take with itself a huge external dependency. So anything .net or jvm based for instance isn't ideal (unless we'd be targetting those VMs and have that dependency anyway).

From a quick look it looks like shake might take haskell with it... For me, that'd almost be enough of a reason to rewrite it in lean.

Daniel Fabian (May 21 2021 at 16:27):

oh an opam also doesn't officially support windows.

Jannis Limperg (May 21 2021 at 17:21):

Daniel Fabian said:

oh an opam also doesn't officially support windows.

Right. They're planning to add support soon (TM) at least. In any case, I don't want to advocate for any particular solution; I don't have strong opinions on available build systems or package managers. I just wanted to point out that it might be prudent to spend some time thoroughly researching the available options before embarking on a large project.

The 60kloc would be coming mostly from the various helpers I'd consider them more of a sign that the library is very hackable and it's easy to upstream rather than that it is complicated.

If you're planning to implement only a small core, say 10kloc, that just means other people have to implement the other 50kloc (or perhaps 30kloc; maybe some of these helpers are very special-purpose). There's a bunch of inescapable complexity here and I suspect many details of the design will not be immediately clear. There's also the question of documentation, both internal and user-facing. Existing tools have solved a lot of little problems that, imo, we shouldn't solve again if it can be avoided. But I'm only speaking in generalities here; I don't have any privileged information to offer, so I'll shut up for now. (FWIW, reimplementing Shake in Lean sounds reasonable to me, then you at least have a nice design good to go.)

Chris B (May 21 2021 at 18:55):

Sebastian Ullrich said:

  • make: what we do right now, almost as bad as Nix on Windows
  • Bazel: no support for dynamic dependencies, so not usable for building individual .lean files

I have no idea how far along it is, but some people from the nix community including Eelco Dolstra are developing a build/config language called nickel. It might be worth a look.

Daniel Fabian (May 21 2021 at 19:04):

I'm studying the paper right now - last I saw it was the non-extended version. Maybe there's a quick win for us from it.

Mac (May 21 2021 at 19:27):

David Renshaw said:

I'm worried that if we had a FAKE-like manifest DSL then every project would end up developing its own little dialect of the DSL, and that developers would be burdened with the need to learn these dialects.

I mean this already going to be part of the the buy-in of Lean. Large packages in Lean are likely to define all kinds of DSLs using leans metaprogramming features to represent concepts within them. So I think getting a feel for the DSLs of a given larger project/package is inevitable part of using Lean.

However, for many smaller, les complex packages, I imagine the .lean packaging file will be much the same as an ordinary .toml config file, just written in Lean. I would think this would be much easier on new comers as instead of having two learn the intricacies of two languages (i.e., the build system language and Lean), you just have to learn one --- Lean.

Daniel Fabian (May 22 2021 at 19:36):

we can also generate a .toml or .json for machine reading purposes. If we want to upload to github or something. JSON or TOML are much less of a dependency than lean if all you care about is for it to be machine-readable manifests. But having lean be the source of truth is fine.

Mac (May 29 2021 at 13:13):

Fyi, I am currently working on prototyping a new Leanpkg with this design over at https://github.com/tydeu/lean4-leanpkg2. If anyone is interested in how things are going, take a look! Note that it is in early, early development so expect everything to not always be functional. Also, feel free to shoot me any suggestions, comments, or criticisms you may have.

Max (May 29 2021 at 14:04):

As someone who just discovered Lean yesterday, and mostly familiar with Cargo, I have to say the build-file-in-lean-itself approach absolutely terrifies me. It's less so about Lean syntax, and more about the declarative vs imperative paradigm. The thing I like about Cargo is that you define the project, and not how to build the project - Cargo just figures that out for you. Seeing functions named configure and build feels very wrong (unless in the rare case I need to do something fancy, then sure, allow for custom build scripts).

Max (May 29 2021 at 14:07):

The strong conventions making most configuration implicit and hide-able in Cargo is also a big plus. I would never want to specify where my source files are (src/* just being implicit), or where my binaries should go (bin/*). That should be the responsibility cargo/leanpkg. I just want to code lean, not have to worry about how to structure my project and how to build it.

Max (May 29 2021 at 14:10):

Having less configuration options can be a good thing. It means less things you have to consider, less things to do wrong, and most importantly: All the projects you find on github have the same structure, so picking them up is very easy, and you don't have to investigate where the main.c file is (no, not this one, the real main file, which is specified somewhere in the 500 loc Makefile), which dependencies go where, etc. It has a... normative effect, I would say?
But also, take everything I say with a grain of salt, since I am a newcomer and a dummy :).

Max (May 29 2021 at 14:30):

But either way I very much appreciate any effort into making a delightful package/build system. Glad it's seeing focus :)

Mac (May 29 2021 at 14:38):

Max said:

The thing I like about Cargo is that you define the project, and not how to build the project - Cargo just figures that out for you. Seeing functions named configure and build feels very wrong (unless in the rare case I need to do something fancy, then sure, allow for custom build scripts).

You should be happy to learn that that is the goal! As this evolves basic projects should only need to have a declarative manifest and nothing else. Only when custom build scripts are needed will functions like build and configure be used.

Mac (May 29 2021 at 14:45):

Max said:

Having less configuration options can be a good thing.

I think this is one of the great schism in computer science. Some people like opinionated systems because it enables uniformity in interaction. Others like highly unopinionated systems because it allows to follow their own style (which they may themselves be highly opinionated about). I fall in the later camp. I want the tools I use to allow my to structure things how I want and express things how I want. I would note that this not an uncommon desire in computer science and can be seen to tie into Postel's Law of UX -- i.e., being flexible in what a tool accepts.

On the other hand, the planned simple defaults I mentioned, should make most setups rather the same unless users go out of there way to change things. So, hopefully, there won't be to too much diversity in eventual setups. After all, nothing in C's build system requires source folders to be named src or header folders to be named include and yet the community has mostly come around to that practice.

Mac (May 29 2021 at 14:50):

Mac said:

As this evolves basic projects should only need to have a declarative manifest and nothing else. Only when custom build scripts are needed will functions like build and configure be used.

I would also like to re-emphasize this point: this stage of the prototype is simply meant to get a functional examples of Lean packages building through Lean file. It is not meant to be example of the final API / intended structure of said build / configuration files.

Patrick Massot (May 29 2021 at 14:50):

We certainly hope that most projects will need only a the default setup. But we already know we'll have at least one huge project with a tricky setup (mathlib).

Daniel Fabian (May 29 2021 at 15:17):

@Patrick Massot, I'm actually hoping 2... Over time, having the lean compiler use the new system would be a nice goal as it would drive requirements and validate the design.

Max (May 29 2021 at 15:50):

Patrick Massot said:

We certainly hope that most projects will need only a the default setup. But we already know we'll have at least one huge project with a tricky setup (mathlib).

Wouldn't the goal be to have each theory of the mathlib be a separate package, so to avoid an enormous mono-package? When I need something from the mathlib, my intuition says that I would add a dependency a la vector_fields = { version = "1.0", in = "mathlib/linalg" }, with in being some way to navigate around a directory of packages.
Either way, I'm probably way underqualified for this. All I can offer is the newcomer perspective. If that stops being helpful, do tell me.

Daniel Fabian (May 29 2021 at 16:11):

@Max the splitting of mathlib into a lot of small packages is something that has been discussed multiple times. I think, so far, there hasn't been a clear path how to achieve it, though. From what I understand (@Mario Carneiro understands the problem space much more deeply than I do), mathlib seems to have such a complex dependency graph, that taking even a small part of mathematics transitively pulls in a huge part of mathlib.

Scott Morrison (May 30 2021 at 00:39):

For context, here is the entire mathlib import graph as it stands today. mathlib-import-graph.pdf

It is unreadably dense, but gives some sense. If anyone is interested in seeing subsets (e.g. the imports required for any particular piece of mathematics) just ping me and I'll generate the graph.

It is _extremely_ crosslinked, despite quite a bit of effort of the years in paring things down.

Scott Morrison (May 30 2021 at 00:41):

Alternatively if anyone has a proposal for a particular subset of mathematics that could be split into a separate repo, I'm very happy to go through it in detail with them to see what is possible and what work would be required.

Scott Morrison (May 30 2021 at 00:50):

(import graphs in machine readable formats if anyone wants to explore: mathlib-import-graph.dot mathlib-import-graph.gml)

Sebastian Ullrich (May 30 2021 at 08:05):

@Scott Morrison Could you do the same graph with all nodes of the same top-level directory (e.g. linear_algebra) contracted into one node?

Scott Morrison (May 30 2021 at 10:25):

@Sebastian Ullrich: Screen-Shot-2021-05-30-at-8.24.55-pm.png

Scott Morrison (May 30 2021 at 10:26):

It is far from a DAG after this contraction, so I'm having trouble thinking of a way to draw it usefully!

Scott Morrison (May 30 2021 at 10:28):

Maybe this is slightly better: Screen-Shot-2021-05-30-at-8.27.11-pm.png

Nevertheless there are many double-headed arrows. linear_algebra, ring_theory, and algebra are, I guess, quite badly intermixed by now.

Sebastian Ullrich (May 30 2021 at 10:30):

Nice :) . I think this graph gives a much better picture for how (in)feasible partitioning (current) mathlib into packages is

Sebastian Ullrich (May 30 2021 at 10:31):

Could you post the dot file as well so I can throw it into xdot?

Scott Morrison (May 30 2021 at 10:32):

tl.dot

Scott Morrison (May 30 2021 at 10:32):

(I did the graph munging in mathematica, so I can only hope that is a correctly formed .dot file.)

Scott Morrison (May 30 2021 at 10:34):

I think it would not be insane to aim for a bit more layering, with data/logic/order/tactic as a low level layer (i.e., not importing stuff from higher layers), then to have algebra/topology/category_theory above that, then everything else...

Scott Morrison (May 30 2021 at 10:35):

It's not very sensible that all the theory about multivariable polynomials is in data...

Scott Morrison (May 30 2021 at 10:37):

I would suggest basically emptying data of anything that a mathematician would recognise. :-) Probably there shouldn't be anything in there that a mathematician would recognise as "needing proof".

Scott Morrison (May 30 2021 at 10:38):

If anyone is interested / game to review PRs that do this, perhaps we should start a topic in another stream to do some planning.

Joe Hendrix (May 30 2021 at 16:05):

Patrick Massot said:

But we already know we'll have at least one huge project with a tricky setup (mathlib).

Could you say a bit more about how mathlib setup is tricky? i recognize it is big and binary downloads are important. Are there other particularly tricky things that come to mind?

Alex J. Best (May 30 2021 at 18:31):

@Joe Hendrix one of the issues is the problem of needing tactics to be compiled before use in mathlib proper for efficiency (but of course the tactics depend on some math), see some of the comments in the mathlib4 thread, there were different solutions proposed:
Mario Carneiro said:

Additionally, as I've mentioned before, there is a second mathlib spinoff project, that I'm calling the mathlib prelude. This will collect all the tactics used by mathlib proper such as ring, linarith and the myriad small tactics. Due to lean 4's architecture, it is necessary for these tactics to exist in a separate compilation unit so that they can be available in their compiled form in mathlib. (You can use a tactic immediately after its definition but this uses the interpreter and so doesn't benefit from the tremendous work done on the lean 4 compiler.)

Which brings me to the issue: should these two be separate repositories or not? I think that having all of mathlib in one monorepo reduces the maintenance cost of adding features to lean significantly, and I think most would agree that making even a comparatively simple change in lean core today is higher friction than adding to mathlib, since at the very least a new version of lean containing the modification has to be released and mathlib updated for the new version, both things that mathlib changes don't have to worry about.

A natural solution would be to have mathlib and the prelude in two folders of one repo. But I don't know if lean 4 (or more accurately leanpkg) can currently handle packages that exist in subfolders of a repo.

Daniel Fabian (May 30 2021 at 18:37):

I think it's important to gather requirements as these, i.e. subfolder as packages. Since the requirements need to drive our development of the new leanpkg.

Mac (Jun 07 2021 at 10:07):

Hey, I'm back again to give a status report. The leanpkg++ project has been renamed Lake (aka Lean Make) with the repository for the prototype now located here: https://github.com/tydeu/lean4-lake. It should now be at feature parity with the existing leanpkg, so feel free to take it for a spin. If you find something is broken/missing, please let me know!

Max (Jun 07 2021 at 10:19):

I like the name! And then call package repositories oceans or something

Max (Jun 07 2021 at 10:20):

Hm, do you need to specify lake build bin? Why not just let it be inferred from whether we have a src/Main.lean or a src/Lib.lean by convention?

Jannis Limperg (Jun 07 2021 at 10:37):

I'm getting this from lake build:

package.lean:1:0: error: unknown package 'Init'
You might need to open '/home/jannis/uni/aesop' as a workspace in your editor
package.lean:3:14: error: unknown identifier 'Lake.PackageConfig'
package.lean:3:14: error: unknown constant 'sorryAx'

Code: https://github.com/JLimperg/aesop/tree/lake-experiment. My lake executable is built with the same Lean4 nightly specified in leanpkg.toml and package.lean. I use elan to execute that Lean version.

What am I doing wrong?

Mac (Jun 07 2021 at 20:19):

Jannis Limperg

Oh, sorry. Yeah, I should have probably added some build and run instructions. You need to build Lake with leanpkg build bin LINK_OPTs=-rdynamic on Unix platforms or leanpkg build bin LINK_OPTS="-Wl,--export-all" on Windows. When running the executable, you need to include the build directory of Lake and lib directory of lean (ex. <lean-home>/lib/lean) in LEAN_PATH. Alternatively, you can run it interpreted as the examples do.

I'll update the README soon.

Jannis Limperg (Jun 07 2021 at 21:43):

Yep, that worked and lake build lib successfully built my library.

Xubai Wang (Jul 19 2021 at 08:56):

It's very exciting to find the powerful Lake package manager! Can you provides some examples on how to use custom build scripts (or Makefile like Leanpkg)? In my case, I want to build my Lean code with external Rust HTTP library.

Mac (Jul 20 2021 at 02:48):

@Xubai Wang How exactly do you want to interact with the library? Have you written some C Lean bindings for the Rust library that you want to link to your Lean code that is using extern? Or do you have something else in mind? It would be easier to provide an example if I have some information about your use case. However, if you do already have C bindings, you should just be able add path/to/libBindings.a to the linkArgs for the package (if you are using the latest Lake master).

Mac (Jul 20 2021 at 07:45):

Also, since this topic has ben resuscitated, I should mention that development on Lake has progressed to the point where Lean packages (oleans, static libraries, and binaries) are now built entirely within Lake (no more Makefiles required). The next step is largely UX -- improving Lake's CLI and API to the point where it is as useful as can be to the Lean 4 community.

This means that if anyone has a project with a somewhat complex build setup they want try Lake with, I am more than willing to assist in getting things working (and tweaking Lake as needed). However, do be warned that Lake is very bleeding edge, and has only been minimally tested, so things may blow up from time-to-time. So only experiment with it if you are okay with such dangers.

Xubai Wang (Jul 21 2021 at 10:38):

@Mac Thanks! I managed to build and link external C/C++ code with my Lean code using the Lake.Packager API and linkArgs field. My package.lean file is like this (a simple add function from C):

import Lake.Package
open System

def package : Lake.Packager := fun path args => do
  -- compile c code
  let gccOut <- IO.Process.output {
    cmd := "gcc"
    args := #["-c", "-o", "add.o", "add.c"]
  }
  -- create static lib
  let arOut <- IO.Process.output {
    cmd := "ar"
    args := #["rcs", "libadd.a", "add.o"]
  }
  return {
    name := "simpleAdd"
    version := "0.1"
    -- specify path to the lib
    linkArgs := #["libadd.a"]
  }

I also write some other demos including using a C++ HTTP lib, which can be found here.

Xubai Wang (Jul 21 2021 at 10:47):

By the way, currently Lake works badly with VSCode (unknown package 'Lake'). I also tried adding my LAKE_HOME to the Server Env and Server Env Paths fields of the vscode-lean4 extension, but still no effect.
Is there any current solutions? Or shall we patch the vscode-lean4 extension to fix the problem?

Mac (Jul 21 2021 at 11:14):

Xubai Wang said:

By the way, currently Lake works badly with VSCode (unknown package 'Lake'). I also tried adding my LAKE_HOME to the Server Env and Server Env Paths fields of the vscode-lean4 extension, but still no effect.

You need to add the build directory of Lake to the LEAN_PATH environemt variable (ex. via 'Server Env'). For example, adding the following field to your VSCode settings.json should work:

"lean4.serverEnv": {
  "LEAN_PATH": "path/to/lake/build"
 }

Mac (Jul 21 2021 at 11:41):

Xubai Wang said:

Mac Thanks! I managed to build and link external C/C++ code with my Lean code using the Lake.Packager API and linkArgs field. I also write some other demos including using a C++ HTTP lib, which can be found here.

Cool! I am happy you got Lake to work for you. Would it be okay if I adapt some of your demos for inclusion in Lake''s examples folder?

Mac (Jul 21 2021 at 11:46):

On another note, I feel I should mention that Lake's API is still in the prototyping stages and is liable to change at anytime. I hinted at this before (when I described Lake as "bleeding edge"), but I feel I should reiterate this to make sure no one plans on relying on the API in Lake's master branch to remain stable in any significant way.

Mac (Jul 21 2021 at 11:57):

If one is looking for some level of stability, I would suggest waiting until a stable Lake v2 is released.

Xubai Wang (Jul 21 2021 at 12:24):

Mac said:

Cool! I am happy you got Lake to work for you. Would it be okay if I adapt some of your demos for inclusion in Lake''s examples folder?

Sure! Really looking forward to the stable release.

Yuri de Wit (Dec 10 2021 at 17:30):

FWIW, I am wondering if this could be another point in the design space : Unison definitions are identified by content (https://www.unisonweb.org/docs/tour#%F0%9F%A7%A0-the-big-technical-idea)


Last updated: Dec 20 2023 at 11:08 UTC