Zulip Chat Archive

Stream: lean4

Topic: lake new lean-data


Adam Topaz (Jun 02 2023 at 02:42):

The command lake new lean-data gives me a PANIC with a backtrace and the following error

error: no such file or directory (error code: 2)
  file:

I tried this on two different computers with different linux distros. Can someone else reproduce?

Adam Topaz (Jun 02 2023 at 02:43):

This is using Lake version 4.1.0-pre (Lean version 4.0.0-nightly-2023-06-01)

Scott Morrison (Jun 02 2023 at 05:18):

% lake +leanprover/lean4:nightly-2023-05-31 new lean-data
PANIC at Option.get! Init.Data.Option.BasicAux:16:14: value is none
error: no such file or directory (error code: 2)
  file:

on macos.

Scott Morrison (Jun 02 2023 at 05:19):

Looks like anything with a - in the name.

Henrik Böving (Jun 02 2023 at 05:24):

This is definitely a regression though, it used to work with doc-gen4

Mac Malone (Jun 02 2023 at 07:14):

I actually think this is a Lean regression -- not a Lake one -- as lake new lean-data works on the latest Lake master.

Mac Malone (Jun 02 2023 at 07:26):

Yep, Lean regression -- works 04-04, broken on 06-01

Mac Malone (Jun 02 2023 at 07:30):

(deleted)

Mac Malone (Jun 02 2023 at 07:39):

(deleted)

Mac Malone (Jun 02 2023 at 07:51):

testing reveals that 04-11 (edit: messed up my testing) is the first broken release.

Mario Carneiro (Jun 02 2023 at 07:54):

Looking at the code suggests this is the Option.get! that is panicking

Mario Carneiro (Jun 02 2023 at 07:57):

The code seems to be assuming that rootFile has some initial ./ on it; it is lean-data.lean in this example and rootFile.parent returns none

Mario Carneiro (Jun 02 2023 at 07:58):

my guess is that the lean change is doing some additional canonicalization to remove superfluous ./

Mario Carneiro (Jun 02 2023 at 08:07):

Ah! I suspected it might have been this: lean4#2233 and it is. You are calling String.toName on the string "lean-data", which previously generated `lean-data but now produces none (encoded as .anonymous, which is otherwise not a legal output of the function). The function String.toName was broken for actual lean names, which made it impossible to import lean files that start with a digit (such as those in the mathlib archive). It should not be used on things that are not intended to be interpreted via lean name parsing rules (i.e. dot-separated, and quote any weird characters with «»)

Mario Carneiro (Jun 02 2023 at 08:07):

@Mac

Mario Carneiro (Jun 02 2023 at 08:09):

I think the most logical fix in this case is to use Name.mkSimple (aka the coercion) instead of trying to parse it into multiple components

Mac Malone (Jun 02 2023 at 08:11):

@Mario Carneiro Lake accepts multiple components though and will produce a proper directory structure from it. This allows one to create a package like data.json and get the proper folder nesting.

Mario Carneiro (Jun 02 2023 at 08:11):

but does it have to do that in lake new?

Mario Carneiro (Jun 02 2023 at 08:11):

what does it mean to lake new foo.bar

Mac Malone (Jun 02 2023 at 08:13):

it will create a

Mario Carneiro said:

what does it mean to lake new foo.bar

It creates a package foo.bar with its main library file being Foo/Bar.lean

Mac Malone (Jun 02 2023 at 08:18):

However, I guess I can just inline the old algorithm instead of using toName.

Mario Carneiro (Jun 02 2023 at 08:19):

you shouldn't, it was broken

Mac Malone (Jun 02 2023 at 08:19):

@Mario Carneiro what was wrong about it for my use case?

Mario Carneiro (Jun 02 2023 at 08:19):

how about using String.toName and if that fails reverting to mkSimple?

Mario Carneiro (Jun 02 2023 at 08:21):

you are inventing your own name encoding mechanism here, which should stick out as a bad idea. Names can have lots of weird things in them, is this CLI only or will it end up in file names and reparsed by other things leading to inconsistency?

Mac Malone (Jun 02 2023 at 08:22):

The bug you mention in the issue is just about french quotes which should not be used on the command line anyway.

Mario Carneiro (Jun 02 2023 at 08:23):

This kind of disagreement between two parties about how to parse a thing is where tons of bugs / security vulnerabilities come from

Mac Malone (Jun 02 2023 at 08:24):

Mario Carneiro said:

you are inventing your own name encoding mechanism here, which should stick out as a bad idea. Names can have lots of weird things in them, is this CLI only or will it end up in file names and reparsed by other things leading to inconsistency?

The name encoding system is just split the CLI input (which should not have french quotes) on . and then construct each component withNane.str. Since the name will be wrapped in french quotes, this should be a perfectly valid name construction.

Mario Carneiro (Jun 02 2023 at 08:25):

so if I give it «a.b».c then it turns into «a..c and you generate some lean files with broken escapes

Mac Malone (Jun 02 2023 at 08:25):

Yes, if you are for some reason inputting french quotes on the command line.

Mario Carneiro (Jun 02 2023 at 08:26):

FYI I had to write exactly that for mathport

Mac Malone (Jun 02 2023 at 08:26):

Why?

Mario Carneiro (Jun 02 2023 at 08:26):

scripts are often calling CLI utilities with arguments that would be weird to type directly

Mario Carneiro (Jun 02 2023 at 08:27):

the example I just gave isn't even that weird, it is the logical name of a file Foo.Bar/Baz.lean

Mac Malone (Jun 02 2023 at 08:28):

Mario Carneiro said:

the example I just gave isn't even that weird, it is the logical name of a file Foo.Bar/Baz.lean

That is a horribly invalid module path in Lean.

Mario Carneiro (Jun 02 2023 at 08:28):

no it's not

Mac Malone (Jun 02 2023 at 08:28):

Yes, it does not round-trip on modToLeanPath which will break tons of assumptions in the importing procedure.

Mario Carneiro (Jun 02 2023 at 08:28):

maybe lake breaks it but lean is able to handle module names like «Foo.Bar».Baz

Mario Carneiro (Jun 02 2023 at 08:30):

I don't see why it wouldn't roundtrip?

Mac Malone (Jun 02 2023 at 08:30):

Oh my.

Mario Carneiro (Jun 02 2023 at 08:30):

the components of that name are "Foo.Bar", "Baz" so you would get Foo.Bar/Baz.lean out

Mac Malone (Jun 02 2023 at 08:30):

Yeah, I guess that would work with Lean if you use french quotes like that. The UX for that seems miserable though. I can just imagine the Zulip questions when someone forgets the french quotes.

Mario Carneiro (Jun 02 2023 at 08:31):

lake already puts french quotes in most lakefiles IIRC

Mac Malone (Jun 02 2023 at 08:34):

Yes, it escapes all components of the identifiers it generates.

Mario Carneiro (Jun 02 2023 at 08:34):

isn't there a function to do that already?

Mario Carneiro (Jun 02 2023 at 08:35):

if you used it it would probably solve the issue where it french quotes things that don't need it

Mario Carneiro (Jun 02 2023 at 08:36):

Name.toString

Mac Malone (Jun 02 2023 at 08:36):

Mario Carneiro said:

isn't there a function to do that already?

No, the function escapes only identifiers with special characters. Lake has to escape it always to prevent things like lake new if from breaking.

Mario Carneiro (Jun 02 2023 at 08:36):

oh I wrote a version that does that somewhere

Mario Carneiro (Jun 02 2023 at 08:37):

it escapes keywords by checking the grammar

Mac Malone (Jun 02 2023 at 08:40):

Mario Carneiro said:

maybe lake breaks it but lean is able to handle module names like «Foo.Bar».Baz

Lake will break when using globs on this because our glob traversal also used toName so Foo.Bar was split on . and the new toName will just break on any file with special characters (e.g., Foo-Bar/Baz).

Mario Carneiro (Jun 02 2023 at 08:42):

the new toName is expecting properly constructed names. That example of a file with special characters is not a name

Mario Carneiro (Jun 02 2023 at 08:43):

how is it being turned into one?

Mario Carneiro (Jun 02 2023 at 08:43):

you can't have a file with / in it anyway (in linux, mac, windows, at least)

Mac Malone (Jun 02 2023 at 08:44):

https://github.com/leanprover/lake/blob/0d4da61cbfe65f19ac7070c2c9f62f36db529c4c/Lake/Config/Glob.lean#L30

Mac Malone (Jun 02 2023 at 08:44):

Mario Carneiro said:

you can't have a file with / in it anyway (in linux, mac, windows, at least)

Ah, no, I mean a directory named Foo-Bar with a file Baz.lean in it.

Mac Malone (Jun 02 2023 at 08:46):

Another use case similar to globs would be the eventual solution to lake#147.

Mac Malone (Jun 02 2023 at 08:47):

We will need to turn the path into a module name, so Foo-Bar/Baz.lean becomes <<Foo-Bar>>.Baz

Mac Malone (Jun 02 2023 at 08:48):

For those cases, Name.str should be fine though I believe?

Mac Malone (Jun 02 2023 at 08:49):

I need to get off to bed, so I will be jumping off. Happy to discuss further tomorrow if you want, Mario.

Mario Carneiro (Jun 02 2023 at 08:55):

Mac said:

We will need to turn the path into a module name, so Foo-Bar/Baz.lean becomes <<Foo-Bar>>.Baz

right, I guess my point is that if you do this everything just works

Mario Carneiro (Jun 02 2023 at 08:57):

because as long as you just use names, nothing will be sneakily re-parsing the text (this is one thing I don't miss about shell scripting), and using String.toName and Name.toString results in a roundtripping pair if you want to make a string out of it (modulo the issue about keywords)

Mac Malone (Jun 08 2023 at 01:42):

@Mario Carneiro Sorry to resurrect this thread, but I encounter more concerns trying to fix this issue. Presumably, we want to be able to name packages things like lean-data, correct? Since these are not proper names, toName no longer works. However, it seems like we still need them to operate like names. For example, if the lean-data package has a target named linter, that may be built via lake build lean-data.linter (i.e., a two part name of lean-data, linter). I do not see how to accomplish this without replicating the old toName behavior.

Mario Carneiro (Jun 08 2023 at 01:44):

is lean-data the module name?

Mac Malone (Jun 08 2023 at 01:44):

lean-data is the package name

Mario Carneiro (Jun 08 2023 at 01:46):

If this is part of the lake build CLI parsing, you can use dot separators if you like, that field is already hella overloaded

Mario Carneiro (Jun 08 2023 at 01:46):

I don't think it necessarily has to be interpreted as a name in that case

Mac Malone (Jun 08 2023 at 01:46):

In general, CLI targets often want to be improper names (e.g. using hypens, reserved words, etc.) but still namespace like normal names. I do not see how to synergize these two desires without a custom name parser.

Mario Carneiro (Jun 08 2023 at 01:47):

TBH I'm not sure that lowercase package names is a good idea

Mario Carneiro (Jun 08 2023 at 01:47):

it's a bit confusing that the package name of the mathlib package is mathlib even though literally every other place it appears it is spelled Mathlib in code

Mac Malone (Jun 08 2023 at 01:48):

Virtually every modern major languages uses lowercase package name (JS, Python, Ruby, Rust).

Mario Carneiro (Jun 08 2023 at 01:48):

...

Mario Carneiro (Jun 08 2023 at 01:48):

this is not helpful

Mac Malone (Jun 08 2023 at 01:49):

Its not? I think it is good to keep a standard to help with transitioning new and returning users.

Mac Malone (Jun 08 2023 at 01:49):

Hence why I copied NPM's command names and syntax for Lake were possible.

Mario Carneiro (Jun 08 2023 at 01:50):

I think it does not match other uses in lean

Mario Carneiro (Jun 08 2023 at 01:50):

when I want to import things from the std package I import Std, why is that?

Mario Carneiro (Jun 08 2023 at 01:51):

People were quite insistent that everything in mathlib had to be in the Mathlib folder

Mac Malone (Jun 08 2023 at 01:51):

One reason (though I am not sure you would like it), is that I am importing the module Std not the package.

Mario Carneiro (Jun 08 2023 at 01:51):

everything in the mathlib package

Mario Carneiro (Jun 08 2023 at 01:52):

The whole package has to be under that path

Mario Carneiro (Jun 08 2023 at 01:52):

import Std is just the main target

Mac Malone (Jun 08 2023 at 01:52):

I think the logic there was to avoid namespace population?

Mario Carneiro (Jun 08 2023 at 01:52):

well now we have two names for a thing and that's dumb

Mac Malone (Jun 08 2023 at 01:52):

Mario Carneiro said:

The whole package has to be under that path

No it doesn't, a package can have multiple libraries and multiple roots per library.

Mario Carneiro (Jun 08 2023 at 01:53):

remind me why packages exist?

Mac Malone (Jun 08 2023 at 01:54):

That was specifically added to support Haskell-like module naming (e.g., Data.* and Control.*)

Mario Carneiro (Jun 08 2023 at 01:54):

why can't it just be libs

Mac Malone (Jun 08 2023 at 01:54):

Mario Carneiro said:

remind me why packages exist?

To group libraries, exes, and other targets.

Mario Carneiro (Jun 08 2023 at 01:54):

and why do we need said groups

Mac Malone (Jun 08 2023 at 01:54):

e.g. in mathlib, you have both the linter, cache, and library in the same package

Mario Carneiro (Jun 08 2023 at 01:55):

in the same lakefile at least

Mac Malone (Jun 08 2023 at 01:55):

To index the objects e.g. for dependency management.

Mac Malone (Jun 08 2023 at 01:55):

Mario Carneiro said:

in the same lakefile at least

And the same package.

Mario Carneiro (Jun 08 2023 at 01:55):

but what if I need dependencies between libs?

Mac Malone (Jun 08 2023 at 01:56):

Currently unsupported, only package level dependencies are specifiable.

Mac Malone (Jun 08 2023 at 01:56):

(outside of manually coded custom dependency targets)

Mario Carneiro (Jun 08 2023 at 01:58):

so, if every package had to have one lib/exe but a lakefile could have multiple packages, then that would basically mean packages don't have to exist

Mac Malone (Jun 08 2023 at 01:59):

Incorrect, it is the other way around. A lakefile must have exactly one package declaration, but can have zero or more libs, exes, or other targets (and other such things like scripts and deps).

Mario Carneiro (Jun 08 2023 at 01:59):

I'm aware

Mac Malone (Jun 08 2023 at 01:59):

Oh, that was a proposal.

Mario Carneiro (Jun 08 2023 at 01:59):

I'm saying that if it was the other way around then the "package" abstraction would be superfluous

Mac Malone (Jun 08 2023 at 02:00):

But packages don't need targets. They can also e.g. be all scripts.

Mario Carneiro (Jun 08 2023 at 02:00):

scripts don't have to be packages

Mac Malone (Jun 08 2023 at 02:00):

Also, if they have multiple targets, which one would decide the name?

Mac Malone (Jun 08 2023 at 02:00):

Mario Carneiro said:

scripts don't have to be packages

They have to be in one though.

Mario Carneiro (Jun 08 2023 at 02:00):

@[default_target]?

Mario Carneiro (Jun 08 2023 at 02:01):

what do you mean?

Mac Malone (Jun 08 2023 at 02:01):

Multiple libraries can be the default target.

Mario Carneiro (Jun 08 2023 at 02:01):

??

Mario Carneiro (Jun 08 2023 at 02:01):

what kind of default is this

Mac Malone (Jun 08 2023 at 02:01):

Mario Carneiro said:

what do you mean?

You can execute scripts from other packages e.g. via lake run pkg/script

Mario Carneiro (Jun 08 2023 at 02:01):

ok so do that

Mac Malone (Jun 08 2023 at 02:01):

which means the package needs a name

Mac Malone (Jun 08 2023 at 02:02):

Mario Carneiro said:

what kind of default is this

the default_target just means build the this target on a bare lake build.

Mario Carneiro (Jun 08 2023 at 02:03):

for lake run, you could either leave them unnamespaced, or allow the lakefile to namespace entities from another lakefile on require

Mac Malone (Jun 08 2023 at 02:03):

which is useful e.g. in a big package like LLVM with multiple sublibraries and executables of which many should get built by default.

Mac Malone (Jun 08 2023 at 02:04):

Also, the package is convenient for setting package wide settings such as build flags and server flags.

Mario Carneiro (Jun 08 2023 at 02:05):

eh, that has been causing us issues anyway, both std and mathlib use definitions for that now

Mac Malone (Jun 08 2023 at 02:05):

I am also curious why you are attempting to remove the notion of a package name.

Mario Carneiro (Jun 08 2023 at 02:06):

Well, if it had the same name as the main lib there would not be so much problem

Mario Carneiro (Jun 08 2023 at 02:06):

but like I said, two names for the same thing is dumb

Mac Malone (Jun 08 2023 at 02:07):

It's not the same thing though, the package is a superset of the library.

Mario Carneiro (Jun 08 2023 at 02:07):

especially if it's using a naming convention (kebab case) that doesn't play well with lean

Mario Carneiro (Jun 08 2023 at 02:07):

yeah yeah I get it

Mac Malone (Jun 08 2023 at 02:07):

I mean kebab-case doesn't play well with JS, Ruby, Python, or Rust, and they still use it for their package names.

Mario Carneiro (Jun 08 2023 at 02:08):

rust at least renames it before it enters a rust file

Mario Carneiro (Jun 08 2023 at 02:08):

and JS doesn't have it in JS files at all

Mario Carneiro (Jun 08 2023 at 02:08):

not sure about ruby/python package management

Mac Malone (Jun 08 2023 at 02:09):

It appears in JS import statements.

Mario Carneiro (Jun 08 2023 at 02:09):

in quotes?

Mario Carneiro (Jun 08 2023 at 02:10):

do any of those languages use them as identifiers?

Mac Malone (Jun 08 2023 at 02:10):

Python sorta, it works like Lean -- package names and module names are orthogonal and module names are valid identifiers.

Mario Carneiro (Jun 08 2023 at 02:11):

maybe that would solve the issues: just use package "foo" where in the lakefile

Mac Malone (Jun 08 2023 at 02:11):

I could always change the keyword package syntax to use quotes if that would appeals more.

Mario Carneiro (Jun 08 2023 at 02:11):

and likewise in require

Mario Carneiro (Jun 08 2023 at 02:12):

where else do you have to type a package name?

Mac Malone (Jun 08 2023 at 02:12):

However, I still want to transform them into Names for quicker storage in maps (but in that case, they can just be simple names).

Mac Malone (Jun 08 2023 at 02:13):

However, this also applies to targets and facets as well -- they also appear on both CLI and in lakefiles. I guess I could force valid Lean names there though.

Mac Malone (Jun 08 2023 at 02:14):

Or use simple names if the user uses a string.

Mario Carneiro (Jun 08 2023 at 02:15):

IIUC those also coincide with the name of a module unless you do something special

Mac Malone (Jun 08 2023 at 02:16):

That is just for libraries (and sometimes executables), custom targets do not necessarily even have modules associated with them.

Mario Carneiro (Jun 08 2023 at 02:21):

well as ever our naming conventions are in shambles: Qq, doc-gen4 and proofwidgets are packages, runLinter and cache are lake_exe targets, MathlibExtras is a lake_lib target, and that's just in the mathlib lakefile

Mac Malone (Jun 08 2023 at 02:23):

Yeah, there does not seem to be an agreed upon name convention in the mathlib world for this stuff.

Mario Carneiro (Jun 08 2023 at 02:23):

I suspect that lake itself is causing some of the problem here, e.g. ProofWidgets is written like that everywhere in the readme and the repo name is ProofWidgets4 but the package name is proofwidgets, most likely because lake new did that

Mario Carneiro (Jun 08 2023 at 02:24):

My sense is that there is a strong tendency toward capital camel case package names

Mario Carneiro (Jun 08 2023 at 02:26):

people mostly just do what lake new recommends, and since the package name almost never matters it's not a surprise that you get weird stuff

Mac Malone (Jun 08 2023 at 02:27):

Lake does not all lowercase package name, it just decapitalizes them, so proofwidgets had to be manually named to get that lowercase w.

Mario Carneiro (Jun 08 2023 at 02:28):

why does it do that?

Mario Carneiro (Jun 08 2023 at 02:29):

e.g. in cargo new you type the package name not the module name

Mac Malone (Jun 08 2023 at 02:29):

Also, this whole thread is about Adam naming his package with a hyphen, so there is definitely people who feel more comfortable with that (despite it being an illegal identifier and not recommended anywhere).

Mario Carneiro (Jun 08 2023 at 02:29):

making it not a "name" mostly solves that problem

Mac Malone (Jun 08 2023 at 02:30):

Mario Carneiro said:

why does it do that?

For legacy reasons, to prevent a clash between the package identifier and the library identifier.

Mario Carneiro (Jun 08 2023 at 02:30):

I hope those reasons no longer apply?

Mario Carneiro (Jun 08 2023 at 02:30):

because I definitely want them to clash

Mac Malone (Jun 08 2023 at 02:30):

The still applies to prevent a CLI clash between the two targets.

Mac Malone (Jun 08 2023 at 02:31):

Mario Carneiro said:

because I definitely want them to clash

I mean they are definitions, so they generally can't clash?

Mario Carneiro (Jun 08 2023 at 02:31):

huh?

Mario Carneiro (Jun 08 2023 at 02:31):

this architecture has so many surprises

Mac Malone (Jun 08 2023 at 02:31):

I just happen to have elided the packages name because, since there can only be one, I can just have the definition use a fixed name (e.g. _package).

Mac Malone (Jun 08 2023 at 02:32):

Being able to refer to this stuff can be useful when building custom targets.

Mario Carneiro (Jun 08 2023 at 02:34):

in that case there isn't a clash though, if you have a package named the same as a lib

Mario Carneiro (Jun 08 2023 at 02:34):

unless the lib is named _package

Mac Malone (Jun 08 2023 at 02:35):

Yes, hence me stating the clash avoidance in packages was for legacy reasons. Clash avoidance is still relevant in targets though.

Mac Malone (Jun 08 2023 at 02:38):

Mac said:

Lake does not all lowercase package name, it just decapitalizes them, so proofwidgets had to be manually named to get that lowercase w.

Though, having stated this, since I am changing the code here anyway, I think I will just use to toLower (that will help make the convention more obvious).

Mario Carneiro (Jun 08 2023 at 02:38):

or how about just not touching the name specified?

Mario Carneiro (Jun 08 2023 at 02:38):

did you consider maybe taking a poll on this?

Mac Malone (Jun 08 2023 at 02:39):

Well that has the disadvantage of creating the very confusing mix we have currently.

Mario Carneiro (Jun 08 2023 at 02:39):

if I cargo new FooBar-baz that's exactly the package name I get

Mac Malone (Jun 08 2023 at 02:39):

I was basing the conventions on those used by other package managers.

Mac Malone (Jun 08 2023 at 02:40):

But sure, I can not touch it -- I do not really care that much.

Mac Malone (Jun 08 2023 at 02:41):

As I said, the decapitalization was just designed to avoid a previous clash (and that particular transformation was chosen to fit with the usual naming convention for packages in other languages).

Mac Malone (Jun 08 2023 at 02:43):

I am generally open to style change suggestions. In designing Lake, I just did whatever seemed best to me at the time based on my knowledge of other language's package managers and there conventions.

Mac Malone (Jun 08 2023 at 02:44):

My only major constraints on suggestions is that I have to not think they look ugly and they have to work with the existing/planned features of Lake.

Mario Carneiro (Jun 08 2023 at 02:44):

in that case, do you mind if I put 4 spaces in all your function signatures? :D

Mac Malone (Jun 08 2023 at 02:44):

Yes, because as I told Sebastian, I do think that looks ugly. :rofl:

Mac Malone (Jun 08 2023 at 02:45):

Its hard for me to parse and I am the primary person who has to work with this code.

Mario Carneiro (Jun 08 2023 at 02:45):

yeah, I was going to say that this

I am generally open to style change suggestions.

is basically the opposite of this

I have to not think they look ugly

Mario Carneiro (Jun 08 2023 at 02:46):

obviously you think all styles except your own are ugly

Mac Malone (Jun 08 2023 at 02:47):

There is a wide range of things that I do not think look ugly. I am happy to support anything (even if I think it looks ugly) and am willing to recommend, mandate, or use anything that I do not think is ugly.

Mario Carneiro (Jun 08 2023 at 02:47):

Mac said:

I am the primary person who has to work with this code.

As you know, I would like to change this

Mac Malone (Jun 08 2023 at 02:47):

That is actually news to me.

Mac Malone (Jun 08 2023 at 02:48):

What do you mean by that? Do you want to contribute or do you want someone else to take over the project? I am confused.

Mario Carneiro (Jun 08 2023 at 02:49):

I want to distribute maintenance over multiple people who each feel relatively okay with making changes

Mario Carneiro (Jun 08 2023 at 02:54):

Lake is an important project which affects many people, and I want some more consensus decision making and less "I think this is the best way, let's assume everyone else does too"

Mac Malone (Jun 08 2023 at 02:56):

I mean, that is pretty much everything in Lean is developed? If Leo likes it (and secondarily Sebastian likes it), it gets in Lean; if you like it, it gets in Std; If Gabriel likes it (and secondarily Wojciech likes it), it gets in the VSCode extension; and if I like it, it gets in Lake. There is some negotiation obviously, but that is the general rule.

Mario Carneiro (Jun 08 2023 at 02:58):

you forgot one (big) project

Mac Malone (Jun 08 2023 at 02:59):

Also, for Lake, it is not like there have been a whole lot of suggestions to the contrary. I have merged virtually every PR made to Lake.

Mac Malone (Jun 08 2023 at 02:59):

And I have yet to close an issue as wontfix.

Mac Malone (Jun 08 2023 at 03:01):

Mario Carneiro said:

you forgot one (big) project

Mathlib is kind of a special case. Both because of its side and its target audience. It is still somewhat true though there as well. If the lead mathematicians and maintainers their don't like it, it doesn't get added. Hence its Classical focus as opposed to something else.

Mac Malone (Jun 08 2023 at 03:02):

It is also technically not part of the Lean infrastructure.

Mario Carneiro (Jun 08 2023 at 03:04):

but also, for my part and I think several of the other examples you gave, maintainers defer a significant amount of decision making to consensus. I try very hard not to push changes in opposition to the popular view and argue my case instead

Mac Malone (Jun 08 2023 at 03:04):

What did you not like about that? Are you arguing mathlib is part of the Lean infrastructure? It is undoubtedly a key (and probably the key) component of the ecosystem, but that is not the same as being part of the infrastructure.

Mario Carneiro (Jun 08 2023 at 03:05):

If you think it doesn't act as infrastructure you aren't paying attention

Mac Malone (Jun 08 2023 at 03:06):

I am confused. Maybe you and I have a different definition of infrastructure?

Mario Carneiro (Jun 08 2023 at 03:06):

maybe you don't realize what is in mathlib

Mario Carneiro (Jun 08 2023 at 03:07):

lake exe cache is literally an implementation of a missing feature in lake

Mac Malone (Jun 08 2023 at 03:08):

Mario Carneiro said:

I try very hard not to push changes in opposition to the popular view and argue my case instead

I do too. Again, I have yet to have a feature suggestion I have rejected.

Mario Carneiro (Jun 08 2023 at 03:08):

the mathlib tactic library is more comprehensive than lean core and std combined

Mac Malone (Jun 08 2023 at 03:08):

Mario Carneiro said:

lake exe cache is literally an implementation of a missing feature in lake

That feature exists, it is called lake upload. You simply cannot feasibly use it because Mathlib updates too often.

Mac Malone (Jun 08 2023 at 03:09):

And I cannot implement cache get in Lake because I do not have a general repository to store builds of arbitrary packages in.

Mario Carneiro (Jun 08 2023 at 03:09):

right, you aren't meeting people where they are so they have to write a thing on the side

Mario Carneiro (Jun 08 2023 at 03:10):

gosh, that almost sounds like infrastructure

Mac Malone (Jun 08 2023 at 03:10):

cache get uses mathlib's proprietary azure instance. That does not work as a general Lake feature.

Mario Carneiro (Jun 08 2023 at 03:10):

it is configurable

Mario Carneiro (Jun 08 2023 at 03:11):

also I don't think that's what proprietary means

Mac Malone (Jun 08 2023 at 03:11):

Mario Carneiro said:

gosh, that almost sounds like infrastructure

Just because LLVM has its own infrastructure does not make it part of the C++ infrastructure. The same is true for mathlib.

Mario Carneiro (Jun 08 2023 at 03:12):

somehow cargo does make use of crates.io "proprietary" infrastructure

Mac Malone (Jun 08 2023 at 03:12):

crates.io is owned by the Rust Foundation.

Mac Malone (Jun 08 2023 at 03:13):

It is certainly the hope that we will have an analogous service eventually but we currently do not have it.

Mario Carneiro (Jun 08 2023 at 03:13):

we do have the service, we don't have the foundation

Mac Malone (Jun 08 2023 at 03:14):

Also, I should note that mathlib went and did its own thing without attempting to integrate it with Lake. So, I am not sure how I caused a problem there.

Mario Carneiro (Jun 08 2023 at 03:14):

because you make it very clear what your opinions are and how interested you would be in PRs that go against those opinions

Mac Malone (Jun 08 2023 at 03:15):

Mario Carneiro said:

we do have the service, we don't have the foundation

Yes, and I (and everyone else) would need to be able to upload to that service to support it in Lake, which no outside the mathlib maintainers can currently do.

Mario Carneiro (Jun 08 2023 at 03:15):

you have already more or less said that lake exe cache would not make it into lake as it currently exists

Mario Carneiro (Jun 08 2023 at 03:16):

and yet, there are conferences coming up, people want to distribute lean now and what are they going to do?

Mac Malone (Jun 08 2023 at 03:16):

I would love it to be part of Lake. I just, personally, don't see how it could. If you can figure it out, I would be all ears.

Mario Carneiro (Jun 08 2023 at 03:17):

simple, put it in lake as is and work out how to generalize later

Mac Malone (Jun 08 2023 at 03:19):

Mario Carneiro said:

because you make it very clear what your opinions are and how interested you would be in PRs that go against those opinions

If that is the case, I am apparently bad at communication. I am happy to have PRs or RFCs or whatever to address any deficiencies.

Mac Malone (Jun 08 2023 at 03:20):

Mario Carneiro said:

simple, put it in lake as is and work out how to generalize later

I do not have the keys. I cannot test it. That would be irresponsible. Secondly, if it only works for mathlib that will be extremely confusing to anyone not using mathlib.

Mac Malone (Jun 08 2023 at 03:21):

I am happy to add the functionality with a configurable server that mathlib can set to its current one. But I still need a way to test that it works.

Mario Carneiro (Jun 08 2023 at 03:21):

I'm sure we can set something up. It doesn't just work for mathlib, although we might want to whitelist some projects to make it work for

Mario Carneiro (Jun 08 2023 at 03:22):

I don't think it would be a problem to have the keys in lake CI too if necessary, although I doubt it would be

Mac Malone (Jun 08 2023 at 03:22):

See, that is what I would like to hear. It seems to me that you have just assumed I am nonresponsive without testing to see if that hypothesis was true.

Mario Carneiro (Jun 08 2023 at 03:23):

okay, I'm working on a PR to delete all the monads from lake, we'll see how it goes :)

Mac Malone (Jun 08 2023 at 03:24):

Obviously, that would need an RFC before a PR (just like it would on Lean).

Mac Malone (Jun 08 2023 at 03:25):

Which is kind of my point, Leo is vocally more closed to PRs than I am and yet you seem to have assumed I am more opposed to them. I am very confused as to how that happened.

Mac Malone (Jun 08 2023 at 03:28):

I will fully admit there are many things about Lake's codebase that could use improvement. But I currently work on Lake for free, so I tend to prioritize bug fixing, PR reviews, and heavily demanded requests when I have time. Or, if I happen to be in the mood to work on Lake for fun, adding things that I want to it.

Mario Carneiro (Jun 08 2023 at 03:32):

Right, that's because it's a one-man show and why I want to diversify the maintainership

Mac Malone (Jun 08 2023 at 03:32):

Why Lake in particular? Couldn't you have the same objection to the Lean core itself?

Mac Malone (Jun 08 2023 at 03:35):

Also, I am curious what you are hoping to accomplish. I would be happy to accept a RFC/PR dealing with any of the outstanding issues, so what exactly is it that you want addressed that requires a different approach?

Mac Malone (Jun 08 2023 at 03:36):

Or does this just have to do with code style or something like that?

Mario Carneiro (Jun 08 2023 at 03:36):

Do RFCs get resolved? The ones I can see on lake repo don't look too fruitful

Mac Malone (Jun 08 2023 at 03:41):

The only two RFCs currently on the issue board are related to cache and appear to have been dropped for that instead of proceeding with the discussion. The were two outstanding points: 1) the fact that non-mathlib maintainers do not have access to the storage and 2) and an issue raised by Gabriel about how to handle dependencies.

Mac Malone (Jun 08 2023 at 03:43):

There was also the question about OS/architecture independent hashing, which my last comment on was expressing acceptance of a mentioned solution.

Mario Carneiro (Jun 08 2023 at 03:43):

can you elaborate on the problem with (1)?

Mario Carneiro (Jun 08 2023 at 03:45):

More precisely, no one pushes to the storage except mathlib4 CI, for which CMU is donating some machines because it would be too much for GH actions

Mario Carneiro (Jun 08 2023 at 03:47):

the situation is pretty similar for crates.io: it is run by foundation money and is maintained by small team but any cargo can connect to it

Mario Carneiro (Jun 08 2023 at 03:48):

we could make it work with a different server, but that would still require setting up a server just for testing

Mac Malone (Jun 08 2023 at 03:48):

There are two problems: 1) the storage needs to be accessible for testing the Lake code and 2) general end users need to have some direction on how to use this feature for their own projects

Mac Malone (Jun 08 2023 at 03:48):

If it only works for mathlib, then it makes sense to leave it in mathlib.

Mario Carneiro (Jun 08 2023 at 03:48):

it's really not that hard to make it work with another server, it's just a URL

Mac Malone (Jun 08 2023 at 03:49):

One thing I would suggest on that front is hooking the lake exe cache get into the standard build process for mathlib, which is already possible through extraDepTargets.

Mac Malone (Jun 08 2023 at 03:51):

It is also worth noting that Arthur's RFC was not just cache, but also a unified store for packages (so that the files were not downloaded to each project).

Mario Carneiro (Jun 08 2023 at 03:51):

yes, I agree with both points

Mario Carneiro (Jun 08 2023 at 03:52):

having a copy of mathlib in every project is kind of crazy right now

Mac Malone (Jun 08 2023 at 03:52):

Gabriel's comments were largely critical of that idea, because it is unlikely for package versions and configurations to match across projects.

Mario Carneiro (Jun 08 2023 at 03:53):

lake exe cache doesn't have a big issue with that, because it hashes all the relevant dependencies, including the lean version

Mario Carneiro (Jun 08 2023 at 03:53):

worst case scenario you don't get sharing, which is the status quo

Mac Malone (Jun 08 2023 at 03:54):

Yeah, those critiques do not apply to cache.

Mac Malone (Jun 08 2023 at 03:56):

He did, however, note that in the general sense for packages, there is a significant trade-off right now between hashing and scalability and he was not sure it was solvable at the moment (due to the very discrete nature of Lean versioning).

Mario Carneiro (Jun 08 2023 at 03:56):

Mac said:

One thing I would suggest on that front is hooking the lake exe cache get into the standard build process for mathlib, which is already possible through extraDepTargets.

On this point, couldn't the mathlib lakefile just declare the cache server URL? You don't have to run cache get until after you have parsed the lakefile anyway

Mac Malone (Jun 08 2023 at 03:58):

Of course,, incorporating cache is easy. The question is whether the current version is actually useful for anyone out of mathlib. If not, this could actually hurt mathlib as and changes to the caching algorithm would require PRs to Lake/Lean and thus not enable the same kind of speedy changes mathlib is used to.

Mac Malone (Jun 08 2023 at 03:59):

That is why my current suggestion for improvement would be to just incorporate it into mathlib's build itself using extraDepTargets.

Mac Malone (Jun 08 2023 at 04:00):

Once we have a general repository for packages then I think merging it with Lake would be a good idea.

Mario Carneiro (Jun 08 2023 at 04:02):

The question is whether the current version is actually useful for anyone out of mathlib.

Well, std is also being cached and lots of non-mathlib projects use std

Mac Malone (Jun 08 2023 at 04:03):

Overall, my last comment on the cache/store RFC is this one, which did three things: 1) summarize Gabriel's objections related to likely cache misses and ask for further input, 2) accept a version of his suggestion regarding hashing, and 3) note the storage issue.

Mac Malone (Jun 08 2023 at 04:04):

Mario Carneiro said:

The question is whether the current version is actually useful for anyone out of mathlib.

Well, std is also being cached and lots of non-mathlib projects use std

True, but unless I am mistaken, Std has pretty fast build times, correct?

Mac Malone (Jun 08 2023 at 04:04):

(Hence why cache is in mathlib not Std?)

Mario Carneiro (Jun 08 2023 at 04:05):

pretty fast, comparatively, but as pointed out in the RFC it's still nontrivial especially if you have many projects depending on it

Mac Malone (Jun 08 2023 at 04:05):

I will also admit that I have not fully read cache's code so I am not sure what it is doing different from the stuff Lake already supports.

Mario Carneiro (Jun 08 2023 at 04:07):

it collects the unbuilt dependencies of the project, downloads them using curl --parallel and unzips them

Mac Malone (Jun 08 2023 at 04:07):

Mario Carneiro said:

pretty fast, comparatively, but as pointed out in the RFC it's still nontrivial especially if you have many projects depending on it

How would the number of projects effect build times? The RFC I believe was talking about storage (for the unified store), which Gabriel rather solidly shot down (at least that was my impression).

Mario Carneiro (Jun 08 2023 at 04:07):

If you have 3 projects depending on std that's 3 builds

Mario Carneiro (Jun 08 2023 at 04:08):

even if they use the same version of everything

Mac Malone (Jun 08 2023 at 04:08):

But I imagine you are not building them in sequence?

Mac Malone (Jun 08 2023 at 04:08):

Mario Carneiro said:

even if they use the same version of everything

Gabriel's main point was that this is highly unlikely.

Mario Carneiro (Jun 08 2023 at 04:08):

well it's not like anything helps us ensure it

Mario Carneiro (Jun 08 2023 at 04:09):

it's not very desirable to have many different versions of lean simultaneously

Mario Carneiro (Jun 08 2023 at 04:09):

right now the best we can do is try to update the world in waves

Mac Malone (Jun 08 2023 at 04:09):

And yet that is the way Lean currently works (which is not a result of Lake).

Mario Carneiro (Jun 08 2023 at 04:11):

One thing that we could do is put the package store inside the .elan/today-toolchain directory

Mac Malone (Jun 08 2023 at 04:11):

Gabriel's point was packages other than mathlib would not likely be updated in this manner so the caching was likely to miss if they all were on different versions of Lean (which is likely for everything except mathlib).

Mario Carneiro (Jun 08 2023 at 04:12):

then if you wipe the elan store the packages go with it

Mario Carneiro (Jun 08 2023 at 04:12):

those build artifacts aren't useful for other versions anyway

Mac Malone (Jun 08 2023 at 04:12):

Yeah, that is how I would do it.

Mario Carneiro (Jun 08 2023 at 04:13):

if someone is maintaining 3+ projects they are probably also doing this dance

Mario Carneiro (Jun 08 2023 at 04:13):

maybe not on the same cadence as mathlib

Mac Malone (Jun 08 2023 at 04:13):

And I think that could reasonably work right now. Store each package with its commit id and first trying symlinking from the package store before cloning. That would help with deduplication within a given toolchain.

Mac Malone (Jun 08 2023 at 04:16):

Gabriel's point is that it would be hard to have cloud builds for these packages, though. As the each package is likely to be built on a different nightly and only store the cloud build for that nightly (e.g., via its CI). Then, a package which depends on those disparate package is likely to be on a different nightly (or at least a different nightly from some of them) and thus require rebuilding all the disparately-versioned packages on its nightly.

Mac Malone (Jun 08 2023 at 04:16):

And that is just the Lean version, there is also the OS, architecture, configuration options, build flags, and other things to think of that could cause a rebuild for more CS-oriented packages (which Gabriel also brought up).

Mario Carneiro (Jun 08 2023 at 04:17):

Mac said:

Gabriel's point is that it would be hard to have cloud builds for these packages, though. As the each package is likely to be built on a different nightly and only store the cloud build for that nightly (e.g., via its CI).

cache doesn't do that, it stores builds for lots of commits on lots of nightlies

Mac Malone (Jun 08 2023 at 04:18):

But just the nightlies mathlib has been updated to, right?

Mario Carneiro (Jun 08 2023 at 04:19):

yes, mathlib has a lean-toolchain file after all

Mac Malone (Jun 08 2023 at 04:19):

His point was most other packages would only update its nightly infrequently so the space of nightlies with a cloud build for such packages would be sparse.

Mario Carneiro (Jun 08 2023 at 04:19):

okay, so use those nightlies

Mario Carneiro (Jun 08 2023 at 04:20):

that's what we recommend for users in the readme

Mac Malone (Jun 08 2023 at 04:21):

Again, the problem is that for package which depends on multiple packages with different sparse nightly builds, it is unlikely there will be significant overlap, so most of them will have to rebuilt no matter what nightly is picked.

Mario Carneiro (Jun 08 2023 at 04:22):

sure, but if, say, one of those packages is 10x the size of the others, you know which nightly to use

Mac Malone (Jun 08 2023 at 04:22):

Mathlib avoids this because it uploads its entire chain of dependencies and most packages which depend on it only depend on it and not other packages with conflicting nightlies.

Mac Malone (Jun 08 2023 at 04:23):

Mario Carneiro said:

sure, but if, say, one of those packages is 10x the size of the others, you know which nightly to use

The discussion there was not just for packages which depend on mathlib. Also, it can be the case the end user package use a newer nightly than its dependencies.

Scott Morrison (Jun 08 2023 at 04:23):

I don't want to interrupt the show here, but chiming in on something mentioned above --- I find it confusing and annoying that mathlib is lowercase in lakefile.lean and Mathlib everywhere else, and I would prefer if this discrepancy were compulsorily removed (i.e. always using Mathlib).

Johan Commelin (Jun 08 2023 at 04:24):

The question is... is mathlib a type? Or just the conjunction of all its theorems?

Mac Malone (Jun 08 2023 at 04:36):

@Scott Morrison Ah, so you fit with Mario here. I am curious, do you feel similarly about the mathlib GitHub repository? Should its name also be capitalized?

Scott Morrison (Jun 08 2023 at 04:37):

Interesting! I hadn't thought of that one, and while is seems strange at first, consistency sounds great! :-)

Mac Malone (Jun 08 2023 at 04:38):

So you would go with uppercase everywhere. That certainly does have the advantage of consistency.

Mac Malone (Jun 08 2023 at 04:39):

Note that it is perfectly possible to adopt that naming convention. Lake will work fine if the mathlib package is renamed Mathlib (other than downstream require statement also needing to change the name).

Mario Carneiro (Jun 08 2023 at 04:42):

I was thinking about that as well, and I personally prefer keeping the repo lowercase, as well as the name of the folder in which I store the mathlib4 repo

Mac Malone (Jun 08 2023 at 04:44):

And for me, since I think of the folder and repo name as the name of the package, it makes sense for the package name in the lakefile to match them.

Mac Malone (Jun 08 2023 at 04:45):

Mario successfully convinced me to leave the package name alone in lake new, so everyone will be able use the naming convention they want now without Lake's input on the topic.

Mac Malone (Jun 08 2023 at 05:04):

@Mario Carneiro I like your suggestion of supporting / using strings and/or simple names for packages (and possibly targets) to avoid the illegal name debacle. However, I have been attempting to implement his and discovered that doing this would require a large refactor than expected and/or break things (it is not currently easy to prohibit a package / target from acquiring a non-simple name and breaking things). Thus, I am going use toName defaulting to mkSimple as a stop-gap measure for now.

James Gallicchio (Jun 09 2023 at 20:51):

where are target names used, other than lake build?

James Gallicchio (Jun 09 2023 at 20:52):

i was skimming the earlier convo pretty fast so I apologize if I missed that

Scott Morrison (Jun 09 2023 at 23:20):

e.g. lake exe cache get, cache is a taregt.


Last updated: Dec 20 2023 at 11:08 UTC