Zulip Chat Archive

Stream: mathlib4

Topic: treating mathlib as a stdlib


Mario Carneiro (May 18 2021 at 20:08):

I feel like this is directly opposed to the stated goal of making Lean 4 a general purpose programming language. I highly doubt anyone is going to think of it that way if the "stdlib" of the library is called "mathlib" and the vast majority of the discussion on the forums is from people deep in mathematics.

I don't think it's a bad thing that lots of mathematicians are involved. It's sad that there aren't more computer scientists, but there are at least a dozen people who are relatively active here and quite good at writing metaprogramming tools for mathlib or otherwise. But I don't think that the number of people involved has anything to do with the stated focus of the (mathlib) project. It's certainly not anything I have any control over, so I don't worry about it so much. As you say, lean 4 is more focused on being a general purpose programming language, but it's already quite clear that it will not have a large built in standard library, there just isn't the manpower, so having a good std++ library is more important than ever.

Mac (May 18 2021 at 20:12):

Mario Carneiro said:

I don't think it's a bad thing that lots of mathematicians are involved.

Sorry, It wasn't my intent to criticize that. As I said, I quite enjoy many of the deep math discussions here (and by "on the forums" I meant here on Zullip). I have no qualms with the people involved. I just think it may help to also have computer science targeted std++ as well. I am not saying this should be part of the Lean core, but that there should be some sore of a separate package from mathlib that incorporates the more general additions and targets a more general audience.

Mario Carneiro (May 18 2021 at 20:12):

I just think it may help to also have computer science targeted std++ as well.

I don't know how to put this any more plainly: mathlib is a computer science targeted std++ library. It is other things too.

Mario Carneiro (May 18 2021 at 20:13):

yeah the name's weird, please look past it

Mac (May 18 2021 at 20:14):

Mario Carneiro said:

yeah the name's weird, please look past it

Is it fair to state here the one of the most important aspects of marketing is branding? :smile:

Mario Carneiro (May 18 2021 at 20:15):

Again, I will point out that the name itself was chosen by leo, who isn't really involved with the project and I think misunderstands the focus to some extent

Mario Carneiro (May 18 2021 at 20:15):

you aren't the only one who thinks mathlib is only about math

Mario Carneiro (May 18 2021 at 20:16):

But that's not a hill I want to die on. I'm not a PR guy

Mac (May 18 2021 at 20:17):

The point I am trying to make is that while some of the content of mathlib is, in fact, for a general audience, the branding and the community around it would give most people a different first impression which may cause them (newbies) to discount it and thus, possibly, since it is touted as the basic library of lean (ala old boost in C++), Lean as a whole.

Mario Carneiro (May 18 2021 at 20:17):

So in that case we should just be clear and upfront in our messaging that it's about more than that

Mario Carneiro (May 18 2021 at 20:18):

boost is a pretty good analogue

Mario Carneiro (May 18 2021 at 20:20):

In any case, I have messaged to you about this fact, so unless you have some PR suggestions for the leanprover-community webpage or other places about it I think we can drop the topic

Mac (May 18 2021 at 20:21):

I also think the mathematically oriented programming community has a unfortunate love for the concept of monorepos (ex. numpy) that is not shared by (and is, in fact, often diametrically opposed to the view of) the computer science oriented programming community.

Mac (May 18 2021 at 20:22):

Admittedly, the web dev community seems split between the two (both monorepos and tiny one feature repos are popular there).

Mario Carneiro (May 18 2021 at 20:22):

Maybe take it to #mathlib4 , that's a whole long discussion we don't need to have here

Mac (May 18 2021 at 20:23):

true, this has gotten rather off topic XD

Mac (May 18 2021 at 20:26):

Mario Carneiro said:

you aren't the only one who thinks mathlib is only about math

Mario Carneiro said:

Again, I will point out that the name itself was chosen by leo, who isn't really involved with the project and I think misunderstands the focus to some extent

I one last word on this topic though. I do think mathlib's PR problem is somewhat crystallized by the fact that the primary developer of Lean itself thinks it is math focused. With that, I rest my case for now.

Mario Carneiro (May 18 2021 at 20:28):

yes, it's a whole bunch of self fulfilling prophecies. Not much to do about it at this point

Mac (May 18 2021 at 20:35):

Mario Carneiro said:

Not much to do about it at this point

Really? I would think the advent of the Lean 4 is the perfect time to restructure, refocus, and rebrand. I would suggest keeping the math parts in mathlib and the other parts in some other library stdlib/corelib/metalib/boost or whatever. But, that is just my opinion.

Also how did you migrate part of thread to another stream? I feel like the tail end of this discussion should probably be in #mathlib4 .

Mario Carneiro (May 18 2021 at 20:37):

admin powers. point me to what and where to move

Mac (May 18 2021 at 20:38):

I would say from https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/.60quote.60.2F.60toExpr.60.20id.2Fnon-id.20solution.20.5BRFC.5D/near/239317769 the discussion is definitely mathlib-focused. Though it really needs the second half your comment here: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/.60quote.60.2F.60toExpr.60.20id.2Fnon-id.20solution.20.5BRFC.5D/near/239313689 for context. It could also use the first half of my comment here: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/.60quote.60.2F.60toExpr.60.20id.2Fnon-id.20solution.20.5BRFC.5D/near/239315530

Mario Carneiro (May 18 2021 at 20:39):

I don't really know what the topic is :P

Mac (May 18 2021 at 20:40):

"treating mathlib as a stdlib" or something thereabouts?

Notification Bot (May 18 2021 at 20:41):

This topic was moved here from #lean4 > quote/toExpr id/non-id solution [RFC] by Mario Carneiro

Mario Carneiro (May 18 2021 at 20:42):

I can't move half a comment so I'll just leave it

Mac (May 18 2021 at 20:43):

makes sense

Mac (May 18 2021 at 20:45):

Mario Carneiro said:

In any case, I have messaged to you about this fact, so unless you have some PR suggestions for the leanprover-community webpage or other places about it I think we can drop the topic

I just noticed this comment (I kind of focused on the back and forth and missed it) what do you mean you messaged me? Do you mean a PM, because I haven't got one.

Mario Carneiro (May 18 2021 at 20:46):

Mac said:

Mario Carneiro said:

Not much to do about it at this point

Really? I would think the advent of the Lean 4 is the perfect time to restructure, refocus, and rebrand. I would suggest keeping the math parts in mathlib and the other parts in some other library stdlib/corelib/metalib/boost or whatever.

This also touches on the point about why mathlib is a monorepo. The basic issue is that maintenance requires coordinating across several things that might look like separate projects to end users, and we know from experience that this is significantly harder when multiple repos are in use.

Mario Carneiro (May 18 2021 at 20:48):

Mac said:

Mario Carneiro said:

In any case, I have messaged to you about this fact, so unless you have some PR suggestions for the leanprover-community webpage or other places about it I think we can drop the topic

I just noticed this comment (I kind of focused on the back and forth and missed it) what do you mean you messaged me? Do you mean a PM, because I haven't got one.

I mean that I have explained the situation to you in comments here so that you are no longer confused/mislead like the newbies you referenced. If you want to make life better for them too then this conversation or something similar needs to exist in relevant documentation, but if it's just us talking then there is no reason to continue to play the newbie role.

Mac (May 18 2021 at 20:51):

Mario Carneiro said:

This also touches on the point about why mathlib is a monorepo. The basic issue is that maintenance requires coordinating across several things that might look like separate projects to end users, and we know from experience that this is significantly harder when multiple repos are in use.

I certainly can understand that and sympathize heavily (I tend to through everything in one repo/package myself when developing on large projects). However, unfortunately, what is convenient for development is often not convenient for the end-user. When it comes to Git repo itself, some of this can be solved through submodules. For the Lean package though, a do think a split is necessary. But as I am not the one with either the control or the maintenance burden, this is only a recommendation, but one I would suggest heavily considering.

Mac (May 18 2021 at 20:53):

Mario Carneiro said:

then there is no reason to continue to play the newbie role.

Ah. Makes sense.

Mario Carneiro (May 18 2021 at 20:54):

The packaging situation in lean 4 still needs time to mature. I don't think that we know what that will look like ultimately, and it has come up a few times now that mathlib certainly needs the ability to be packaged in a more fine grained way. But that need not entail that the project itself be atomized, and for maintenance reasons as I mentioned this is something I would certainly like to avoid.

Mario Carneiro (May 18 2021 at 20:57):

Maybe we could try having a zillion separately compiled mini-projects in the mathlib4 repo, but right now I think that would mean a suboptimal module structure and file organization, and it might also be more expensive overall, although this requires testing to be sure.

Mac (May 18 2021 at 21:04):

Mario Carneiro said:

The packaging situation in lean 4 still needs time to mature. I don't think that we know what that will look like ultimately, and it has come up a few times now that mathlib certainly needs the ability to be packaged in a more fine grained way. But that need not entail that the project itself be atomized, and for maintenance reasons as I mentioned this is something I would certainly like to avoid.

I understand and that sounds fine. My only additional suggestion would be to split up the subpackages into Git submodules, so that while all the work can be done, on the developer side, in a single mathlib monorepo, consumers can encounter it (and depend on it) through different lenses.

This may already be compatible with leanpkg. A directory structure like so might work:

mathlib4/
  .git
  stdlib4\
   .git
    Stdlib\
      <sources>
    Stdlib.lean
    leanpkg.toml
  Mathlib\
     <sources>
  Mathlib.lean
  leanpkg.toml

If I am not mistaken, Stdlib can be a submodule and the top-level leanpkg.toml can depend on it with local. It can then be used in mathlib with Stdlib.* and consumers can depend on either mathlib as a whole (through its collective git repo) or stdlib only (through its own git repo). This might also resolve some of the compile boundaries that where discussed earlier (though not all of them).

Mario Carneiro (May 18 2021 at 21:05):

I understand and that sounds fine. My only additional suggestion would be to split up the subpackages into Git submodules, so that while all the work can be done, on the developer side in a single mathlib repo, consumers can encounter it (and depend on it) through different lenses.

Submodules are literally different repos though. They go through separate PR processes and everything

Mario Carneiro (May 18 2021 at 21:07):

That directory structure looks very confusing, you have two Mathlib/Mathlib/ directories, and what is in Mathlib/Stdlib/Stdlib/?

Mac (May 18 2021 at 21:07):

True, though that is somewhat of the goal? Though I can see how PRs for multiple repos could get annoying. I doubt you would need that many different repos though.

Mario Carneiro (May 18 2021 at 21:08):

the "from experience" I mentioned earlier is referring to leanprover-community/lean and leanprover-community/mathlib

Mac (May 18 2021 at 21:09):

Mario Carneiro said:

That directory structure looks very confusing, you have two Mathlib/Mathlib/ directories, and what is in Mathlib/Stdlib/Stdlib/?

That is how leanpkg works, yes? You have the out repo directory and then leanpkg demands the sources go in a directory with the same name as the package hence Mathlib/Mathlib. Or maybe I missed something.

Mario Carneiro (May 18 2021 at 21:09):

I don't know any filesystem that lets you have two directories with the same name in the same directory

Mac (May 18 2021 at 21:09):

That's how is currently structured on the repo: https://github.com/leanprover-community/mathlib4

Mac (May 18 2021 at 21:10):

Mario Carneiro said:

I don't know any filesystem that lets you have two directories with the same name in the same directory

Oops! That second Mathlib\ was supposed to be Mathlib.lean. Fixed it.

Mario Carneiro (May 18 2021 at 21:10):

If mathlib could compile on multiple versions of lean it wouldn't be as bad, but lean 3 at least did not make this easy at all, so the upgrades have to proceed in lock step and that is much easier when everything is in one repo and you can simultaneously change multiple projects in one PR.

Mac (May 18 2021 at 21:12):

Mario Carneiro said:

If mathlib could compile on multiple versions of lean it wouldn't be as bad

I don't know what you mean here / what you are getting at.

Mario Carneiro (May 18 2021 at 21:13):

Mathlib generally only compiles on one version of lean

Eric Wieser (May 18 2021 at 21:13):

Mac said:

I also think the mathematically oriented programming community has a unfortunate love for the concept of monorepos (ex. numpy) that is not shared by (and is, in fact, often diametrically opposed to the view of) the computer science oriented programming community.

Some comments with my numpy maintainer hat on: I wouldn't consider it a mono-repo, but numpy is admittedly large. Some things worth noting about it:

  • It was deemed hard to manage as part of the Python standard library (presumably for exactly the same reason as mathlib is separate from lean 3), even though many language features were introduced specifically for numpy-like usage
  • From what I remember, it actually used to be part of scipy, and was split out aeons ago, long before I earned my hat.
  • Now that the repos are separate, scipy now lags four releases (around 2 years!) behind numpy. At this point cross-project refactors are almost impossible.
  • Numpy has accumulated a handful of submodules (np.lib.rec_functions, np.char, np.matrixlib) which are somewhere between hacks and a bad idea, and definitely would have been better as a standalone package; but it's decades too late to remove them now without breaking old research code.

Mario Carneiro (May 18 2021 at 21:13):

because it's very difficult to polyglot over the existence of a definition in lean core, say

Mac (May 18 2021 at 21:15):

Mario Carneiro said:

Mathlib generally only compiles on one version of lean

Yes, okay. What does that have to do with anything?

Mac (May 18 2021 at 21:17):

Eric Wieser said:

  • Numpy has accumulated a handful of submodules (np.lib.rec_functions, np.char, np.matrixlib) which are somewhere between hacks and a bad idea, and definitely would have been better as a standalone package; but it's decades too late to remove them now without breaking old research code.

I would argue that this largely supports my point. Parts of mathlib that are clearly separate from doing math (though doing math may rely on them) should be in separate packages.

Mario Carneiro (May 18 2021 at 21:17):

If the projects weren't so tightly coupled making a PR to the upstream and downstream libraries separately wouldn't be as much of a pain

Mario Carneiro (May 18 2021 at 21:18):

But given that they are, it makes sense to retain the ability to change them simultaneously

Mac (May 18 2021 at 21:19):

Mario Carneiro said:

If the projects weren't so tightly coupled making a PR to the upstream and downstream libraries separately wouldn't be as much of a pain

I am confused though, when using submodules the downstream library doesn't have to reflect the public upstream library. You could do a PR to the mathlib repo that updates mathlib and its fork of stdlib. And then make a separate PR from that fork that updates stdlib (after the mathlib one is merged).

Mac (May 18 2021 at 21:22):

Workflow:
Update my fork of mathlib w/ forked stdlib -> push it -> make PR to mathlib w/ forked -> merged -> make PR from forked stdlib to main stdlib -> merged.

Mario Carneiro (May 18 2021 at 21:22):

Yes, solutions exist. They are higher friction for contributors.

Mario Carneiro (May 18 2021 at 21:23):

We could do that, but I think that right now getting a usable project structure is more important than solving the packaging situation for end users because there are still many open questions there

Mac (May 18 2021 at 21:23):

How? it is the same amount of friction as exist currently:
mathlib (w/ forked stdlib) -> push it -> make PR to mathlib (w/ forked stdlib)

The maintainers of mathlib could periodically make PRs to stdlib and not burden the contributors with that.

Mac (May 18 2021 at 21:23):

Just as the Lean core does with updated stage0. The developers do that periodically after incorporating many changes, it is not a burden of the contributor.

Mario Carneiro (May 18 2021 at 21:24):

The lean devs will be the first ones to say that stage0 is a huge pain and makes external contributions delicate

Mac (May 18 2021 at 21:25):

True, but stage0 (and staging in general) also as many more complexities than submodules

Mario Carneiro (May 18 2021 at 21:25):

Mathlib has a lot of contributors so we want the contributor experience to be simple and easy where possible

Mac (May 18 2021 at 21:26):

I really don't think a submodule introduces that much more complexity. Though, I will admit, it does introduce some.

Mario Carneiro (May 18 2021 at 21:26):

This is not optimal for end users perhaps but I think that we will have technical solutions to the end user experience in time

Eric Wieser (May 18 2021 at 21:26):

What are you optimizing for by splitting out the submodule? Certainly not ease of contributing or refactoring. "Dependency size" seems like a somewhat artificial metric.

Mac (May 18 2021 at 21:28):

Eric Wieser said:

What are you optimizing for by splitting out the submodule?

End-users.

Mario Carneiro (May 18 2021 at 21:28):

specifically?

Eric Wieser (May 18 2021 at 21:28):

Why is having multiple dependencies easier for end users than depending on the mono-repo?

Mac (May 18 2021 at 21:28):

The goal is to not bog them down with all the math stuff when that is not what they are looking for.

Mario Carneiro (May 18 2021 at 21:29):

so "dependency size" then

Eric Wieser (May 18 2021 at 21:29):

Then they can not look at it?

Mac (May 18 2021 at 21:29):

This is especially important now as Mathlib operates at global level and thus concepts can clash.

Mario Carneiro (May 18 2021 at 21:29):

I see a path forward for solving the dependency size problem without splitting mathlib

Mario Carneiro (May 18 2021 at 21:29):

so I don't think we should assume these are irreconcilable

Eric Wieser (May 18 2021 at 21:30):

Clashing concepts don't matter if you never import the mathlib one

Eric Wieser (May 18 2021 at 21:30):

Just because you depend on a git repo doesn't mean you import every file within it

Mac (May 18 2021 at 21:32):

Eric Wieser said:

Just because you depend on a git repo doesn't mean you import every file within it

Eric Wieser said:

Clashing concepts don't matter if you never import the mathlib one

Transitive imports make it highly unlikely for the end-user to this level of fine-grain control over imports (especially considering the tight coupling between aspects of mathlib).

Mario Carneiro (May 18 2021 at 21:33):

Transitive imports make it highly unlikely for the end-user to this level of fine-grain control over imports (especially considering the tight coupling between aspects of mathlib).

This is a problem that can be solved by more fine grained imports in mathlib

Eric Wieser (May 18 2021 at 21:33):

If a "non-mathsy" file depends on a "mathsy" one, then either:

  • We should just fix it to not be the case
  • Your definition of "mathsy" isn't consistent with the maintainers' / the constraints imposed by lean

Mac (May 18 2021 at 21:33):

Mario Carneiro said:

so "dependency size" then

Modularity and separation of concerns has nothing to with "size".

Eric Wieser (May 18 2021 at 21:35):

Why does modularity matter at a dependency level though? The usual answer would be to allow development to be parallelized, but frankly mathlib doesn't yet have a high enough rate of conflicting contributions (or conflicting maintainer opinions) for that to matter.

Mac (May 18 2021 at 21:35):

Eric Wieser said:

If a "non-mathsy" file depends on a "mathsy" one, then either:

  • We should just fix it to not be the case
  • Your definition of "mathsy" isn't consistent with the maintainers' / the constraints imposed by lean

This violates the principle of encapsulation. Just because some non-mathsy concept (in implementation) depends on a mathsy one, does not mean that it should pollute the user's namespace with mathsy concept if the user just wants the non-mathsy one.

Eric Wieser (May 18 2021 at 21:36):

This feels like an #xy problem where the X is namespace pollution and the Y is make the dependencies smaller

Mac (May 18 2021 at 21:37):

Eric Wieser said:

This feels like an #xy problem where the X is namespace pollution and the Y is make the dependencies smaller

I never said the issue was Y (make dependencies smaller). In fact, I just stated it wasn't.

Mario Carneiro (May 18 2021 at 21:37):

Note that mathlib is designed so that you still get a usable lean setup even if you import every file simultaneously. We only make global notations / unnamespaced definitions for things that can reasonably be expected not to conflict with something the end user wants

Mac (May 18 2021 at 21:38):

Mario Carneiro said:

We only make global notations / unnamespaced definitions for things that can reasonably be expected not to conflict with something the end user wants

I think you will find that different end-users have different ideas what they want. This is why the principles of modularity, encapsulation, and separation of concerns are key concepts in computer science.

Mario Carneiro (May 18 2021 at 21:39):

maybe, but then those users should speak up

Mac (May 18 2021 at 21:39):

I am!

Mario Carneiro (May 18 2021 at 21:39):

I mean more specifically about problematic global definitions

Eric Wieser (May 18 2021 at 21:40):

@Mac, are you arguing for something like either a C++-style approach where even most of the stdlib is namespaced with std:: (aka, introduce a mathlib. or std. namespace), or a python-style approach where the "global namespace" only exists per file?

Mac (May 18 2021 at 21:40):

Well @Kevin Buzzard hightlighted the problems he had will teaching various concepts (such a groups) with mathlib as he wants his students to define Groups but mathlib has already stolen that object.

Mac (May 18 2021 at 21:41):

I have previously highlighted my many problems with namespace clashes due to my own interests of wanting to construct alternative logic/metaprogramming systems in Lean.

Mario Carneiro (May 18 2021 at 21:41):

Some parts of this can/should be solved with better conflict resolution on lean's part, but a lot of it is simply us deciding that it is okay to name a definition "topology" because if you have a topology it should be isomorphic to our topology definition and if not your definition is the more niche one and should be namespaced

Mac (May 18 2021 at 21:42):

Eric Wieser said:

Mac, are you arguing for something like either a C++-style approach where even most of the stdlib is namespaced with std:: (aka, introduce a mathlib. or std. namespace), or a python-style approach where the "global namespace" only exists per file?

Either works. I would suggest the former in Lean as the later would require radically altering the way the Lean core functions.

Mac (May 18 2021 at 21:43):

Mario Carneiro said:

Some parts of this can/should be solved with better conflict resolution on lean's part, but a lot of it is simply us deciding that it is okay to name a definition "topology" because if you have a topology it should be isomorphic to our topology definition and if not your definition is the more niche one and should be namespaced

Again I would like to remind everyone that namespacing things does not resolve global clashes in Lean.

Mario Carneiro (May 18 2021 at 21:43):

Some parts of this can/should be solved with better conflict resolution on lean's part

Eric Wieser (May 18 2021 at 21:43):

If we did add a mathlib. prefix, what problems would remain that dividing mathlib into submodules would solve?

Mac (May 18 2021 at 21:44):

I think throwing the burden on Lean to vastly alter its conflict resolution when some this could be resolved with a simple namespace is kind silly.

Mario Carneiro (May 18 2021 at 21:44):

the namespace solution causes a lot of knock on effects

Mario Carneiro (May 18 2021 at 21:45):

it would require a similarly radical change to lean to actually be practical

Mac (May 18 2021 at 21:45):

Mario Carneiro said:

Some parts of this can/should be solved with better conflict resolution on lean's part, but a lot of it is simply us deciding that it is okay to name a definition "topology" because if you have a topology it should be isomorphic to our topology definition and if not your definition is the more niche one and should be namespaced

Also, what happens when then end-user depends on two libraries with different implementations of "topology" that both felt their definition was worthy of the global treatment?

Eric Wieser (May 18 2021 at 21:45):

(to be clear, I'm not saying "let's just add mathlib.", I want to check whether that would address all of Mac's concerns)

Mario Carneiro (May 18 2021 at 21:46):

What happens when then end-user depends on two libraries with different implementations of "topology" that both felt their definition was worthy of the global treatment?

That's exactly why mathlib exists, so that you don't have those kinds of conflicts

Kevin Buzzard (May 18 2021 at 21:46):

Just to restate the problem I had: if you decide to redefine groups for teaching purposes but want to import tactic for simplicity then you need to work in a namespace and then you have to be a little careful with everything. But it wasn't a deal-breaker and furthermore this was in some sense rather an odd project -- it was the development of a substantial body of theory which was already in the library, something which is vanishingly rare amongst projects I've supervised. And once we had the hang of it things progressed fine -- we just had to learn the tricks like what to open.

Mario Carneiro (May 18 2021 at 21:47):

Coq did this kind of thing and the result is a huge fragmentation of the community

Mac (May 18 2021 at 21:47):

Mac said:

Also, what happens when then end-user depends on two libraries with different implementations of "topology" that both felt their definition was worthy of the global treatment?

This is also not just a theoretical concern. There have been many different suggestions for how to structure the algebraic hierarchy -- different libraries could easily form over different definitions and boom the problem arises.

Eric Wieser (May 18 2021 at 21:48):

While we're on namespaces and tactics; at least in lean3, tactic.interactive is a special namespace, and putting tactics from mathlib in mathlib.tactic.interactive wouldn't work

Mario Carneiro (May 18 2021 at 21:48):

The idea is for us to decide on one solution and implement it globally across mathlib. Multiple competing hierarchies would be a nightmare

Mac (May 18 2021 at 21:48):

Mario Carneiro said:

That's exactly why mathlib exists, so that you don't have those kinds of conflicts

Everyone is not going to agree on one approach, that just is not how people (or programmers) work.

Mario Carneiro (May 18 2021 at 21:49):

Yes, this is centralization. Decentralized libraries suffer greatly from bitrot

Mac (May 18 2021 at 21:49):

That has never worked in any programming language ever.

Mac (May 18 2021 at 21:50):

In fact, languages which tend to force to much users in one way tend to die out and be replaced by more versatile ones.

Mac (May 18 2021 at 21:50):

Same is true for libaries.

Mac (May 18 2021 at 21:51):

In fact, as I have argued before, this is the main reason for Lean's extension metaprogramming capabilities to enforce very little style on the end-user.

Mario Carneiro (May 18 2021 at 21:51):

All I can say is, it's worked pretty well for mathlib so far (frankly, a lot better than people initially expected)

Mac (May 18 2021 at 21:51):

Also, there is also @Kevin Buzzard example, which even comes up if people do agree on the definitions. They may want to redefine them for teaching purposes.

Mario Carneiro (May 18 2021 at 21:52):

Sure, and we should have mechanisms for that

Mario Carneiro (May 18 2021 at 21:52):

the current recommendation is to put your whole project in a namespace and it works pretty well

Mac (May 18 2021 at 21:53):

Mario Carneiro said:

the current recommendation is to put your whole project in a namespace and it works pretty well

That doesn't work if your library also has subnamespaces and you need to open one into the other, then a clash emerges.

Mario Carneiro (May 18 2021 at 21:53):

and that's a problem that should be solved by better control over conflict resolution in lean

Mac (May 18 2021 at 21:55):

I just don't understand why you are so against sticking mathlib in a namespace.

Mac (May 18 2021 at 21:55):

It seems like an rather easy resolution to a lot of potential problems.

Mac (May 18 2021 at 21:56):

Though, the "dependency size" and presentation problem still remains.

Mario Carneiro (May 18 2021 at 21:56):

If it were possible to put one line at the top of every file and it would have that effect I would consider it, but if you actually try it you will find out that there are many practical problems that arise

Eric Wieser (May 18 2021 at 21:57):

Here's an example problem: should mathlib define docs#multiset.prod, or is multiset "not mathsy"?

Because to do so, it needs bits of the algebraic heirarchy, and that's incompatible with a library that defines algebra differently

Mario Carneiro (May 18 2021 at 21:57):

for example, if mathlib defines Mathlib.List.foobar and you have l : List A then l.foobar won't work

Mac (May 18 2021 at 22:02):

On a more general note, I think goes back to my point about a very large gap between what mathematics focused programmers and computer science focused programmers think is proper in software development. Decentralization, modularity, encapsulation, separation of concerns are all fundamental principles in general computer science that are heavily strived for in design (take the ES module system, Rust's module system, and the many efforts at create a C++ module system as examples). Mathlib seems to take the opposite approach (which is also somewhat shared by things like Python/numpy -- which is why I think this seems to be a trend among the mathematically focused).

Mario Carneiro (May 18 2021 at 22:03):

I don't think this has anything to do with mathematics focus

Kevin Buzzard (May 18 2021 at 22:03):

@Mac speaking now with my professional number theory hat on: number theory has existed for over 2000 years and as a subject it is like a parasite. When new areas of mathematics are developed like topology or algebraic geometry or analysis or category theory, number theorists tend to find a way to use these tools to push their own ideas further. For me one of the reasons mathlib has been so successful is that it is a monorepo, relying on nothing more than core lean. Beyond some point mathematics can be regarded as surprisingly interconnected. Massot, Commelin and I tried breaking off and making a dependent perfectoid space repo but we simply didn't have the energy to keep it up to date and then it rotted.

Kevin Buzzard (May 18 2021 at 22:05):

The fact that everything was there for us all at once (topology, geometry, analysis, number theory, algebra) has been essential for the rapid progress of the Scholze project and Commelin has been working very hard to keep our repo up to date with mathlib master -- we have learnt from our mistakes. The perfectoid space repo is worth nothing any more because nobody can use it with a modern mathlib.

Mac (May 18 2021 at 22:06):

Mario Carneiro said:

I don't think this has anything to do with mathematics focus

That may be entirely true. My only point is that it is odds with the fundamentals of computer science. Such violations also seem more common among the more mathematically-included, but that my just be a false perception on my part (or it may have to do with interconnectedness of mathematics that @Kevin Buzzard mentioned).

Mario Carneiro (May 18 2021 at 22:06):

One other major difference between the big programming languages you mention and mathlib is that the number of contributors is larger by a few orders of magnitude. Mathlib structure would have to phase change if it had 100 times more PRs per day

Mario Carneiro (May 18 2021 at 22:08):

With more contributors comes more manpower to support more elaborate and multiply-centric maintenance structures

Mac (May 18 2021 at 22:09):

Mario Carneiro said:

for example, if mathlib defines Mathlib.List.foobar and you have l : List A then l.foobar won't work

Yes, this is certainly a concern (and a place I can see an argument for using the global namespace).

I would note, though, that adding fields to existing classes is still a violation of the principles I mentioned before and other languages (like Ruby with its "refinements") have figured out ways to do this without sacrificing encapsulation. For Lean, though, yes adding it to global List is probably the way to go for now.

Kevin Buzzard (May 18 2021 at 22:10):

@Mac it _is_ possible to develop your own stuff in a namespace, there are tricks to help you avoid problems with things in the global namespace -- you work in your own namespace, you define your versions there, and lean chooses them by default. It was only very rarely we had to explicitly disambiguate. As far as I can see this is a reasonable solution to your root namespace issue. As for breaking mathlib up, why mess with something which is working?

Eric Wieser (May 18 2021 at 22:11):

This namespace issue seems to come up (and be mischaracterized) repeatedly; can we record a #mwe somewhere less transient than Zulip? Would a issue against lean4 be appropriate? A wiki page somewhere?

Kevin Buzzard (May 18 2021 at 22:12):

We tried to break off perfectoid spaces and in some sense the only thing we learnt was that the project enhanced mathlib because of all the stuff that ended up in it as a result of the project

Mario Carneiro (May 18 2021 at 22:12):

How about a documentation folder in mathlib4 on implementation notes?

Kevin Buzzard (May 18 2021 at 22:13):

@Jason KY. Can you remember an explicit problem we had when trying to define Xena.group when _root_.group existed?

Mac (May 18 2021 at 22:15):

Kevin Buzzard said:

As far as I can see this is a reasonable solution to your root namespace issue.

This does not work in the following pattern, which crops up in my use case:

def foo := ..
def bar := ..

namespace MyStuff

namespace ImpA
def foo := ..
def bar := ..
end

namespace ImpB
def foo := ..
def bar := ..
end

namespace ConcernA
open ImpA (foo)
open ImpB (bar)
-- ImpA.foo and ImpB.bar now clashes with global foo/bar
end

namespace ConcernB
open ImpA (bar)
open ImpB (foo)
-- ImpA.bar and ImpB.foo now clashes with global foo/bar
end

end

Mario Carneiro (May 18 2021 at 22:17):

open (priority := high) ImpA (foo)?

Mario Carneiro (May 18 2021 at 22:17):

I don't think that exists but it would solve the problem

Mac (May 18 2021 at 22:18):

Mario Carneiro said:

One other major difference between the big programming languages you mention and mathlib is that the number of contributors is larger by a few orders of magnitude. Mathlib structure would have to phase change if it had 100 times more PRs per day

This may also be though why you have encountered so few difficulties with your centralized approach. Your small user-base makes it possible for most people to agree on things. There is not currently a significant enough number of people who disagree to create an alternative and maintain it. In languages with big userbases, this happens all the time so the centralized approach is often quickly abandoned.

Mario Carneiro (May 18 2021 at 22:18):

arguably, an open of a particular name should normally be high priority

Mario Carneiro (May 18 2021 at 22:19):

In languages with big userbases, this happens all the time so the centralized approach is often quickly abandoned.

Note that even here there is a desire for centralization when multiple libraries need to communicate or interoperate

Mario Carneiro (May 18 2021 at 22:20):

having a centralized standard library to provide the "vocabulary" for other libraries is considered valuable

Jason KY. (May 18 2021 at 22:20):

Kevin Buzzard said:

Jason KY. Can you remember an explicit problem we had when trying to define Xena.group when _root_.group existed?

I think we had a similar problem to the example Mac gave. We had groups in the mygroup namespace and so can't open anything without ambiguity. We just ignored it by not opening anything and prefixing everything with mygroup.. What Mario suggested would be great for this I think!

Mac (May 18 2021 at 22:20):

Mario Carneiro said:

arguably, an open of a particular name should normally be high priority

Sure, and I have taken up the issue with Lean as well. However, as you yourself stated, the Lean developers have a lot on there plate and don't have time for such changes, so concerns should instead be addressed to mathlib. So, here I am! :smile:

Mario Carneiro (May 18 2021 at 22:21):

If we can fix this in mathlib, we probably should

Mario Carneiro (May 18 2021 at 22:21):

for example, by overriding the open command

Mario Carneiro (May 18 2021 at 22:21):

as long as the behavior is compatible I don't see any reason not to override various bits of core syntax

Mario Carneiro (May 18 2021 at 22:22):

although it is more overall code than if we were to just modify lean

Mario Carneiro (May 18 2021 at 22:23):

still, it lets us test out changes and PR it to lean only once we are reasonably sure that it's worth it

Mario Carneiro (May 18 2021 at 22:25):

well, priority in the open command needs more than a cosmetic change, it needs OpenDecl to track priority and the priority to get used in conflict resolution; I doubt those things can be accomplished by mathlib overrides

Scott Morrison (May 18 2021 at 23:18):

(Coming to this all a bit late.)

I'm very sympathetic to the idea that the current structure and "branding" of mathlib is scaring off computer scientists, to the detriment of the everyone.

That said, I am a huge fan of the monorepo style for doing mathematics. You only have to look at the fundamentally unique successes Lean has had relative to other theorem provers (perfectoid and LTE, I'm looking at you) --- what is going on is that they are doing real modern mathematics, relying on a surprising variety of mathematical topics that don't obviously belong together. Mathlib provides a guarantee that if if two bits of mathematics exist in the library, you can use them together.

In other theorem provers, this is woefully far from the case. You can ask "do you have X?" and "do you have Y?" and receive enthusiastic responses, but when you ask "how do I import X and Y at the same time?" you tend to get a lot of explanation of why that just isn't a reasonable thing to do...

This is only possible because it's practical to do library-wide refactors, and having to synchronize commits across different repos would completely kill this.

Scott Morrison (May 18 2021 at 23:32):

Perhaps we could try a concrete experiment in Lean3. There is an obvious segmentation point in the current library: basically tactic.basic. A while back we put in a lot of work to minimise the imports of tactic.basic, and it's still pretty clean. There's no other division in the library that comes close.

What could making a sharper division here look like? Is there a setup that would allow someone to just depend on stuff below tactic.basic, but not make the PR process more painful?

Mario Carneiro (May 18 2021 at 23:35):

In a lot of ways this segmentation is easier in lean 3 than lean 4, because you can have multiple projects with overlapping module names. So we could move tactic.basic and its dependents to some other subfolder of mathlib with its own leanpkg.toml, tweak the toml files a bit, and no lean files have to change textually

Mario Carneiro (May 18 2021 at 23:38):

In lean 4, module names start with the package name, so moving things from one package to another means at least changing the import references and folder organization. Moving to another repo is not required, however; leanpkg does support multiple projects in one repository.

Mario Carneiro (May 18 2021 at 23:40):

The way leanproject works, however, you still have to check out the mathlib repository (shallowly) in order to depend on a subfolder, so you would still have the "big dependency" problem as an end user if you want to depend on the "tactic.basic + deps" project without mathlib

Mario Carneiro (May 18 2021 at 23:46):

An architecture that would support fine grained dependencies even to the extent of not pulling in unnecessary things via some cargo-like package manager designed to work with mathlib might look like this: Mathlib compiles its dependency graph into a mathlib.deps file that is served by some server. leanpkg downloads this file, computes the up-set of the files imported by the current project, and requests the .olean or .o files corresponding to these from the server and puts them in the local build/ directory. Then it compiles the project as normal.

That would allow users to freely import Mathlib.Whatever, and leanpkg will just download built files as needed, so there is no "big dependency" problem and mathlib can grow arbitrarily in any direction while still keeping the monorepo design.

Mac (May 18 2021 at 23:50):

Scott Morrison said:

That said, I am a huge fan of the monorepo style for doing mathematics. You only have to look at the fundamentally unique successes Lean has had relative to other theorem provers (perfectoid and LTE, I'm looking at you) --- what is going on is that they are doing real modern mathematics, relying on a surprising variety of mathematical topics that don't obviously belong together. Mathlib provides a guarantee that if if two bits of mathematics exist in the library, you can use them together.

In spite of all that has been said, I actually agree with this. Since all of the mathematics in mathlib is done using the same base theory (by a very feneral notion of thoery) it makes sense that all the results of said theory would be in the same repo. I would still adovocate its contents be placed in a Mathlib namespace so that end-users could perform analysis between different axiomatizations of mathematics. But I can understand that the burden to do so may be considered to great to appeal to such niche interests (at this time).

My problem is really only with the non-math std++ part of mathlib, which I feel should take the more standard CS approach (of modularity and separation of concerns) in is structure as that is the audience it is (hopefully) targeting in Lean 4.

Mario Carneiro (May 18 2021 at 23:53):

Personally I don't think lean 4 (or lean 3) is a good place for exploring alternative mathematical foundations, reverse mathematics, and other things like that. Lean has progressively taken steps away from this since lean 2 which supported HoTT and had a number of kernel flags to allow for impredicativity or not, proof irrelevance or not, inductive types or not etc

Mario Carneiro (May 18 2021 at 23:54):

If you want to do other logics they are all basically deeply embedded second class citizens

Mario Carneiro (May 18 2021 at 23:56):

I think it's okay if lean doesn't do everything, that doesn't have to be a design goal

Mac (May 18 2021 at 23:57):

Mario Carneiro said:

Personally I don't think lean 4 (or lean 3) is a good place for exploring alternative mathematical foundations, reverse mathematics, and other things like that.

You are assuming one has to explore them within Lean 4's current logical framework. With Lean 4's new metaprogramming framework, you can define entire custom DSLs and rebuild the logic from the ground up in them. I love Lean for its metaprogramming power, not really its internal logic.

Mario Carneiro (May 18 2021 at 23:57):

Lean doesn't separate them though

Mario Carneiro (May 18 2021 at 23:57):

Lean has one builtin kernel and doesn't let you swap it out for something else

Mac (May 18 2021 at 23:57):

You can force it to separate them. You don't actually need to reduce Syntax to Exor or real terms for example.

Mario Carneiro (May 18 2021 at 23:59):

I would like to see a "full conversion" proof of concept for that. I am extremely dubious that you can actually do that without having to rebuild half of lean

Mac (May 18 2021 at 23:59):

You can do all the computation at the meta level and theorem proving at the meta level. Just as you can write a kernel in C++, you can write a new kernel in Lean. You don't have to appeal to Lean's kernel

Mac (May 19 2021 at 00:02):

i.e. you could use Lean to as an interpreter to check a DSL or use write a compiler in Lean for the DSL in Lean that you could use an external check for. Or you can could your own checker in lean which you compile and use. There are many ways to do logic in Lean that don't require grounding the logic on Lean's logic system.

Mario Carneiro (May 19 2021 at 00:02):

without Expr that means that you need to rewrite all elabs, def and theorem, and an environment extension that calls your kernel. I am honestly curious if this is possible, because it would probably beat the pants off my MM1 proof assistant, but it sounds like a tremendous amount of work and I'm not even sure all the hooks to do so are available in user space

Mac (May 19 2021 at 00:03):

Environment extension are available that's what the initialize attribute is for

Mario Carneiro (May 19 2021 at 00:03):

I've had issues trying to employ it in the past

Mario Carneiro (May 19 2021 at 00:04):

Nothing I said sounds impossible from what I know, but there may be unforeseen issues which is why I mention a POC

Mac (May 19 2021 at 00:05):

I don't doubt it. But I am excited to look into the possibility. It's what I'll probably work on the next time I have a lot of free time.

Scott Morrison (May 19 2021 at 00:10):

I think not readily supporting alternative axiomatizations is a feature, not a bug. I love computer scientists, but remain dubious about constructivists. :-) (That is of course a joke --- but I think for acceptance by mathematicians, we've already seen that being bog-standard-classical has been a huge success.)

Scott Morrison (May 19 2021 at 00:12):

(Although when we meet alien mathematicians, I for one welcome our new constructivists overlords.)

Mario Carneiro (May 19 2021 at 00:12):

I think it a bit unfortunate that "constructivist" is what most people around here think of / are exposed to when talking about alternate axiomatizations. I would like to see more stuff like modal logic, linear logic, exponential function arithmetic, ACA0, that kind of stuff

Mac (May 19 2021 at 00:12):

Scott Morrison said:

That is of course a joke --- but I think for acceptance by mathematicians, we've already seen that being bog-standard-classical has been a huge success.

Hence why I am a computer scientist, not a mathematician! ;)

Mac (May 19 2021 at 00:13):

Mario Carneiro said:

I think it a bit unfortunate that "constructivist" is what most people around here think of / are exposed to when talking about alternate axiomatizations. I would like to see more stuff like modal logic, linear logic, exponential function arithmetic, ACA0, that kind of stuff

There is also nonstandard analysis, which I am quite fond of.

Mario Carneiro (May 19 2021 at 00:14):

You can do nonstandard analysis without changing axioms though, there is even a definition of hyperreals in mathlib

Mac (May 19 2021 at 00:16):

True (though I would not qualify it as bog-standard). There are also ways of doing it with different axiomizations though.

Mario Carneiro (May 19 2021 at 00:19):

I think the most useful application of nonstandard analysis is using nonstandard analysis methods to construct theorems in regular analysis; for this you actually want to be working in some kind of facade or deep embedding translation over a standard mathematical setup, and the key is to have a nice input method that hides all that

Mac (May 19 2021 at 00:20):

Scott Morrison said:

I think not readily supporting alternative axiomatizations is a feature, not a bug. I love computer scientists, but remain dubious about constructivists. :-) (That is of course a joke --- but I think for acceptance by mathematicians, we've already seen that being bog-standard-classical has been a huge success.)

This also touches on the point I was getting at early. I think people who fall firmly in computer scientist camp tend to like (and desire) a million different ways of doing things (hence the plethora of programming languages and frameworks), whereas those more mathematically oriented are much fonder of working off a tried and true base/standard (hence the continuing survival of classical logic).

Mac (May 19 2021 at 00:21):

Then again, I may just be asserting patterns where there are none.

Mario Carneiro (May 19 2021 at 00:23):

I think people who fall firmly in computer scientist camp tend to like (and desire) a million different ways of doing things (hence the plethora of programming languages and frameworks)

I think a plethora of frameworks indicates that lots of people want one way to do things but they all disagree on what that one way is

Mac (May 19 2021 at 00:23):

same diff?

Mario Carneiro (May 19 2021 at 00:24):

Every individual framework is unified in vision and generally has take-over-the-world ambitions

Mario Carneiro (May 19 2021 at 00:24):

Mathlib is one such framework

Mario Carneiro (May 19 2021 at 00:25):

If you want to do your own thing, okay, but don't mind us while we take over the rest of the world

Mac (May 19 2021 at 00:25):

Yes, true, computer scientists tend to think they have the one answer but they have worked with so many other people also have think they have the one answer (and that answer is different than theirs) tends to result in a few deciding to build systems that support as many different one answers as they can. XD

Mario Carneiro (May 19 2021 at 00:27):

Once several competing libraries exist, of course you want to be able to mediate between them and this leads to increasing abstraction. But this is a suboptimal situation for a number of reasons, and we're trying to keep things unified where possible

Mac (May 19 2021 at 00:27):

I am a participant in both these camps. I have what I think to be my one answer, but I also try to encourage and create systems that support many possible (including my own). ;)

Mario Carneiro (May 19 2021 at 00:28):

When there are good reasons to have multiple versions of a thing, mathlib has multiple versions of the thing and relates them

Mario Carneiro (May 19 2021 at 00:29):

when multiple versions of a thing are done in independent libraries, the "relating" part gets lost in transmission

Mac (May 19 2021 at 00:30):

For me, math/logic (and computer science) is a creative enterprise, my goal is express what I want to express using the tools given to me (and creating new tools if the current tools are insufficient).

Mario Carneiro (May 19 2021 at 00:30):

My position here is to try to understand your problem to see whether mathlib can accomodate a solution to it so that we don't need to have multiple libraries and the aforementioned abstraction

Mario Carneiro (May 19 2021 at 00:31):

For me, math/logic (and computer science) is a creative enterprise, my goal is express what I want to express using the tools given to me (and creating new tools if the current tools are insufficient).

I don't think anyone will disagree with that

Mac (May 19 2021 at 00:33):

Again, my interest, in particular, at the moment is mostly in the stdlib rather than mathlib part of mathlib, i.e. can I get some tools to help out with my metaprogramming and metalogic explorations without cluttering my namespace with names and syntax clashing with concepts for which I wish to explore alternate definitions.

Mac (May 19 2021 at 00:34):

I already get enough clashes from Lean itself, I would kind of like to avoid more.

Mario Carneiro (May 19 2021 at 00:34):

You could just not put things in the global namespace unprotected; mathlib has microcosms of this issue all over the place

Mario Carneiro (May 19 2021 at 00:35):

just don't call your fancy and _root_.And and it's fine

Mario Carneiro (May 19 2021 at 00:35):

you can use typeclasses to abstract over syntax too

Mario Carneiro (May 19 2021 at 00:36):

lots of mathlib types define their own + or whatever, see the surreals for example

Mac (May 19 2021 at 00:36):

but if I call it And I get clashes unless I wish to write out MyStuff.And which I find aesthetically (and typing-wise) burdensome.

Mario Carneiro (May 19 2021 at 00:36):

Then don't call it And

Mac (May 19 2021 at 00:36):

Bur it is And!

Mac (May 19 2021 at 00:37):

I could also make the same argument for mathlib, i.e., don't call it And call it MAnd.

Mario Carneiro (May 19 2021 at 00:37):

No, And already exists and means the one that comes with lean. Even if you could solve the naming clash issue, it's confusing for readers to not make them distinct

Mario Carneiro (May 19 2021 at 00:38):

Mathlib doesn't define a competing And for exactly this reason

Mario Carneiro (May 19 2021 at 00:38):

for other types it either uses typeclasses for syntax overloading or namespaced names

Mac (May 19 2021 at 00:38):

I think the namespace argument is a happy medium: you get Mathlib.And which you and I can refer to as And when Mathlib is open and I get MyStuff.And which I can refer to as And when MyStuff is open.

Mac (May 19 2021 at 00:39):

Mario Carneiro said:

No, And already exists and means the one that comes with lean. Even if you could solve the naming clash issue, it's confusing for readers to not make them distinct

I was using And as an example, not the actual case in point.

Mario Carneiro (May 19 2021 at 00:40):

So was I

Mac (May 19 2021 at 00:40):

So how does this also apply to Semigroup?

Mario Carneiro (May 19 2021 at 00:41):

Semigroup extends Mul which supplies the notation *, so you can write x * y for x y : A where Semigroup A

Mario Carneiro (May 19 2021 at 00:41):

that's what I mean by "typeclasses for syntax overloading"

Mac (May 19 2021 at 00:41):

Let's say I want to define an algebra hierarchy like Kyle's and Mathlib currently has its own notion, I want to stop clashes. What are you proposing?

Mario Carneiro (May 19 2021 at 00:42):

suppose you are doing this as part of mathlib

Mac (May 19 2021 at 00:42):

Mario Carneiro said:

Semigroup extends Mul which supplies the notation *, so you can write x * y for x y : A where Semigroup A

This assumes I want Semigroup to extend Mul and not use a different symbol like <> (which Haskell uses).

Mario Carneiro (May 19 2021 at 00:43):

That's a whole digression

Mario Carneiro (May 19 2021 at 00:43):

Suffice it to say, you extend a notation typeclass for the operator you want to use

Mac (May 19 2021 at 00:46):

I will admit that at this point I have also confused myself.

The simple point I was trying to make is a might want to define my own version (or, in fact, multiple different versions) of a mathlib concept like Semigroup and I want to call them all Semigroup (as they all different formulations of the concept). In fact, Mathlib itself has some problems with the difference between bundled and unbundled notions of Semigroup. I think a reasonable solution is to use namespaces (as that is what they are for) and I do not see the problem with this generally. Yes, there are some situations where this is not feasible (so don't do it there), but every it is, I see no problem.

Mario Carneiro (May 19 2021 at 00:47):

Different things get different names, the same thing gets the same name

Mac (May 19 2021 at 00:47):

So who gets the name of Semigroup? Or do none of them?

Mario Carneiro (May 19 2021 at 00:47):

Competing definitions for the same concept is studiously avoided in mathlib

Mario Carneiro (May 19 2021 at 00:47):

there is only one Semigroup

Mario Carneiro (May 19 2021 at 00:48):

for bundled / unbundled, there is semigroup and is_semigroup

Mario Carneiro (May 19 2021 at 00:48):

the category is Semigroup I guess

Mario Carneiro (May 19 2021 at 00:48):

the main one is semigroup

Mac (May 19 2021 at 00:49):

So there are 3 semigroups in Mathlib, and one of those will need a new name in Lean4 since all types are now upper case.

Mario Carneiro (May 19 2021 at 00:49):

But if there was another semigroup definition that was equivalent to one of the above, it would be deleted and references would go instead to the existing concept

Mario Carneiro (May 19 2021 at 00:50):

We still need to figure out how to make the lean 4 naming convention work with stuff like categories

Mario Carneiro (May 19 2021 at 00:50):

Probably SemigroupCat

Mario Carneiro (May 19 2021 at 00:51):

I'm not sure those all exist btw, but that's what they would be called if they did

Mac (May 19 2021 at 00:52):

My point is this name clashing is even a problem within mathlib. Outside of mathlib, people, like me, may disagree with what you have choosen to be the canonical semigroup definition (or just, like me, wish to explore alternatives) and want to define their own while still using bits and pieces of mathlib (as it is Lean's defacto standard lib), I don't see why they have to lose access to the right of the name of Semigroup.

Mario Carneiro (May 19 2021 at 00:53):

Why not build from within mathlib? Then that issue goes away

Mac (May 19 2021 at 00:53):

What do you mean?

Mario Carneiro (May 19 2021 at 00:54):

You want to extend semigroups with some theorem that we don't have? Use the one that exists

Mario Carneiro (May 19 2021 at 00:54):

You want to experiment with alternative definitions for semigroup? Refactor the library

Mario Carneiro (May 19 2021 at 00:55):

You don't want to refactor the library but define your own semigroup anyway? Use a namespace (and don't PR it to mathlib because we will certainly remove the duplication during review)

Mac (May 19 2021 at 00:56):

I don't understand why consumers are obligated to use a namespace but mathlib itself isn't.

Mario Carneiro (May 19 2021 at 00:56):

If you can define something compatible with mathlib's definitions you don't need a namespace

Mac (May 19 2021 at 00:57):

Why is mathlib the end-all-and be all of how things should be canonical defined?

Mario Carneiro (May 19 2021 at 00:57):

Only recalcitrant mathlib users need the namespace

Mac (May 19 2021 at 00:57):

Lol

Mario Carneiro (May 19 2021 at 00:57):

Mac said:

Why is mathlib the end-all-and be all of how things should be canonical defined?

That's the point

Mario Carneiro (May 19 2021 at 00:57):

without canonical definitions you don't get interop

Mario Carneiro (May 19 2021 at 00:58):

and all the success stories mentioned earlier are a direct consequence of this interop

Mac (May 19 2021 at 00:59):

Interop/canonicity within mathlib is fine, why does mathlib also require end-users to also concur with those canonical definitions.

Mario Carneiro (May 19 2021 at 00:59):

End users are expected to also be working on something mathlib compatible

Mac (May 19 2021 at 00:59):

Why?

Mario Carneiro (May 19 2021 at 01:00):

because otherwise they wouldn't be depending on mathlib?

Mario Carneiro (May 19 2021 at 01:00):

This is mathlib's value proposition

Mario Carneiro (May 19 2021 at 01:03):

For example all the mathematical projects like LTE, sphere-eversion, perfectoid-spaces, flypitch all build on mathlib, they don't try to redefine it (although they almost always have a "mathlib addendum" section containing work that probably should be PR'd to mathlib)

Mac (May 19 2021 at 01:03):

For one, since mathlib is the de facto standard lib, I may want mathlib for parts of the stdlib and not all want to depend on the math part at all, but am forced to anyway due to it currently being one monolith package, Second, I may be using mathlib along with many other alternative "mathlib"s and/or my own custom "mathlib" that all feel the same way, so It would be nice if they worked together without clashing.

Mac (May 19 2021 at 01:05):

However, I feel like this discussion is going nowhere, it is clear to me that you (and its seems like a decent bit of the mathlib community) and I have difference of opinion in how libraries show be presented to end-users that is not going to be resolved any time soon. I think I have my opinion and the reasons for it clear and so have you, so I don't think there is really much else to discuss on this matter.

Mario Carneiro (May 19 2021 at 01:07):

For one, since mathlib is the de facto standard lib, I may want mathlib for the stdlib and not all want to depend on the rest of it, but am forced to anyway due to it currently being one monolith package,

You can depend on only part of it as mentioned earlier. If the part you want is already too opinionated, well tough. (Or bring it up to the maintainers and try to make mathlib not clash with your usage.)

Second, I may be using mathlib along with many other alternative "mathlib"s that all feel the same way

Let's try to keep these to a minimum, because as you point out there is a significant issue with interop when they exist. We were successfully able to redirect all such effort in lean 3 into mathlib itself and I'm glad. Hopefully we can keep it up in mathlib4.

Mario Carneiro (May 19 2021 at 01:08):

Discussions at this level of abstraction are unlikely to lead to any meaningful changes. Try to come up with a more specific issue where just using mathlib without fighting the design decisions causes problems for you.

Mario Carneiro (May 19 2021 at 01:11):

We can chalk it up to a difference of opinion, but I see a user who could potentially be a mathlib contributor (and probably would be if you were working in lean 3) whose needs are not being met by mathlib (or at least seems to think so), and I would like to suss out exactly what mathlib needs to do to satisfy this use case, because I don't think your general target area is out of scope

Mario Carneiro (May 19 2021 at 01:12):

Talking about our respective opinions is unlikely to go anywhere, but talking about concrete pain points might

Mac (May 19 2021 at 02:06):

Mario Carneiro said:

We can chalk it up to a difference of opinion, but I see a user who could potentially be a mathlib contributor (and probably would be if you were working in lean 3) whose needs are not being met by mathlib (or at least seems to think so), and I would like to suss out exactly what mathlib needs to do to satisfy this use case, because I don't think your general target area is out of scope

Mario Carneiro said:

Talking about our respective opinions is unlikely to go anywhere, but talking about concrete pain points might

I appreciate the sentiment. When/if I come up with specific cases where I really want to use part of mathlib and am encountering problems, I will mention them in the channel. Regardless, thank you all for taking the time to discuss this with me. :smile:

Daniel Fabian (May 19 2021 at 09:16):

Would it be reasonable from mathlib's perspective to have some data structures and algorithm libraries separated out?

Think: List, Array, RBMap, topological sort, etc.

Especially if those are not verified, you probably can make them work well for programming and proving independently and you could prove lemmas about those in the larger mathlib, keeping the raw data structures and algorithms separate.

Mario Carneiro (May 19 2021 at 09:28):

That's already the way things are set up

Mario Carneiro (May 19 2021 at 09:28):

for the most part the raw data structures are separate from the proofs

Mario Carneiro (May 19 2021 at 09:29):

and they are in different files so that you can depend on one without the other

Mario Carneiro (May 19 2021 at 09:29):

In particular, this is necessary for data.list.defs vs data.list.basic because the proofs in data.list.basic use tactics that use definitions in data.list.defs

Mario Carneiro (May 19 2021 at 09:32):

However, that's probably not what you mean by "separated out". I would like lean 4 packaging to improve to the point that this setup, where the files are intermingled on disk and in module naming, is not penalized compared to having those files in a separate project

Daniel Fabian (May 19 2021 at 10:06):

You did get the point, yes. I was asking if we can turn them into separate packages. Or is your question deeper, asking if you can create a package ad-hoc by just what you import?

Daniel Fabian (May 19 2021 at 10:24):

@Mario Carneiro also, are you aware of another programming language having a module / package system in line with what you're envisioning?

Mario Carneiro (May 19 2021 at 18:54):

C doesn't really have a module system, but if you use make in the usual way to produce multiple executables I think this comes for free

Mario Carneiro (May 19 2021 at 18:57):

Or is your question deeper, asking if you can create a package ad-hoc by just what you import?

Yes, this is basically what I'm getting at. Lean files are separately compiled anyway so the package boundary is mostly a leanpkg concept.

Mac (May 19 2021 at 19:05):

Mario Carneiro said:

C doesn't really have a module system, but if you use make in the usual way to produce multiple executables I think this comes for free

You do realize that C's packaging system is generally not considered something to emulate? The whole point of the invention of modern module systems was to get away from C's system (and, in fact, there are many efforts within the standards committee to add/impose a modern module system on C/C++). Could it be that you are not really a big fan of the modern design? I don't mean this as a criticism, everyone can like what the like, but I am beginning to suspect this may be the case.

Mario Carneiro (May 19 2021 at 19:25):

I am not trying to imply that. But I feel that the question is coming from the wrong direction. We should try to solve the problems that arise, and borrow solutions from elsewhere where they fit, not start from some other language's model and try to map them onto our circumstances

Mario Carneiro (May 19 2021 at 19:27):

I'm a big fan of the rust model for packaging and compilation, but lean already significantly diverges from it by using per-file compilation, and furthermore the rust model is obviously not a good fit for mathlib.

Mario Carneiro (May 19 2021 at 19:28):

Haskell is also separately compiled, maybe there is something similar there

Mario Carneiro (May 19 2021 at 19:31):

I wasn't really considering C++ modules but they don't really differ from the C situation in the dimension I've been talking about. There are no file placement or naming restrictions regarding the use of C++ modules

Mario Carneiro (May 19 2021 at 19:32):

I don't dislike C++ modules at all, although I haven't seen them in production so it remains to be seen how effective they are. But C++ header files have been stupidly inefficient for years and I'm glad they finally found a workable solution

Mac (May 19 2021 at 19:37):

Mario Carneiro said:

We should try to solve the problems that arise, and borrow solutions from elsewhere where they fit, not start from some other language's model and try to map them onto our circumstances

However, it is clean that part of the future roadmap of Lean 4 is to completely overhaul its module and packaging system, as can be seen from the Projects.

Mario Carneiro said:

I'm a big fan of the rust model for packaging and compilation, but lean already significantly diverges from it by using per-file compilation, and furthermore the rust model is obviously not a good fit for mathlib.

Currently, the plan seems to be to follow in Rust footsteps and mirror Cargo based both on the leanpkg++ project and this issue: #397. So I think you may want to open a dialogue with the Lean developers about this.

Mac (May 19 2021 at 19:38):

I actually intend to post my own slight alternative to this plan in the #lean4 forum soon.

Mario Carneiro (May 19 2021 at 19:38):

By the way, it's possible to still mostly follow the footsteps of cargo here with mathlib having to set some fancy options to get what it needs. It's obviously not a normal project in several ways

Mario Carneiro (May 19 2021 at 19:41):

If we consider the rust analogy, rustc has a lot of options that are used specifically for the std crate: bootstrap stuff, stability markers, unstable macros, feature gates starting rustc_

Mario Carneiro (May 19 2021 at 19:41):

I don't think they even can use cargo and use a custom x.py wrapper around cargo

Mario Carneiro (May 19 2021 at 19:42):

I can imagine that mathlib will be in a similar situation, requiring some additional features from the package manager that aren't normally exercised by "regular" packages, and also having to wrap the package manager to do other mathlib-specific bookkeeping

Daniel Fabian (May 19 2021 at 19:45):

The module system and package system are very much up for debate. It is clear we want a better story than we have today but it's not quite clear what the target is yet. There are also multiple implementation strategies. What would certainly be helpful would be to gather some of the requirements you folks see

Mario Carneiro (May 19 2021 at 19:52):

The main requirement that I see at the moment for mathlib would be the leanpkg-side support for https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/treating.20mathlib.20as.20a.20stdlib/near/239343061 . Since the dependencies here are implicit, the manifest itself only has to declare the location of mathlib4 as normal, but perhaps should include an extra option to download/build only files that are imported, unless that becomes the default option (building on demand by default is probably more reasonable than download on demand for a normal package)

Mario Carneiro (May 19 2021 at 19:56):

Also, cargo works with crates.io (and haskell stack works with stackage). Assuming we're going to continue using github as our "package registry", mathlib becomes special in another way, namely it's the only package to deliver pre-built object files in addition to sources. Right now we have leanproject for this, but maybe the necessary features can be bundled into leanpkg

Daniel Fabian (May 19 2021 at 20:06):

I'm quite opposed to another package registry because maintaining a service is a non-trivial amount of effort. So ideally we can somehow leverage github, cachix (nix), crates.io or nuget.

Huỳnh Trần Khanh (Jun 10 2021 at 17:24):

Hmm... I understand the amount of work involved in maintaining a separate service. But how can you leverage existing services for package management? I'm genuinely curious.

Huỳnh Trần Khanh (Jun 10 2021 at 17:24):

GitHub and Cachix sound plausible because they are language agnostic but I'm not sure about Cargo.

Huỳnh Trần Khanh (Jun 10 2021 at 17:24):

Ah I mean crates.io.

Daniel Fabian (Jun 10 2021 at 17:25):

the crates.io idea was from when we thought the lean build system would be a fork of cargo.

Daniel Fabian (Jun 10 2021 at 17:26):

And how you leverage them... well, at the fundamental level, they are storage containers for packages. And those packages are usually uploaded as blobs.

Huỳnh Trần Khanh (Jun 10 2021 at 17:27):

I read that discussion already. But I couldn't really imagine how such a system would work without annoying the people involved with the Rust programming language. Like... are they fine with Lean using the crates.io registry for something completely unrelated to Rust? npm already bans packages that are not actually JavaScript packages.

Daniel Fabian (Jun 10 2021 at 17:27):

I should also qualify, that maintaining just a dumb storage layer would be quite easy using, say an azure storage account. It's the whole pretty UI, package search, etc. that's a lot of work.

Huỳnh Trần Khanh (Jun 10 2021 at 17:28):

(retracted—invalidated by next message)

Daniel Fabian (Jun 10 2021 at 17:29):

well, the crates idea was really more of a story for when you'd align quite closely with rust interop. Nuget on the other hand is language agnostic

Daniel Fabian (Jun 10 2021 at 17:29):

(albeit .net)

Daniel Fabian (Jun 10 2021 at 17:29):

I am not aware that we had in-depth conversations beyond just brainstorming options.

Joe Hendrix (Jun 10 2021 at 21:03):

I've been working on various experiments on Lean/Rust integration here in my free time (which isn't a lot at the moment).

I have a fork of cargo that is working to support "external build systems". The idea is that Cargo will delegate to an external agent how to compute various build steps such as building, fingerprinting, producing documentation, etc. It's going fairly smooth on a technical level. I think there's a lot of work that can be done to integrate the two languages if others are interested.


Last updated: Dec 20 2023 at 11:08 UTC