Zulip Chat Archive

Stream: lean4

Topic: Partitioning mathlib


Mario Carneiro (May 30 2021 at 10:44):

@Scott Morrison What are the smallest strongly connected components? (i.e. pick a point, close under all forward edges)

Mario Carneiro (May 30 2021 at 10:46):

Regarding data, I think you should just treat the top level subfiles/folders in data as their own nodes in this graph. It's kind of a "misc" directory

Scott Morrison (May 30 2021 at 11:01):

In[76]:= ConnectedComponents[t]

Out[76]= {{"algebraic_topology"}, {"geometry"}, \
{"representation_theory"}, {"probability_theory"}, {"computability"}, \
{"algebra", "algebraic_geometry", "analysis", "category_theory",
  "combinatorics", "control", "data", "deprecated", "dynamics",
  "field_theory", "group_theory", "linear_algebra", "logic",
  "number_theory", "order", "ring_theory", "system", "tactic",
  "topology", "measure_theory", "testing", "meta", "set_theory"}}

Scott Morrison (May 30 2021 at 11:02):

:-(

Scott Morrison (May 30 2021 at 11:05):

Doesn't appreciably improve if we explode data:

{{"algebraic_topology"}, {"computability"}, {"data$ordmap"}, \
{"geometry"}, {"probability_theory"}, {"data$analysis"}, \
{"data$qpf"}, {"data$pfunctor"}, {"data$fp"}, \
{"representation_theory"}, {"algebra", "algebraic_geometry",
  "analysis", "category_theory", "combinatorics", "control", "data",
  "data$complex", "data$equiv", "data$finset", "data$finsupp",
  "data$fintype", "data$int", "data$list", "data$matrix",
  "data$multiset", "data$mv_polynomial", "data$nat", "data$padics",
  "data$pnat", "data$polynomial", "data$rat", "data$real", "data$set",
   "data$zmod", "data$zsqrtd", "deprecated", "dynamics",
  "field_theory", "group_theory", "linear_algebra", "logic",
  "number_theory", "order", "ring_theory", "system", "tactic",
  "topology", "measure_theory", "data$array", "data$buffer",
  "data$dlist", "data$lazy_list", "testing", "data$bitvec",
  "data$string", "set_theory", "data$num", "data$seq", "data$stream",
  "meta", "data$option", "data$setoid", "data$sigma"}}

Johan Commelin (May 30 2021 at 12:30):

Scott Morrison said:

In[76]:= ConnectedComponents[t]

Out[76]= {{"algebraic_topology"}, {"geometry"}, \
{"representation_theory"}, {"probability_theory"}, {"computability"}, \
{"algebra", "algebraic_geometry", "analysis", "category_theory",
  "combinatorics", "control", "data", "deprecated", "dynamics",
  "field_theory", "group_theory", "linear_algebra", "logic",
  "number_theory", "order", "ring_theory", "system", "tactic",
  "topology", "measure_theory", "testing", "meta", "set_theory"}}

Note that there is nothing special about algebraic_topology, geometry, or representation_theory. The fact that these are their own SCC just means that we haven't gone very far in these directions yet.
There are quite-easy-to-state conjecturestheorems in commutative algebra that were only proven a couple of years ago using perfectoid spaces. The Breen-Deligne resolutions in homological algebra that we carefully avoided in LTE are proven to exist using some hammers from algebraic topology. Representation theory uses a lot of Lie theory and algebraic geometry. At the same time, motivic conjectures in number theory/algebraic geometry use a lot of representation theory.
This list can go on and on.

Mario Carneiro (May 30 2021 at 14:29):

Something looks odd about that result. I know that computability references data$nat for example. Perhaps we have the arrows the wrong way around?

Mario Carneiro (May 30 2021 at 14:31):

(I'm interested in a node in the graph and it's dependencies, since that represents a subcomponent that could possibly be extracted to a library. The lonely points in your analysis are just "endpoint theorems" that have no current applications outside their own field, like computability and representation theory.)

Scott Morrison (May 31 2021 at 01:02):

What I did was take the equivalence relation a ~ b iff there is a path from a to b, and a path from b to a. (I think...)

Scott Morrison (May 31 2021 at 01:05):

This is the strongly connected components. (And is invariant under reversing all arrows.)

What you suggested above (just take the closure under arrows in one direction) doesn't give a partition, but I agree is still useful. I'll compute it in a moment.

Scott Morrison (May 31 2021 at 07:50):

The smallest set of top-level (+data) directories closed under adding imports is sadly:

{"algebra", "analysis", "category_theory", "control", "data", \
"data.equiv", "data.finset", "data.finsupp", "data.fintype", \
"data.int", "data.list", "data.matrix", "data.multiset", \
"data.mv_polynomial", "data.nat", "data.option", "data.polynomial", \
"data.rat", "data.seq", "data.set", "data.zmod", "deprecated", \
"group_theory", "linear_algebra", "logic", "number_theory", "order", \
"ring_theory", "tactic", "topology", "combinatorics", "data.complex", \
"data.real", "field_theory", "measure_theory", "data.sigma", \
"set_theory", "data.stream", "data.array", "data.num", "data.pnat", \
"data.string", "data.setoid", "dynamics", "data.zsqrtd", \
"algebraic_geometry", "data.padics", "data.buffer", "data.dlist", \
"meta", "testing", "data.bitvec", "data.lazy_list", "system"}

Johan Commelin (May 31 2021 at 07:52):

Wait, I don't think I understand what you mean with "closed under adding imports".

Scott Morrison (May 31 2021 at 07:52):

Say directory A imports directory B if _some_ file in A imports _some_ file in B.

Scott Morrison (May 31 2021 at 07:53):

Say directory A transitively imports directory B is there is a chain of such...

Johan Commelin (May 31 2021 at 07:53):

Ooh, wow. What is the complement of that smallest set?

Scott Morrison (May 31 2021 at 07:53):

Claim, this is the smallest set of directories closed under "imports".

Scott Morrison (May 31 2021 at 07:54):

{"algebraic_topology", "computability", "data.analysis", "data.fp", \
"data.ordmap", "data.pfunctor", "data.qpf", "geometry", \
"probability_theory", "representation_theory"}

Scott Morrison (May 31 2021 at 07:54):

So just the "brand new" stuff plus some abandonware.

Mario Carneiro (May 31 2021 at 07:54):

If you take any point v in the graph and add all points it points to and all points they point to and so on, you get a set. You can do this for every point in the graph and many of them will be the same, and the ones that are minimal by subset order will also be SCCs (every point in the set will generate the set by this procedure)

Scott Morrison (May 31 2021 at 07:55):

The problem here is that blurring things out to directory level makes mathlib very much not a DAG.

Mario Carneiro (May 31 2021 at 07:56):

I think this is all making the point that every directory uses every other directory very well though

Mario Carneiro (May 31 2021 at 07:58):

for example with the graph A -> B <-> C you get the sets {A, B, C} (not a SCC) and {B, C} (a SCC)

Eric Wieser (May 31 2021 at 08:37):

I don't think that effect of directory blurring is surprising; if you have a lemma connecting two directories A and B, its file usually ends up in one of A or B without much thought as to which.

Mac (May 31 2021 at 15:21):

Any particular reason why dependencies are being analyzed at the directory level rather than the file level? I would imagine that each area of theory would be interconnected with others. However, I also imagine there exists some "basic" foundation that each area defines without reference to others that could be extracted out into some "prelude" that could be imported by each area that leverages it. The interconnected components would then be add latter in the main file.

For example, you could have prelude/algebra which defines all the type classes relevant to algebra and that may be imported by other things like ring_theory and then algebra proper that imports prelude/algebra along with stuff in others areas to fill in the interconnections.

Johan Commelin (May 31 2021 at 15:35):

I guess that means we'll have north of 50 pkgs. Which is probably fine. As long as they don't have to be maintained manually.

Mario Carneiro (May 31 2021 at 16:21):

Yeah I have no problem with having 50 packages if it can all be handled transparently and doesn't affect the naming of things. The problem is that naming things prelude27/algebra makes it absolutely atrocious to find anything. I really want a solution that allows the packaging to work on a level orthogonal to file layout and module naming (which is topical, not based on dependency order)

Brandon Brown (May 31 2021 at 18:17):

Is just splitting mathlib into two things still on the table, a "complib" and a "mathlib" where complib (or whatever it would be called) just contains tactics and other basic things that mathlib or any computation-focused people alike would need?

Mario Carneiro (May 31 2021 at 18:20):

quite frankly I would like to call that other lib "std"

Mario Carneiro (May 31 2021 at 18:23):

I do think there is a place for a more separate piece of mathlib which is more focused on comp sci, tactics, and the basic mathematics needed to support it, while still being batteries-included like mathlib

Mario Carneiro (May 31 2021 at 18:25):

That is at least topically separate enough that people can probably get their heads around it (although there is conceivably a good deal of overlap when it comes to the kind of stuff that ends up in data)

Mac (May 31 2021 at 18:25):

Mario Carneiro said:

Yeah I have no problem with having 50 packages if it can all be handled transparently and doesn't affect the naming of things. The problem is that naming things prelude27/algebra makes it absolutely atrocious to find anything.

I am not sure this would be a problem. For example if I have I want all of algebra I importAlgebra, if I just want algebra's prelude I import Algebra.Prelude. The prelude, if necessary, could be split into separate modules as well (ex. Algebra.PreludeAand Algebra.PreludeB -- the real modules would have more informative names -- like Classes or Basic or whatever). If said splits are sequential (i.e. PreludeB imports PreludeA) then only the relevant level would need to be imported by users. If said splits are independent than the user can imported whichever of PreludeA and/or PreludeB they need.

Mario Carneiro (May 31 2021 at 18:26):

if said splits are independent than the user can imported whichever of PreludeA or PreludeB they need.

Right, this is the problem - you won't be able to tell which one you want most of the time because the names are likely to be very difficult to make informative

Mario Carneiro (May 31 2021 at 18:29):

We already have this kind of thing, at the file level. Mathlib almost always uses precise imports to combat dependency creep. I just want to leverage that instead of adding another thing for users to deal with

Mac (May 31 2021 at 18:29):

Mario Carneiro said:

you won't be able to tell which one you want most of the time because the names are likely to be very difficult to make informative

Why? What makes it so hard to give informative names? For example, If have a module with foo, bar, and baz. I could, in the worst case, call it FooBarBaz.

Mac (May 31 2021 at 18:30):

Mario Carneiro said:

We already have this kind of thing, at the file level. Mathlib almost always uses precise imports to combat dependency creep. I just want to leverage that instead of adding another thing for users to deal with

I thought we were talking about the file level?

Mario Carneiro (May 31 2021 at 18:30):

What if it's foo and then bar and then more foo and then foo and bar together?

Mario Carneiro (May 31 2021 at 18:30):

sometimes it gets hard to name things beyond Foo1, Bar, Foo2, FooBar

Johan Commelin (May 31 2021 at 18:31):

Or what if there are 10 files that progressively treat more and more theory of foo?

Mario Carneiro (May 31 2021 at 18:31):

and then if you have a foo fact you want you have to go looking through several files

Mac (May 31 2021 at 18:31):

Mario Carneiro said:

What if it's foo and then bar and then more foo and then foo and bar together?

If there is no greater logical grouping than I really should have six modules: Foo.Basic, Bar.Basic, Foo.Extra, FooBar, Foo (Foo.Basic + Foo.Extra + FooBar), and Bar (Bar.Basic + FooBar).

Mario Carneiro (May 31 2021 at 18:32):

mathlib has this issue a lot, because that's how math works: the concepts are few but the layers of theorems keep coming. Reminder that FLT is a theorem about the natural numbers

Mario Carneiro (May 31 2021 at 18:34):

So Foo is the complete theory of foo? Does that mean that it conceptually could import literally anything in mathlib?

Mario Carneiro (May 31 2021 at 18:34):

because that means it will be unusable (in mathlib)

Mac (May 31 2021 at 18:35):

That is what I would expect, yes

Mac (May 31 2021 at 18:35):

(as an end user)

Mario Carneiro (May 31 2021 at 18:35):

Unfortunately a lot of mathlib files fall in roughly this category which is why the dependency graph is such a rats nest

Mac (May 31 2021 at 18:39):

In what category do you mean?

Mario Carneiro (May 31 2021 at 18:39):

From an end user perspective (someone importing the library), there aren't any dependency issues, you can just use whatever you like. I'm talking about the within-mathlib perspective. (When I said "users" earlier I meant contributors)

Mario Carneiro (May 31 2021 at 18:40):

Almost every new file conceptually builds on everything that came before. If you are a mathematician expanding the theory in some area you don't want to be bothered with this dependency tetris, you just import what you need

Mac (May 31 2021 at 18:41):

Mario Carneiro said:

Almost every new file conceptually builds on everything that came before. If you are a mathematician expanding the theory in some area you don't want to be bothered with this dependency tetris, you just import what you need

That's fine for prototyping, but upon actual addition to the library, optimizing imports should be done.

Mario Carneiro (May 31 2021 at 18:41):

As a maintainer, this approach doesn't bother me so much, because dependency pruning is an activity that can be dealt with separately and requires very little mathematical skill

Mac (May 31 2021 at 18:42):

Okay so we seem to be on the same page there.

Mac (May 31 2021 at 18:43):

In which case a contributor could write their new file using the complete endpoints (ex. Foo and Bar) and the later dependency pruning step could optimize the dependencies and add that file to its relevant theories (including, possibly Foo and/or Bar).

Mario Carneiro (May 31 2021 at 18:44):

However, what this means is that the actual intended position of most files in the dependency graph is not clear. Is this a high-end theorem? Something foundational? Many files mix a bit of both, and high-end theorems often try to race each other to the top

Mac (May 31 2021 at 18:45):

Maybe some tooling could be written to take a Lean file and inject its definitions into the proper place in the dependency graph based on used name?

Mario Carneiro (May 31 2021 at 18:46):

Given that, a naming/foldering convention that makes dependency height important is a distraction for most contributors. Dependency organization is important but should be centralized in a way that doesn't impact the contributor experience

Mario Carneiro (May 31 2021 at 18:46):

for example, a build file that explicitly orders everything

Mario Carneiro (May 31 2021 at 18:47):

although full dependency inference is also possible

Mario Carneiro (May 31 2021 at 18:47):

the advantage being that you never have to worry about dependency order unless you hit a cycle

Mac (May 31 2021 at 18:48):

Yeah that is what I was thinking. The idea would be contributors would all stick their contributions in a newstuff folder (or something like that) that would use the easy-to-access endpoints (ex. Algebra or Foo/Bar) and a latter dependency pruning / build step would reorder things to be reasonable (by refactor additions into Foo.Basic / Bar.Basic or whatever).

Mario Carneiro (May 31 2021 at 18:49):

We want people to organize new material though. The organization is based on topic, not dependency order, though

Mac (May 31 2021 at 18:50):

okay, so the contributions in the postulated newstuff folder (or whatever) could be organized into folders based on topic?

Mario Carneiro (May 31 2021 at 18:51):

and the other part?

Mac (May 31 2021 at 18:51):

which other part?

Mario Carneiro (May 31 2021 at 18:51):

right now all of mathlib is organized like your newstuff folder

Mario Carneiro (May 31 2021 at 18:52):

it sounds like you are trying to add an explicit transition point to a different organization scheme which is based on dependencies

Mac (May 31 2021 at 18:52):

So then the dependency pruning step involves taking those definitions and sticking them into the proper modules based on their dependency order (and topic).

Mario Carneiro (May 31 2021 at 18:53):

but then users have to know about that dependency order in order to import the file

Mac (May 31 2021 at 18:54):

Yes, that is generally how end user importing of files/libraries works? If they want to be specific about things, otherwise they also just use the easy endpoints.

Mac (May 31 2021 at 18:55):

The point in this whole effort would be for those users who specifically only want to leverage stuff at higher levels of the dependency chain.

Mac (May 31 2021 at 18:56):

It would also then help in migrating some of that stuff into a separate less mathy, more standard library.

Mario Carneiro (May 31 2021 at 18:57):

Why can't the naming just not be related to dependency order?

Mario Carneiro (May 31 2021 at 18:57):

It is strange to me that this needs to be so intertwined

Mario Carneiro (May 31 2021 at 18:57):

Just let people name things as they like

Daniel Fabian (May 31 2021 at 19:00):

When you start going to the very fine grained approach, your packages again won't make much sense. You need the packages to talk about a topic, at least roughly

Mac (May 31 2021 at 19:01):

Yes, but I think we are talking about files within packages though.

Mario Carneiro (May 31 2021 at 19:02):

If end users have the ability to make a la carte sub-packages from mathlib according to whatever they feel like importing, then that pretty much solves the problem without having to do solve the hardest problem in computer science (naming things)

Patrick Massot (May 31 2021 at 19:02):

Why don't we simply have a MacLib organized and developped by Mac the way Mac thinks is good and keep mathlib the way it works so great for everyone else?

Mac (May 31 2021 at 19:03):

Patrick Massot said:

Why don't we simply have a MacLib organized and developped by Mac the way Mac thinks is good and keep mathlib the way it works so great for everyone else?

Fyi, I wasn't the one who proposed partitioning mathlib here.

Mac (May 31 2021 at 19:03):

I was simply providing some ideas as to how that could occur.

Daniel Fabian (May 31 2021 at 19:05):

I asked about topical packages early in the thread and I got legitimate issues. The idea of splitting up must make sense in order to happen.

Daniel Fabian (May 31 2021 at 19:06):

The idea of doing ad hoc packages sounds like dead code elimination which is something we could do in the compiler domain

Mario Carneiro (May 31 2021 at 19:07):

this would be more like leanpkg domain, but that's the general idea

Daniel Fabian (May 31 2021 at 19:09):

Not really... Why bother with files. Might just as well do it on the definition level

Daniel Fabian (May 31 2021 at 19:09):

It's like dead code elimination and you choosing to re export what you want to reuse

Mario Carneiro (May 31 2021 at 19:09):

that would be even better, but it's almost science fiction level I think

Daniel Fabian (May 31 2021 at 19:10):

It's what Javascript does more of less

Mac (May 31 2021 at 19:10):

The domain of leanpkg is download/building/running packages, it has more-or-less nothing to do with how those packages are used in code.

Mario Carneiro (May 31 2021 at 19:10):

dependency management is definitely in the purview of cargo

Daniel Fabian (May 31 2021 at 19:12):

To get that fine grained dependency management that you like about mathlib, the most logical approach is a source dependency with dead code elimination

Mac (May 31 2021 at 19:13):

Daniel Fabian said:

It's like dead code elimination and you choosing to re export what you want to reuse

For the latter, this should actually be pretty possible once Lean gets it module system. The former would require retooling the compiler to keep track of which imported symbols are used and prune the unused ones from the result olean and providing that information in a way that the rebuilder (which is part of leanpkg) can figure out whether or not to rebuild the file when its dependencies change.

Daniel Fabian (May 31 2021 at 19:13):

That could totally be a way of depending on mathlib, effectively as a submodule with pruning

Mario Carneiro (May 31 2021 at 19:13):

How do you know if an import is dead though? Simon looked at this problem a while back and unfortunately the most reliable method is just to take the import line out and see if lean complains

Mario Carneiro (May 31 2021 at 19:14):

because lean imports can have so many kinds of side effects on the environment and parsing

Mac (May 31 2021 at 19:14):

Yeah, that is the major problem with automatic approach.

Daniel Fabian (May 31 2021 at 19:15):

Interesting. I think we will learn more about the problem space when we build the module system

Mario Carneiro (May 31 2021 at 19:15):

You can still do it, but it is computationally expensive enough that it's worth just taking the import lines at face value and relying on external processes to optimize them

Daniel Fabian (May 31 2021 at 19:15):

I could for instance see having an isolated environment per module or similar

Mac (May 31 2021 at 19:16):

@Daniel Fabian In fact, this also sounds similar to the discussion around changing Notation.lean and having to rebuild all of Lean.

Daniel Fabian (May 31 2021 at 19:16):

It is certainly worth thinking about

Mac (May 31 2021 at 19:17):

Mario Carneiro said:

You can still do it, but it is computationally expensive enough that it's worth just taking the import lines at face value and relying on external processes to optimize them

Alternatively, the ad-hoc package approach (within the module system) where someone just imports the entirety of Mathlib and then only manual exports select symbols would also work (and remove the need for optimizing imports).

Mac (May 31 2021 at 19:18):

ala JavaScript and Haskell

Brandon Brown (May 31 2021 at 19:19):

If mathlib becomes more narrowly math-focused, then it seems that it makes sense to keep it a monolith since a monolith is simple and many of the math-focused users are not well-versed with complexities of all these dependency/build issues and just want to be doing math, and it's easy to contribute to a monolith. Packages that are meant for writing practical programs can do the typical programming language package dichotomization.

Daniel Fabian (May 31 2021 at 19:20):

Another option might be ex post dead code elimination. After elaboration it might be much easier

Mario Carneiro (May 31 2021 at 19:20):

It is definitely easier post elaboration, but it's also "wrong", you miss out on imports that were required for the lean file to build

Mario Carneiro (May 31 2021 at 19:21):

If you only care about the output files then that's quite doable. olean files have an easily accessible explicit dependency relation

Eric Wieser (May 31 2021 at 19:23):

If lean4 compiles to C, can't the gcc linker do the dead code elimination? If we're not eliminating dead code to reduce binary sizes, what are we eliminating it for?

Mario Carneiro (May 31 2021 at 19:26):

Regarding "Packages that are meant for writing practical programs can do the typical programming language package dichotomization.", keep in mind that a big part of the draw (for me, at least) is the ability to write programs and then prove things about them. This is an anti-modular pattern though, because you need to know what a program is doing to prove it is correct. So splitting the mathy stuff from the programming stuff doesn't really work, or rather it works as long as you accept that the packages are separate but actually tightly coupled

Mario Carneiro (May 31 2021 at 19:27):

This is part of why I have argued against the pervasive use of private in lean core, because it hinders reusability and introspection from external packages like mathlib

Mac (May 31 2021 at 19:34):

Eric Wieser said:

If lean4 compiles to C, can't the gcc linker do the dead code elimination? If we're not eliminating dead code to reduce binary sizes, what are we eliminating it for?

The idea is to create ad-hoc packages that only have some subset of the functionality of mathlib. So let's say I want to create the package OpenPrivate that only has the open private command from mathlib. I could import all of mathlib export open private and its related syntax and then the dead code elimination would remove the rest of the code in mathlib so that the resulting binary library for OpenPrivate only has the code it needs to support open private.

Mario Carneiro (May 31 2021 at 19:36):

If you are building a binary, the linker should already do that

Gabriel Ebner (May 31 2021 at 19:37):

open private seems like one of those this-should-actually-be-in-core modules that could be easily split from mathlib.

Mac (May 31 2021 at 19:37):

Mario Carneiro said:

Regarding "Packages that are meant for writing practical programs can do the typical programming language package dichotomization.", keep in mind that a big part of the draw (for me, at least) is the ability to write programs and then prove things about them.

However the idea is that Lean is now going to be a general purpose language. That is, many people will write programs and libraries without necessarily proving anything about them. Then, if someone else comes along and wants to verify the contents of said libraries they can create a separate (but related) library that proves things about it. Thus, the dichotomization can still appear.

Mac (May 31 2021 at 19:38):

Mario Carneiro said:

If you are building a binary, the linker should already do that

I'm pretty sure it doesn't. In fact, I believe that resulting library will be the entirety of math lib plus whatever code you write (because transitive imports).

Mario Carneiro (May 31 2021 at 19:38):

That is, many people will write programs and libraries without necessarily proving anything about them. Then, if someone else comes along and wants to verify the contents of said libraries they can create a separate (but related) library that proves things about it.

Yes, that's exactly the situation we currently find ourselves in. The unverified package is lean core, and the someone else is mathlib

Mac (May 31 2021 at 19:39):

Yes? So? I'm confused as to what you are getting at.

Mario Carneiro (May 31 2021 at 19:39):

But the dependent proofy package needs to be able to couple to the unverified package and encapsulation mechanisms from traditional programming languages are actively hostile to this effort

Mac (May 31 2021 at 19:40):

I agree completely with your desire to remove private as concept.

Mac (May 31 2021 at 19:40):

I think, if desired, being able to get at any definition should be possible.

Mario Carneiro (May 31 2021 at 19:41):

I am vaguely uncomfortable with discussions about a module system because I expect it to be another, possibly stronger, iteration of the same idea

Daniel Fabian (May 31 2021 at 19:41):

why not do an in-between thing? private could hide intellsense, making the UX nice and clean, but you might eg do something like obj.#privateMethod to access privates?

Daniel Fabian (May 31 2021 at 19:42):

There's value to private stuff due to encapsulation, but in a pure language you don't need to worry about corrupting state.

Daniel Fabian (May 31 2021 at 19:42):

so maybe making it a UX thing is good enough?

Mario Carneiro (May 31 2021 at 19:42):

That kind of privacy is totally fine by me

Mario Carneiro (May 31 2021 at 19:42):

I think python also does something like that

Mario Carneiro (May 31 2021 at 19:42):

privacy by ugly naming convention

Mac (May 31 2021 at 19:43):

Sounds good to me as well.

Daniel Fabian (May 31 2021 at 19:43):

but in python you can shoot yourself into the foot.

Daniel Fabian (May 31 2021 at 19:43):

In a pure language, not so much...

Brandon Brown (May 31 2021 at 19:43):

In Python the "private" methods start with underscores. Not really hidden just a prefix naming convention

Mac (May 31 2021 at 19:44):

I would suggest leaving # to mean side-effecting commands though and use something else for the notation.

Mac (May 31 2021 at 19:45):

Brandon Brown said:

In Python the "private" methods start with underscores. Not really hidden just a prefix naming convention

Yeah I like the underscore / double underscore naming convention unfortunately that clashes with placeholders in Lean.

Daniel Fabian (May 31 2021 at 19:45):

sounds at least worth an RFC, we can see what others think?

Mac (May 31 2021 at 19:49):

This would also have the possible side effect of fixing #418.

Brandon Brown (May 31 2021 at 19:51):

How about obj.@private_method since the @ symbol is already used to make the implicit explicit in functions, so sort of semantically in the same ballpark

Mac (May 31 2021 at 19:53):

That's a good idea! The only qualm I have is that users may confuse it to mean explicit applying a function field.

Daniel Fabian (May 31 2021 at 19:54):

not @, please. Because private still can have implicits.

Brandon Brown (May 31 2021 at 19:55):

ah right, of course

Mario Carneiro (May 31 2021 at 19:55):

I don't think anything like this is really needed, though. We already have a convention of using fooAux or foo.impl and such for internal function names, that is generally sufficient

Mario Carneiro (May 31 2021 at 19:56):

leading underscore is also used in some places, although lean takes a harder stance here since you can't actually use it in definition names

Mac (May 31 2021 at 19:56):

But that would appear in auto-complete which we want to avoid by marking the definitions private.

Mario Carneiro (May 31 2021 at 19:57):

you could have the effect of private or an equivalent attribute be to hide it from such displays, or the tooling could pick up on the naming convention

Mac (May 31 2021 at 19:57):

The goal is to make private definitions "hidden" in some sense (ex. from auto complete) while still providing a means to access them.

Daniel Fabian (May 31 2021 at 19:58):

private serves three encapsulation purposes in a normal programming language: Hiding stuff in the namespace, preventing corruption of state, and making it impossible to depend on internal details to lower coupling.

Daniel Fabian (May 31 2021 at 19:59):

we need all 3 to still be true after we weaken the restrictions imposed by private.

Daniel Fabian (May 31 2021 at 19:59):

State corruption we get for free due to being a pure language.

Daniel Fabian (May 31 2021 at 19:59):

the other two we have to solve somehow.

Mac (May 31 2021 at 19:59):

In Lean, (2) isn't a concern (as you said), and (3) is just undesirable, because as @Mario Carneiro said proofs sometimes need to do just that.

Mac (May 31 2021 at 20:01):

so only real use is (1)

Mac (May 31 2021 at 20:01):

and since we need to violate that sometimes for the same reason as (3), even it should only hold conditionally.

Daniel Fabian (May 31 2021 at 20:02):

yeah, I see how (3) is a potential concern. We might restrict accessing privates to proofs? It feels a bit weird. But on the other hand, if someone goes through enough effort to access privates, I wouldn't care too much to break them by refactoring. (E.g. in .net you can use reflection to access privates).

Mac (May 31 2021 at 20:03):

Mario Carneiro said:

you could have the effect of private or an equivalent attribute be to hide it from such displays

The disadvantage here is that it would then always hide it (which would be annoying in proofs that rely heavily on it). The .# or whatever syntax would allow for auto-completion in such contexts.

Johan Commelin (May 31 2021 at 20:03):

local attribute [-private] foobar would fix that

Mario Carneiro (May 31 2021 at 20:04):

But on the other hand, if someone goes through enough effort to access privates, I wouldn't care too much to break them by refactoring. (E.g. in .net you can use reflection to access privates).

Exactly. open private isn't a code smell, it is explicitly opting out of stability of the upstream package. When you have dependent types, dependent packages can be a thing too

Mario Carneiro (May 31 2021 at 20:04):

it doesn't need to be easy, but it does need to be possible

Daniel Fabian (May 31 2021 at 20:05):

open private is a bit like .net reflection to access privates.

Mac (May 31 2021 at 20:05):

I think the "private" as an attribute seems reasonable. I would also probably rename it to "hidden" as that is more indicative of what is actually doing.

Mario Carneiro (May 31 2021 at 20:05):

rust has doc(hidden)

Daniel Fabian (May 31 2021 at 20:06):

the experience for writing production code needs to be natural. You can easily have a large-ish development with only very light proving, e.g. lean4 compiler itself.

Daniel Fabian (May 31 2021 at 20:06):

And that kind of thing needs to be natural, so a keyword for private stuff seems warranted.

Mac (May 31 2021 at 20:07):

seems reasonable

Daniel Fabian (May 31 2021 at 20:07):

so... long story short... open private would pretty much cover all the requirements?

Mario Carneiro (May 31 2021 at 20:07):

I think avoiding accidental use of an unstable internal implementation detail is a good reason to have private as a language concept

Mario Carneiro (May 31 2021 at 20:09):

so something like open private is probably the best UX to give it just enough syntax salt. The only thing I would change is that it shouldn't need to be open private foo in bar, which is currently needed due to the absolute hackery going on in the implementation

Mac (May 31 2021 at 20:09):

Daniel Fabian said:

so... long story short... open private would pretty much cover all the requirements?

That and export private as well (for cases where an dependent package disagrees on visibility). It is also currently defined/used in mathlib

Mario Carneiro (May 31 2021 at 20:10):

I have more misgivings about export private, because it's basically saying the upstream author was wrong about their decision

Mac (May 31 2021 at 20:11):

Which as the fallible humans we are, that can, in fact, sometimes be the case. ;)

Daniel Fabian (May 31 2021 at 20:11):

plus you can always emulate it by def foo := private.foo

Sebastian Ullrich (May 31 2021 at 20:11):

I don't mind open private as an escape hatch, but I can almost guarantee that it will not become part of the stdlib, nor will the semantics of private change. On the positive side, that means the current open private code is relatively unlikely to break in the future.

Mario Carneiro (May 31 2021 at 20:12):

I would prefer if private just kept the name as is but made it an error to directly reference

Mario Carneiro (May 31 2021 at 20:12):

rather than mangling the name and/or sticking some _hyg's in there

Mario Carneiro (May 31 2021 at 20:13):

those names are way too hard to reference, even after you have paid the "I know what I'm doing" tax

Mac (May 31 2021 at 20:13):

Daniel Fabian said:

plus you can always emulate it by def foo := private.foo

Sadly, that is semantic different from an alias like that created by export in many cases.

Sebastian Ullrich (May 31 2021 at 20:14):

Mario Carneiro said:

I would prefer if private just kept the name as is but made it an error to directly reference

That would preclude using the same name for privates in module diamonds, which would be super counterintuitive. We went through multiple designs before converging on the current one.

Mac (May 31 2021 at 20:14):

Mario Carneiro said:

those names are way too hard to reference, even after you have paid the "I know what I'm doing" tax

So hard, in fact, that it creates issue #418.

Mario Carneiro (May 31 2021 at 20:14):

Module diamonds?

Sebastian Ullrich (May 31 2021 at 20:15):

private foo in two files you import

Gabriel Ebner (May 31 2021 at 20:16):

Mario Carneiro said:

so something like open private is probably the best UX to give it just enough syntax salt. The only thing I would change is that it shouldn't need to be open private foo in bar, which is currently needed due to the absolute hackery going on in the implementation

You can also do open private something from MyModule now. Not sure if that helps you.

Mario Carneiro (May 31 2021 at 20:18):

While we're on the topic, I've seen some issues with excessive privacy around deriving instance DecidableEq and other automatic constructions

Gabriel Ebner (May 31 2021 at 20:19):

DecidableEq might not be the best example here. The type already contains everything you could prove about it.

Mario Carneiro (May 31 2021 at 20:32):

Actually it was BEq. I'll post the example in another thread


Last updated: Dec 20 2023 at 11:08 UTC