Zulip Chat Archive

Stream: general

Topic: mathlib file organization


view this post on Zulip Sean Leather (Jun 01 2018 at 08:33):

For those interested in the file organization of mathlib, I've written up a proposal for a small change that, I think, will improve things a bit. Feedback welcome!

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:39):

Interesting proposal. I have also noticed myself getting lost in the long files...

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:40):

I would also like to separate the "interface lemma's" from the "real beef", although maybe there is not always a clear line between them.

view this post on Zulip Sean Leather (Jun 01 2018 at 08:41):

I have also noticed myself getting lost in the long files...

I'm glad I'm not the only one. I think this is a first logical step to reducing the size of files. There are other steps that can be taken later, of course.

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:42):

Also, I have never read what the purpose of default.lean is. But if I inferred it correctly, then I think we should make more use of it.

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:43):

And then using more files will not increase the length of import lines.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:43):

I've been deliberately avoiding the use of default.lean within mathlib, and I think it's good practice without as well

view this post on Zulip Sean Leather (Jun 01 2018 at 08:44):

AFAIK, the file data/list/default.lean allows you to write import data.list instead of import data.list.default. So, the default.lean file generally imports everything in the directory.

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:44):

exactly

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:44):

right

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:44):

and that's exactly what I don't want

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:44):

Why not?

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:45):

that adds a bunch of spurious dependencies in an already delicate graph

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:45):

remember that circular dependencies are bad but there isn't an obvious linear order of files

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:46):

Sure, but there is a partial order, right?

view this post on Zulip Johannes Hölzl (Jun 01 2018 at 08:46):

It should be a DAG

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:46):

Do we already have a script that generates a graphviz visualisation?

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:48):

mathlib.gif

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:49):

Right, I had a similar graph on the level of theorems and proofs for my thesis.

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:50):

But why are you scared that the graph becomes to delicate for Lean?

view this post on Zulip Sean Leather (Jun 01 2018 at 08:50):

I don't quite see the problem with default.lean. Nonetheless, I see it as just an extra and there's no actual need to have one.

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:50):

Well, we need it to keep imports manageable.

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:50):

Already it is common to have files with >8 imports

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:51):

If you want to split a 3000-line file into 5 files

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:51):

Currently, folders are organized by topic, and files are organized by dependency units

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:51):

so if there is no dependency issue and the topic is still the same, it all goes in one file

view this post on Zulip Sean Leather (Jun 01 2018 at 08:52):

At least with my proposal, once you split a file into one for definitions and one for theorems, the theorems file imports the definitions file, and you just import the theorems file. The default.lean just allows you to leave off one part of the hierarchy.

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:52):

What is a "dependency issue"?

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:52):

A is used by B, which is used by A'

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:53):

and A' wants to be in the same file as A

view this post on Zulip Johan Commelin (Jun 01 2018 at 08:53):

Ok, and there is a reason (I guess the topic) that B should not be in that file as well.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:53):

I've thought many times about separating out definitions. But I'm not sure it's as easy as that

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:54):

in lean, definitions and theorems are all mixed up thanks to curry howard

view this post on Zulip Sean Leather (Jun 01 2018 at 08:54):

As I said, there may be exceptions, but I think there are many obvious easy examples.

view this post on Zulip Sean Leather (Jun 01 2018 at 08:55):

And I'm happy to give it a try myself.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:55):

If you are using list as your test bed it's not representative

view this post on Zulip Sean Leather (Jun 01 2018 at 08:55):

If other people think this is a good thing.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:55):

programming stuff in general tends to be less dependent

view this post on Zulip Sean Leather (Jun 01 2018 at 08:56):

Perhaps. But there is a lot of programming stuff in mathlib. :simple_smile:

view this post on Zulip Sean Leather (Jun 01 2018 at 08:56):

And what's the harm with doing it there?

view this post on Zulip Mario Carneiro (Jun 01 2018 at 08:58):

also, how is theorem organization handled? What to do if a definition depends on a theorem?

view this post on Zulip Sean Leather (Jun 01 2018 at 08:59):

It probably depends on the example.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:00):

What about instances? are they definitions or theorems?

view this post on Zulip Sean Leather (Jun 01 2018 at 09:00):

To keep the proposal minimal, I don't say anything about that.

view this post on Zulip Sean Leather (Jun 01 2018 at 09:01):

This is where concrete examples help.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:02):

rat is a field

view this post on Zulip Sean Leather (Jun 01 2018 at 09:02):

What I really want to know is whether people are, in general, in favor of having definitions separated from theorems. From my experience using mathlib, I believe I am. There are certainly wrinkles to be ironed out.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:02):

I am worried about losing topicality

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:03):

particularly for smaller definitions that are more auxiliary or only used in a few theorems

view this post on Zulip Sean Leather (Jun 01 2018 at 09:03):

Isn't rat a structure?

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:03):

I mean a mathematical field

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:04):

That's an instance

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:04):

in rat.lean

view this post on Zulip Sean Leather (Jun 01 2018 at 09:04):

You mean instance field_rat : discrete_field ℚ?

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:04):

yeah

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:07):

@Jeremy Avigad may want to chip in on this topic

view this post on Zulip Sean Leather (Jun 01 2018 at 09:07):

It seems to me that data/rat.lean would also benefit from extracting the defs.

$ grep '^def' data/rat.lean | wc -l
       7
$ grep '^theorem' data/rat.lean | wc -l
      44
$ grep '^instance' data/rat.lean | wc -l
      16

Whether you put instances in with theorems or not is a secondary question. Since I'm not familiar with it, I don't have the answer for you.

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:08):

You can't just count them, theorems will almost always outnumber defs by a large margin

view this post on Zulip Sean Leather (Jun 01 2018 at 09:08):

That's kind of the point. :wink:

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:09):

But there is often def -> theorem -> def -> theorem dependencies that will be messed up with such a reorganization

view this post on Zulip Sean Leather (Jun 01 2018 at 09:09):

Indeed! So it needs a concrete attempt.

view this post on Zulip Johannes Hölzl (Jun 01 2018 at 09:09):

and the instances and theorems are very interdependent

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:11):

By the way, one downside of the partial order structure of files is that it's not exactly a lattice, so it's not clear where to put cross cutting theorems

view this post on Zulip Johan Commelin (Jun 01 2018 at 09:11):

With my current PR on algebraic topology, I split it up in 3 files... in this example, would you have rather had 1 file? (Question is for mathlib maintainers.)

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:11):

like if you have incomparable files A and B and you have a theorem about AB which is used by C

view this post on Zulip Sean Leather (Jun 01 2018 at 09:12):

By the way, one downside of the partial order structure of files is that it's not exactly a lattice, so it's not clear where to put cross cutting theorems

Which is one of the reasons I left that concern out of the proposal.

view this post on Zulip Johan Commelin (Jun 01 2018 at 09:15):

(Mario, in my thesis I had one theorem/def per file -- this is all LaTeX --, and it worked quite nicely. I had some LaTeX macros that hyperlinked all references to theorems and definitions. And then with a bit of bash-fu I got PDF's for every theorem/def. I wanted an HTML realisation, but it never became nice enough...)

view this post on Zulip Johan Commelin (Jun 01 2018 at 09:15):

So... that was the other end of the spectrum (-;

view this post on Zulip Johan Commelin (Jun 01 2018 at 09:15):

And it was on a smaller scale, and not formalised.

view this post on Zulip Sean Leather (Jun 01 2018 at 09:16):

Yeah, that could make sense for a thesis, but it's probably too far for a library. :wink:

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:18):

By the way, metamath is organized as set.mm, a single 20MB text file with lots of ascii headers of various levels to give it a book-like structure

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:19):

so I guess that's the extreme opposite

view this post on Zulip Sean Leather (Jun 01 2018 at 09:19):

OMG :bomb:

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:20):

One thing that I like about that is it enforces a linear dependency structure, which makes the dependency question trivial

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:22):

the downside is that it (apparently) adds a lot of spurious dependencies, more than are truly needed by the theorems

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:22):

But that's less of a problem when the entire file can be verified in less than 10 seconds

view this post on Zulip Mario Carneiro (Jun 01 2018 at 09:25):

Also metamath has a separate html view so you aren't completely reliant on text navigation, you can look at abridged views and high level views of the theorems so that you can skip over the boring stuff... Maybe something like this in lean could be helpful for you

view this post on Zulip Sean Leather (Jun 01 2018 at 09:27):

It's still useful to browse code, and I still think good code organization is a worthy goal.

view this post on Zulip Jeremy Avigad (Jun 01 2018 at 17:26):

I think it generally makes it easier to read theory files when the definitions are an integral part of theory. The definitions are needed to understand the theorems and the theorems illustrate the intended usage of the definitions. It would be annoying to have to flip back and forth between files.

view this post on Zulip Chris Hughes (Jun 01 2018 at 18:00):

How about just listing all the definitions in a comment at the top of the file?

view this post on Zulip Kevin Buzzard (Jun 01 2018 at 18:08):

And while we're at it how about listing what goes on in the file in another comment at the top of the file?

view this post on Zulip Kevin Buzzard (Jun 01 2018 at 18:09):

@Mario Carneiro what would you make of such PRs? Someone adding definitions and comments explaining what is going in various files in mathlib?

view this post on Zulip Mario Carneiro (Jun 01 2018 at 18:10):

Absolutely go ahead

view this post on Zulip Mario Carneiro (Jun 01 2018 at 18:10):

I do my best but I also have a lot of theorems I want to prove

view this post on Zulip Kevin Buzzard (Jun 01 2018 at 18:10):

theorems are overrated

view this post on Zulip Scott Morrison (Jun 01 2018 at 22:45):

@Kevin Buzzard

view this post on Zulip Kevin Buzzard (Jun 01 2018 at 22:55):

I'm into definitions

view this post on Zulip Johan Commelin (Jun 02 2018 at 19:29):

theorems are overrated — @Kevin Buzzard

I just read the following blogpost of M. Harris: https://mathematicswithoutapologies.wordpress.com/2018/06/02/is-the-tone-appropriate-is-the-mathematics-at-the-right-level/
It contains the following quote of Scholze:

“What I care most about are definitions. For one thing, humans describe mathematics through language, and, as always, we need sharp words in order to articulate our ideas clearly. (For example, for a long time, I had some idea of the concept of diamonds. But only when I came up with a good name could I really start to think about it, let alone communicate it to others. Finding the name took several months (or even a year?). Then it took another two or three years to finally write down the correct definition (among many close variants). The essential difficulty in writing “Etale cohomology of diamonds” was (by far) not giving the proofs, but finding the definitions.) But even beyond mere language, we perceive mathematical nature through the lenses given by definitions, and it is critical that the definitions put the essential points into focus.

Unfortunately, it is impossible to find the right definitions by pure thought; one needs to detect the correct problems where progress will require the isolation of a new key concept.”

view this post on Zulip Mario Carneiro (Jun 02 2018 at 19:41):

I think this is an over-mature view though; it is only in the context where theorems are commonplace that you can advocate a return to focus on definitions

view this post on Zulip Mario Carneiro (Jun 02 2018 at 19:41):

it's too easy to abuse this view to focus only on definitions at the expense of theorems

view this post on Zulip Kevin Buzzard (Jun 02 2018 at 19:42):

On the other hand there are some definitions which are much more important than theorems.

view this post on Zulip Mario Carneiro (Jun 02 2018 at 19:45):

I don't argue that definitions aren't important, essential even. They are the core of the theory, the things that theorems relate

view this post on Zulip Mario Carneiro (Jun 02 2018 at 19:46):

but to take one without the other is to see only half the picture

view this post on Zulip Johan Commelin (Jun 02 2018 at 19:47):

Maybe I like the fact that DTT muddles the distinction between the two.

view this post on Zulip Kevin Buzzard (Jun 02 2018 at 19:52):

That's right, so we should take the most important of both :-)

view this post on Zulip Kevin Buzzard (Jun 02 2018 at 19:53):

and there is surely ample evidence out there now in blogland and mathoverflow and whatever

view this post on Zulip Kevin Buzzard (Jun 02 2018 at 19:53):

that perfectoid spaces are a supremely important definition

view this post on Zulip Sean Leather (Jun 04 2018 at 08:15):

I think it generally makes it easier to read theory files when the definitions are an integral part of theory. The definitions are needed to understand the theorems and the theorems illustrate the intended usage of the definitions. It would be annoying to have to flip back and forth between files.

@Jeremy Avigad Thanks for your thoughts.

I agree with you that it is useful to see definitions and theorems together, but I find that most theorems beyond the simplest rfl involve more than one definition. Thus, when the definitions are interleaved with large chunks of theorems, I'm either flipping back and forth or viewing different parts of the same file in a split screen. Whereas with all (or as many as is reasonable) of the definitions in one file, I can view two files side-by-side, one with definitions and one with theorems involving those definitions.

To summarize, while I think it would be nice to have a definition with a nearby associated group of theorems, I think that that is already problematic in mathlib in a very practical sense because the number of theorems generally surpasses the number of definitions and makes it difficult to find them. In addition to that, I think definition discovery is a useful part of learning a theory topic and would be aided by putting the definitions in one place.

view this post on Zulip Sean Leather (Jun 04 2018 at 08:24):

How about just listing all the definitions in a comment at the top of the file?

@Chris Hughes: By “definition,” are you referring to either (a) the entire declaration (i.e. def name := term) or (b) only the name?

(a) While I grant you that we are not engaging in (“full-blown”) software engineering when writing mathlib, this is one of those big no-no's in practice. Duplication in comments is notorious for getting out of sync with code. I would strongly recommend against doing this.

(b) Listing the names may provide a minor benefit for reference. But, since theorems generally use the unfolded definition in some form, this won't be helpful when trying to understand a theorem that uses multiple definitions, because you will end up trying to find the declarations themselves. It won't solve the problem I'm trying to solve.

view this post on Zulip Andrew Ashworth (Jun 04 2018 at 09:19):

I think what you really want to do is write a doxygen extension for lean :)

view this post on Zulip Andrew Ashworth (Jun 04 2018 at 09:20):

not like other programming files don't have the same problem with separation of data structures / classes and methods...

view this post on Zulip Sean Leather (Jun 04 2018 at 09:23):

not like other programming files don't have the same problem with separation of data structures / classes and methods...

Certainly! My proposal is actually similar to a common pattern in Haskell package development: putting types in a file of their own.

view this post on Zulip Andrew Ashworth (Jun 04 2018 at 09:26):

that's a big pain in lean though since circular dependencies are hard to handle

view this post on Zulip Johan Commelin (Jun 04 2018 at 09:26):

https://www.stack.nl/~dimitri/doxygen/manual/faq.html#faq_pgm_X
Seems like Lean falls in the 3rd category...

view this post on Zulip Andrew Ashworth (Jun 04 2018 at 09:30):

it wouldn't be so bad if you wanted something quick and dirty - you just want to grab things in between def | definition, lemma, theorem, :, :=, and a newline

view this post on Zulip Andrew Ashworth (Jun 04 2018 at 09:33):

but I am not volunteering to do this :) however it seems more likely to happen than changing how mathlib is organized

view this post on Zulip Andrew Ashworth (Jun 04 2018 at 09:35):

it seems straightfoward - famous last words of every project I've ever estimated ever

view this post on Zulip Patrick Massot (Jun 04 2018 at 11:12):

I know I'm always writing the same thing, but I think we shouldn't forget that some people already developed huge libraries in a language very close to Lean, and have a look at how they handle this. For the comments at top of files see https://github.com/math-comp/math-comp/blob/master/mathcomp/algebra/vector.v (this is random file from math-comp, they all look the same)

view this post on Zulip Patrick Massot (Jun 04 2018 at 11:13):

It also seems they keep the definitions and lemmas in the same file

view this post on Zulip Patrick Massot (Jun 04 2018 at 11:13):

@Assia Mahboubi will probably tell us much more

view this post on Zulip Assia Mahboubi (Jun 04 2018 at 11:39):

Hi @Patrick Massot . I read the thread briefly, so I might have missed the point of the debate. I do not see the advantage of having definitions and theorems in separated files. Sometimes, the difference between a definition and a theorem is quite thin. Also, what makes me believe that an object in the library is what I am looking for is not only its name (and not even only its definition) but also the companion lemmas, examples, etc. So it would not ease my inspection. Last, the definition of a list in Lean is probably no going to change, but for more complex objects, definitions often evolve during the course of the development. "Testing" the subsequent impact of the changes on notations, lemmas, theorem, etc. is easier if this content is not too far I would say. However, part of the answer might be of a technological nature. Even if I am flatted that people like @Patrick Massot want to hear from my experience, you Lean people might benefit from tools that change the game.

view this post on Zulip Sebastian Ullrich (Jun 04 2018 at 12:49):

One more argument in favor of not separating definitions and lemmas: You can make uninteresting helper definitions privateand still prove things about them

view this post on Zulip Sean Leather (Jun 04 2018 at 12:50):

That's really only an argument for not separating those definitions. :wink: (But it is a fine argument in any case.)


Last updated: May 16 2021 at 21:11 UTC