Zulip Chat Archive

Stream: general

Topic: mathlib file organization


Sean Leather (Jun 01 2018 at 08:33):

For those interested in the file organization of mathlib, I've written up a proposal for a small change that, I think, will improve things a bit. Feedback welcome!

Johan Commelin (Jun 01 2018 at 08:39):

Interesting proposal. I have also noticed myself getting lost in the long files...

Johan Commelin (Jun 01 2018 at 08:40):

I would also like to separate the "interface lemma's" from the "real beef", although maybe there is not always a clear line between them.

Sean Leather (Jun 01 2018 at 08:41):

I have also noticed myself getting lost in the long files...

I'm glad I'm not the only one. I think this is a first logical step to reducing the size of files. There are other steps that can be taken later, of course.

Johan Commelin (Jun 01 2018 at 08:42):

Also, I have never read what the purpose of default.lean is. But if I inferred it correctly, then I think we should make more use of it.

Johan Commelin (Jun 01 2018 at 08:43):

And then using more files will not increase the length of import lines.

Mario Carneiro (Jun 01 2018 at 08:43):

I've been deliberately avoiding the use of default.lean within mathlib, and I think it's good practice without as well

Sean Leather (Jun 01 2018 at 08:44):

AFAIK, the file data/list/default.lean allows you to write import data.list instead of import data.list.default. So, the default.lean file generally imports everything in the directory.

Johan Commelin (Jun 01 2018 at 08:44):

exactly

Mario Carneiro (Jun 01 2018 at 08:44):

right

Mario Carneiro (Jun 01 2018 at 08:44):

and that's exactly what I don't want

Johan Commelin (Jun 01 2018 at 08:44):

Why not?

Mario Carneiro (Jun 01 2018 at 08:45):

that adds a bunch of spurious dependencies in an already delicate graph

Mario Carneiro (Jun 01 2018 at 08:45):

remember that circular dependencies are bad but there isn't an obvious linear order of files

Johan Commelin (Jun 01 2018 at 08:46):

Sure, but there is a partial order, right?

Johannes Hölzl (Jun 01 2018 at 08:46):

It should be a DAG

Johan Commelin (Jun 01 2018 at 08:46):

Do we already have a script that generates a graphviz visualisation?

Mario Carneiro (Jun 01 2018 at 08:48):

mathlib.gif

Johan Commelin (Jun 01 2018 at 08:49):

Right, I had a similar graph on the level of theorems and proofs for my thesis.

Johan Commelin (Jun 01 2018 at 08:50):

But why are you scared that the graph becomes to delicate for Lean?

Sean Leather (Jun 01 2018 at 08:50):

I don't quite see the problem with default.lean. Nonetheless, I see it as just an extra and there's no actual need to have one.

Johan Commelin (Jun 01 2018 at 08:50):

Well, we need it to keep imports manageable.

Johan Commelin (Jun 01 2018 at 08:50):

Already it is common to have files with >8 imports

Johan Commelin (Jun 01 2018 at 08:51):

If you want to split a 3000-line file into 5 files

Mario Carneiro (Jun 01 2018 at 08:51):

Currently, folders are organized by topic, and files are organized by dependency units

Mario Carneiro (Jun 01 2018 at 08:51):

so if there is no dependency issue and the topic is still the same, it all goes in one file

Sean Leather (Jun 01 2018 at 08:52):

At least with my proposal, once you split a file into one for definitions and one for theorems, the theorems file imports the definitions file, and you just import the theorems file. The default.lean just allows you to leave off one part of the hierarchy.

Johan Commelin (Jun 01 2018 at 08:52):

What is a "dependency issue"?

Mario Carneiro (Jun 01 2018 at 08:52):

A is used by B, which is used by A'

Mario Carneiro (Jun 01 2018 at 08:53):

and A' wants to be in the same file as A

Johan Commelin (Jun 01 2018 at 08:53):

Ok, and there is a reason (I guess the topic) that B should not be in that file as well.

Mario Carneiro (Jun 01 2018 at 08:53):

I've thought many times about separating out definitions. But I'm not sure it's as easy as that

Mario Carneiro (Jun 01 2018 at 08:54):

in lean, definitions and theorems are all mixed up thanks to curry howard

Sean Leather (Jun 01 2018 at 08:54):

As I said, there may be exceptions, but I think there are many obvious easy examples.

Sean Leather (Jun 01 2018 at 08:55):

And I'm happy to give it a try myself.

Mario Carneiro (Jun 01 2018 at 08:55):

If you are using list as your test bed it's not representative

Sean Leather (Jun 01 2018 at 08:55):

If other people think this is a good thing.

Mario Carneiro (Jun 01 2018 at 08:55):

programming stuff in general tends to be less dependent

Sean Leather (Jun 01 2018 at 08:56):

Perhaps. But there is a lot of programming stuff in mathlib. :simple_smile:

Sean Leather (Jun 01 2018 at 08:56):

And what's the harm with doing it there?

Mario Carneiro (Jun 01 2018 at 08:58):

also, how is theorem organization handled? What to do if a definition depends on a theorem?

Sean Leather (Jun 01 2018 at 08:59):

It probably depends on the example.

Mario Carneiro (Jun 01 2018 at 09:00):

What about instances? are they definitions or theorems?

Sean Leather (Jun 01 2018 at 09:00):

To keep the proposal minimal, I don't say anything about that.

Sean Leather (Jun 01 2018 at 09:01):

This is where concrete examples help.

Mario Carneiro (Jun 01 2018 at 09:02):

rat is a field

Sean Leather (Jun 01 2018 at 09:02):

What I really want to know is whether people are, in general, in favor of having definitions separated from theorems. From my experience using mathlib, I believe I am. There are certainly wrinkles to be ironed out.

Mario Carneiro (Jun 01 2018 at 09:02):

I am worried about losing topicality

Mario Carneiro (Jun 01 2018 at 09:03):

particularly for smaller definitions that are more auxiliary or only used in a few theorems

Sean Leather (Jun 01 2018 at 09:03):

Isn't rat a structure?

Mario Carneiro (Jun 01 2018 at 09:03):

I mean a mathematical field

Mario Carneiro (Jun 01 2018 at 09:04):

That's an instance

Mario Carneiro (Jun 01 2018 at 09:04):

in rat.lean

Sean Leather (Jun 01 2018 at 09:04):

You mean instance field_rat : discrete_field ℚ?

Mario Carneiro (Jun 01 2018 at 09:04):

yeah

Mario Carneiro (Jun 01 2018 at 09:07):

@Jeremy Avigad may want to chip in on this topic

Sean Leather (Jun 01 2018 at 09:07):

It seems to me that data/rat.lean would also benefit from extracting the defs.

$ grep '^def' data/rat.lean | wc -l
       7
$ grep '^theorem' data/rat.lean | wc -l
      44
$ grep '^instance' data/rat.lean | wc -l
      16

Whether you put instances in with theorems or not is a secondary question. Since I'm not familiar with it, I don't have the answer for you.

Mario Carneiro (Jun 01 2018 at 09:08):

You can't just count them, theorems will almost always outnumber defs by a large margin

Sean Leather (Jun 01 2018 at 09:08):

That's kind of the point. :wink:

Mario Carneiro (Jun 01 2018 at 09:09):

But there is often def -> theorem -> def -> theorem dependencies that will be messed up with such a reorganization

Sean Leather (Jun 01 2018 at 09:09):

Indeed! So it needs a concrete attempt.

Johannes Hölzl (Jun 01 2018 at 09:09):

and the instances and theorems are very interdependent

Mario Carneiro (Jun 01 2018 at 09:11):

By the way, one downside of the partial order structure of files is that it's not exactly a lattice, so it's not clear where to put cross cutting theorems

Johan Commelin (Jun 01 2018 at 09:11):

With my current PR on algebraic topology, I split it up in 3 files... in this example, would you have rather had 1 file? (Question is for mathlib maintainers.)

Mario Carneiro (Jun 01 2018 at 09:11):

like if you have incomparable files A and B and you have a theorem about AB which is used by C

Sean Leather (Jun 01 2018 at 09:12):

By the way, one downside of the partial order structure of files is that it's not exactly a lattice, so it's not clear where to put cross cutting theorems

Which is one of the reasons I left that concern out of the proposal.

Johan Commelin (Jun 01 2018 at 09:15):

(Mario, in my thesis I had one theorem/def per file -- this is all LaTeX --, and it worked quite nicely. I had some LaTeX macros that hyperlinked all references to theorems and definitions. And then with a bit of bash-fu I got PDF's for every theorem/def. I wanted an HTML realisation, but it never became nice enough...)

Johan Commelin (Jun 01 2018 at 09:15):

So... that was the other end of the spectrum (-;

Johan Commelin (Jun 01 2018 at 09:15):

And it was on a smaller scale, and not formalised.

Sean Leather (Jun 01 2018 at 09:16):

Yeah, that could make sense for a thesis, but it's probably too far for a library. :wink:

Mario Carneiro (Jun 01 2018 at 09:18):

By the way, metamath is organized as set.mm, a single 20MB text file with lots of ascii headers of various levels to give it a book-like structure

Mario Carneiro (Jun 01 2018 at 09:19):

so I guess that's the extreme opposite

Sean Leather (Jun 01 2018 at 09:19):

OMG :bomb:

Mario Carneiro (Jun 01 2018 at 09:20):

One thing that I like about that is it enforces a linear dependency structure, which makes the dependency question trivial

Mario Carneiro (Jun 01 2018 at 09:22):

the downside is that it (apparently) adds a lot of spurious dependencies, more than are truly needed by the theorems

Mario Carneiro (Jun 01 2018 at 09:22):

But that's less of a problem when the entire file can be verified in less than 10 seconds

Mario Carneiro (Jun 01 2018 at 09:25):

Also metamath has a separate html view so you aren't completely reliant on text navigation, you can look at abridged views and high level views of the theorems so that you can skip over the boring stuff... Maybe something like this in lean could be helpful for you

Sean Leather (Jun 01 2018 at 09:27):

It's still useful to browse code, and I still think good code organization is a worthy goal.

Jeremy Avigad (Jun 01 2018 at 17:26):

I think it generally makes it easier to read theory files when the definitions are an integral part of theory. The definitions are needed to understand the theorems and the theorems illustrate the intended usage of the definitions. It would be annoying to have to flip back and forth between files.

Chris Hughes (Jun 01 2018 at 18:00):

How about just listing all the definitions in a comment at the top of the file?

Kevin Buzzard (Jun 01 2018 at 18:08):

And while we're at it how about listing what goes on in the file in another comment at the top of the file?

Kevin Buzzard (Jun 01 2018 at 18:09):

@Mario Carneiro what would you make of such PRs? Someone adding definitions and comments explaining what is going in various files in mathlib?

Mario Carneiro (Jun 01 2018 at 18:10):

Absolutely go ahead

Mario Carneiro (Jun 01 2018 at 18:10):

I do my best but I also have a lot of theorems I want to prove

Kevin Buzzard (Jun 01 2018 at 18:10):

theorems are overrated

Scott Morrison (Jun 01 2018 at 22:45):

@Kevin Buzzard

Kevin Buzzard (Jun 01 2018 at 22:55):

I'm into definitions

Johan Commelin (Jun 02 2018 at 19:29):

theorems are overrated — @Kevin Buzzard

I just read the following blogpost of M. Harris: https://mathematicswithoutapologies.wordpress.com/2018/06/02/is-the-tone-appropriate-is-the-mathematics-at-the-right-level/
It contains the following quote of Scholze:

“What I care most about are definitions. For one thing, humans describe mathematics through language, and, as always, we need sharp words in order to articulate our ideas clearly. (For example, for a long time, I had some idea of the concept of diamonds. But only when I came up with a good name could I really start to think about it, let alone communicate it to others. Finding the name took several months (or even a year?). Then it took another two or three years to finally write down the correct definition (among many close variants). The essential difficulty in writing “Etale cohomology of diamonds” was (by far) not giving the proofs, but finding the definitions.) But even beyond mere language, we perceive mathematical nature through the lenses given by definitions, and it is critical that the definitions put the essential points into focus.

Unfortunately, it is impossible to find the right definitions by pure thought; one needs to detect the correct problems where progress will require the isolation of a new key concept.”

Mario Carneiro (Jun 02 2018 at 19:41):

I think this is an over-mature view though; it is only in the context where theorems are commonplace that you can advocate a return to focus on definitions

Mario Carneiro (Jun 02 2018 at 19:41):

it's too easy to abuse this view to focus only on definitions at the expense of theorems

Kevin Buzzard (Jun 02 2018 at 19:42):

On the other hand there are some definitions which are much more important than theorems.

Mario Carneiro (Jun 02 2018 at 19:45):

I don't argue that definitions aren't important, essential even. They are the core of the theory, the things that theorems relate

Mario Carneiro (Jun 02 2018 at 19:46):

but to take one without the other is to see only half the picture

Johan Commelin (Jun 02 2018 at 19:47):

Maybe I like the fact that DTT muddles the distinction between the two.

Kevin Buzzard (Jun 02 2018 at 19:52):

That's right, so we should take the most important of both :-)

Kevin Buzzard (Jun 02 2018 at 19:53):

and there is surely ample evidence out there now in blogland and mathoverflow and whatever

Kevin Buzzard (Jun 02 2018 at 19:53):

that perfectoid spaces are a supremely important definition

Sean Leather (Jun 04 2018 at 08:15):

I think it generally makes it easier to read theory files when the definitions are an integral part of theory. The definitions are needed to understand the theorems and the theorems illustrate the intended usage of the definitions. It would be annoying to have to flip back and forth between files.

@Jeremy Avigad Thanks for your thoughts.

I agree with you that it is useful to see definitions and theorems together, but I find that most theorems beyond the simplest rfl involve more than one definition. Thus, when the definitions are interleaved with large chunks of theorems, I'm either flipping back and forth or viewing different parts of the same file in a split screen. Whereas with all (or as many as is reasonable) of the definitions in one file, I can view two files side-by-side, one with definitions and one with theorems involving those definitions.

To summarize, while I think it would be nice to have a definition with a nearby associated group of theorems, I think that that is already problematic in mathlib in a very practical sense because the number of theorems generally surpasses the number of definitions and makes it difficult to find them. In addition to that, I think definition discovery is a useful part of learning a theory topic and would be aided by putting the definitions in one place.

Sean Leather (Jun 04 2018 at 08:24):

How about just listing all the definitions in a comment at the top of the file?

@Chris Hughes: By “definition,” are you referring to either (a) the entire declaration (i.e. def name := term) or (b) only the name?

(a) While I grant you that we are not engaging in (“full-blown”) software engineering when writing mathlib, this is one of those big no-no's in practice. Duplication in comments is notorious for getting out of sync with code. I would strongly recommend against doing this.

(b) Listing the names may provide a minor benefit for reference. But, since theorems generally use the unfolded definition in some form, this won't be helpful when trying to understand a theorem that uses multiple definitions, because you will end up trying to find the declarations themselves. It won't solve the problem I'm trying to solve.

Andrew Ashworth (Jun 04 2018 at 09:19):

I think what you really want to do is write a doxygen extension for lean :)

Andrew Ashworth (Jun 04 2018 at 09:20):

not like other programming files don't have the same problem with separation of data structures / classes and methods...

Sean Leather (Jun 04 2018 at 09:23):

not like other programming files don't have the same problem with separation of data structures / classes and methods...

Certainly! My proposal is actually similar to a common pattern in Haskell package development: putting types in a file of their own.

Andrew Ashworth (Jun 04 2018 at 09:26):

that's a big pain in lean though since circular dependencies are hard to handle

Johan Commelin (Jun 04 2018 at 09:26):

https://www.stack.nl/~dimitri/doxygen/manual/faq.html#faq_pgm_X
Seems like Lean falls in the 3rd category...

Andrew Ashworth (Jun 04 2018 at 09:30):

it wouldn't be so bad if you wanted something quick and dirty - you just want to grab things in between def | definition, lemma, theorem, :, :=, and a newline

Andrew Ashworth (Jun 04 2018 at 09:33):

but I am not volunteering to do this :) however it seems more likely to happen than changing how mathlib is organized

Andrew Ashworth (Jun 04 2018 at 09:35):

it seems straightfoward - famous last words of every project I've ever estimated ever

Patrick Massot (Jun 04 2018 at 11:12):

I know I'm always writing the same thing, but I think we shouldn't forget that some people already developed huge libraries in a language very close to Lean, and have a look at how they handle this. For the comments at top of files see https://github.com/math-comp/math-comp/blob/master/mathcomp/algebra/vector.v (this is random file from math-comp, they all look the same)

Patrick Massot (Jun 04 2018 at 11:13):

It also seems they keep the definitions and lemmas in the same file

Patrick Massot (Jun 04 2018 at 11:13):

@Assia Mahboubi will probably tell us much more

Assia Mahboubi (Jun 04 2018 at 11:39):

Hi @Patrick Massot . I read the thread briefly, so I might have missed the point of the debate. I do not see the advantage of having definitions and theorems in separated files. Sometimes, the difference between a definition and a theorem is quite thin. Also, what makes me believe that an object in the library is what I am looking for is not only its name (and not even only its definition) but also the companion lemmas, examples, etc. So it would not ease my inspection. Last, the definition of a list in Lean is probably no going to change, but for more complex objects, definitions often evolve during the course of the development. "Testing" the subsequent impact of the changes on notations, lemmas, theorem, etc. is easier if this content is not too far I would say. However, part of the answer might be of a technological nature. Even if I am flatted that people like @Patrick Massot want to hear from my experience, you Lean people might benefit from tools that change the game.

Sebastian Ullrich (Jun 04 2018 at 12:49):

One more argument in favor of not separating definitions and lemmas: You can make uninteresting helper definitions privateand still prove things about them

Sean Leather (Jun 04 2018 at 12:50):

That's really only an argument for not separating those definitions. :wink: (But it is a fine argument in any case.)


Last updated: Dec 20 2023 at 11:08 UTC