Zulip Chat Archive

Stream: mathlib4

Topic: Project idea: linking MathLib to math books

Josha Dekker (Feb 26 2024 at 07:52):

I find that tools like Loogle and Moogle are very useful to find results when you somewhat know what to look for. However, sometimes it is hard (for me) to find common results, e.g. because the Mathlib name describes the content of a result rather than the common name of the result, which leaves me guessing for the correct spelling, which then may depend on the level of generality in which the result is added. I was wondering if it would be beneficial to kind of link Mathlib to some math books. What I mean by this is the following:

For each (major) area of mathematics, identify a good source, which ideally is freely accessible. For this source, create one file per chapter, which lists all definitions and results. Definitions are linked to Mathlib definitions directly, and results are, as much as possible, one-line proofs based on Mathlib results, or at least short proofs. It would be fine if these results are proven from a much more general result (e.g. FTC from Stokes' theorem). So, I'm not advocating to reprove everything in terms of these books, but to show a possible road that one could take for these results, using results from Mathlib.

The benefits of this are three-fold:

While formalising a proof, some of the cognitive load is shifted from guessing the Mathlib phrasing to finding the necessary result in the math books.
Missing results/proofs that are very long provide for a good indication that there might be some result that we are missing in Mathlib
It potentially provides for some nice, beginner-friendly, projects.

The books/sources that I have in mind would be e.g.
Bourbaki (topology, analysis etc.)
(parts of the) Stacks project

I'm curious what your thoughts are on this, and if there is interest in starting such a thing! I'm happy to play a part in this!

For pi-base (a topology database), I am trying to do a similar thing, but I'm trying to do that in an automated way, so I'm learning more Lean to get as much as possible done automatically there. As such, that is still very much WIP, as I'm working through FPIL (which is going a bit slowly)

Kyle Miller (Feb 26 2024 at 08:26):

This is something I've wanted to see, with just about the same vision. I've thought this could be a good use of the Archive folder, depending on the textbook, and I agree with trying to make all the proofs one-liners.

Thomas Murrills (Feb 26 2024 at 08:26):

I’d love this! I’ve had a related desire, which is simply to create a systematic way to formalize books and papers (i.e. without the motivation to use them as ways to find Mathlib results, though keeping things in terms of mathlib as much as possible would be good practice, I’d think).

In my thoughts on this, I’ve figured it would be nice to be able to

have page <n> commands to specify page breaks (and which could be used for navigation)
have proposition, eq(uation) and corollary (etc.) commands in addition to theorem and lemma to aid search and organization (corollary would store the theorem it was a corollary of somehow)
be able to name things by numbers as well, in a way that provides structured access (e.g. so that later you could draw up all the 1.3.* items)

Down the road, for some books and papers, it would even be nice to have a way to produce literate lean versions of the book with the author’s cooperation, so that you could read it in lean (with hovers, etc.). (Lean blueprints sound relevant, esp. for the reading part?)

(People may already be working on this or have something like it, I’m not sure—I’d certainly be interested!)

Josha Dekker (Feb 26 2024 at 08:32):

Good to hear, let’s see if there’s more support for this line of thought so we can see if/how something like this could be shaped! Or if there is another such project already, which tries to systematically do this!

Yaël Dillies (Feb 26 2024 at 08:58):

My entire LeanCamCombi project is an extensive exercise in doing just this, except that it's comprising material not yet formalised.

Josha Dekker (Feb 26 2024 at 09:03):

Yaël Dillies said:

My entire LeanCamCombi project is an extensive exercise in doing just this, except that it's comprising material not yet formalised.

Very nice! So do you think that something like the approach that I described for some other areas of math would be nice to have?

Yaël Dillies (Feb 26 2024 at 09:06):

Yes, absolutely. This is far from the first thread about it

Josha Dekker (Feb 26 2024 at 09:09):

Great, so suppose we want to do this in an organised way, would the best approach be to

Find some collaborators interested in this
Draft a project proposal
Propose a first book as proof-of-concept
Get community feedback
Start
Or would you propose a different methodology?

Yaël Dillies (Feb 26 2024 at 09:15):

I would go straight to "Start". It's much easier to get collaborators and feedback once you have a clear proof-of-concept

Yaël Dillies (Feb 26 2024 at 09:17):

I have been meaning to do it myself but I don't really know what specific book to follow. I've specialised in formalising things for which there's no textbook :upside_down:

Rémy Degenne (Feb 26 2024 at 09:17):

I agree: don't get lost in proposals and proofs of concept. Pick a book, start and see what the actual design questions for such a lean-book pairing are.
If you want a probability book, I suggest Kallenberg's Foundations of modern probability (3rd ed.).

Rémy Degenne (Feb 26 2024 at 09:19):

Although the proofs might be short at times, and sometimes what takes a page in that book requires thousands of lines of code, at least all the important arguments are there and are presented in a quite formal way.

Josha Dekker (Feb 26 2024 at 09:20):

(deleted)

Damiano Testa (Feb 26 2024 at 09:28):

Or, look for a book that has already inspired various mathlib proofs. Some of the bourbaki texts would probably do.

Damiano Testa (Feb 26 2024 at 09:29):

This way, you can get to the "linking" phase directly and partially delay the formalising stuff.

Josha Dekker (Feb 26 2024 at 09:29):

Yes, I was thinking of Bourbaki as well, but the English versions seem to not be open access on Springer. I've been meaning to refresh my French so I'm happy to use the French version, but that might not be as accessible for everyone.

Josha Dekker (Feb 26 2024 at 09:31):

Damiano Testa said:

This way, you can get to the "linking" phase directly and partially delay the formalising stuff.

Ideally, I don't formalise anything. In the end, the idea of linking a book would be that it gives quick pointers to results in Mathlib, so I'd want as much as possible to be obtained just by linking.

Damiano Testa (Feb 26 2024 at 09:31):

For a mathematician with no formalisation training, reading a french maths texts is probably easier than reading lean code. I would ignore the issue at first.

Damiano Testa (Feb 26 2024 at 09:32):

Josha Dekker said:

Damiano Testa said:

This way, you can get to the "linking" phase directly and partially delay the formalising stuff.

Ideally, I don't formalise anything. In the end, the idea of linking a book would be that it gives quick pointers to results in Mathlib, so I'd want as much as possible to be obtained just by linking.

I suspect that there will be some formalisation link to be provided, no matter what.

Josha Dekker (Feb 26 2024 at 09:32):

If this is a shared sentiment, I'll work on Bourbaki, General Topology chapters 1-4.

Yaël Dillies (Feb 26 2024 at 09:34):

Yaël Dillies said:

I have been meaning to do it myself but I don't really know what specific book to follow. I've specialised in formalising things for which there's no textbook :upside_down:

LeanCamCombi is actually far enough along that I could start writing some maths items and link them.

Josha Dekker (Feb 26 2024 at 09:34):

Of course there are still English versions of the Bourbaki ones, just not directly accessible through SpringerLink (for me at least).

I'll make a new project this afternoon and use Mathlib as a dependency. I'll need to polish my French a bit and mainly work on this during my daily commutes, so I'm not sure how quickly everything will go! Once I get the first two/three paragraphs in, I'll post a link here so that anyone can get some feedback.

Ruben Van de Velde (Feb 26 2024 at 09:37):

Having no experience with Bourbaki myself - are the theorem numbers consistent between the translations?

Christian Merten (Feb 26 2024 at 09:40):

Instead of having a separate project and listing results there, could you simply add the reference to the mathlib docstring? If a result from Bourbaki is missing, chances are high it should be added to mathlib anyway.
At some point we could add reference support in the docs to autogenerate linking maps from the docstrings.

Yaël Dillies (Feb 26 2024 at 09:41):

The entire point of this project is to generate that reference support

Yaël Dillies (Feb 26 2024 at 09:42):

Once we have those textbooks translations mostly sorted, we will be able to add them to mathlib and get doc-gen to automatically add the references to the docstring

Christian Merten (Feb 26 2024 at 09:44):

Yaël Dillies said:

The entire point of this project is to generate that reference support

Then you read Joshas description differently than me.

Yaël Dillies (Feb 26 2024 at 09:46):

If they differ, I'm talking about my idea, not Joshua's

Ruben Van de Velde (Feb 26 2024 at 09:47):

I'm guessing the average textbook (Bourbaki perhaps less so) will state most of its results in less generality than mathlib, so building a map purely by tagging existing mathlib theorems would not necessarily be the most readable

Christian Merten (Feb 26 2024 at 09:49):

But the most useful, I only care about mathlib theorems when I am searching for a result I would like to use downstream.

Josha Dekker (Feb 26 2024 at 09:49):

I see Christian's point here. I think multiple lines of thought are possible:

We make a file for a given book, in which we list every definition and result from that book chronologically. Each proof will ideally be a one-line Mathlib proof, and each definition should just be a reference to the Mathlib reference. This way, the file can be searched to link results from the book to Mathlib results easily. The advantage here is that we don't need to add new results to Mathlib that are mentioned in the book but are present in Mathlib in a more general way: they provide the reader with an example of how the result can be obtained from Mathlib.
For a given book, we create tags, e.g. [@ Bourbaki ref_no], which we put with the relevant results in Mathlib. We then write some code that automatically reads in all the results tagged and aggregates them in a file. What I fear might be disadvantages here may be that books typically have less general results, and that multiple books that we may want (e.g. general topology, functional analysis) may have a few overlapping definitions, which would lead to a cluttering of tags.

I was proposing the former, which I think is Yaël's interpretation. Christian's interpretation is the second.

Yaël Dillies (Feb 26 2024 at 09:50):

Actually, neither of these is exactly what I meant

Josha Dekker (Feb 26 2024 at 09:50):

What did you mean?

Yaël Dillies (Feb 26 2024 at 09:53):

(deleted)

Christian Merten (Feb 26 2024 at 09:54):

Another potential disadvantage of option one is maintainability, at least as long as you don't plan this to be included in the mathlib repo itself.

Christian Merten (Feb 26 2024 at 09:56):

For the cluttering of tags issue: Ideally all the reference tags can be hidden in the mathlib docs page (similar to the "Show equations" dropdown).

Josha Dekker (Feb 26 2024 at 09:56):

Christian Merten said:

Another potential disadvantage of option one is maintainability, at least as long as you don't plan this to be included in the mathlib repo itself.

I think that having a solid set of textbooks easily linked to Mathlib would really help with the adoption of Mathlib for new formalisation projects, so I hope that we can find some way of including this in the Mathlib repo (not necessarily while WIP, but certainly when finished). Of course this should not be a catch them all effort in which people start formalising every book and adding it to the Mathlib repo: we should have a curated set of books, like 1 advanced book per topic.

Josha Dekker (Feb 26 2024 at 09:57):

Christian Merten said:

For the cluttering of tags issue: Ideally all the reference tags can be hidden in the mathlib docs page (similar to the "Show equations" dropdown).

That would certainly help, yes!

Christian Merten (Feb 26 2024 at 10:03):

Josha Dekker said:

Christian Merten said:

Another potential disadvantage of option one is maintainability, at least as long as you don't plan this to be included in the mathlib repo itself.

I think that having a solid set of textbooks easily linked to Mathlib would really help with the adoption of Mathlib for new formalisation projects, so I hope that we can find some way of including this in the Mathlib repo (not necessarily while WIP, but certainly when finished). Of course this should not be a catch them all effort in which people start formalising every book and adding it to the Mathlib repo: we should have a curated set of books, like 1 advanced book per topic.

Maybe a maintainer can give their opinion on this, if this is included in the mathlib repo it still means additional maintenance work (e.g. when renaming theorems) while a rename does not require the reference tag in the doc string to be changed.

Yaël Dillies (Feb 26 2024 at 10:03):

I have specific ideas about how this should be set up:

Every textbook/chapter of a textbook gets a JSON file. Inside that JSON file, every maths result/definition gets an entry with
- its number inside the book (not all results/definitions will have that, eg because it's an intermediate result that's worth mentioning separately)
- its informal name (again, not all results/definitions will have that)
- the name of the corresponding Lean declaration
Most Lean declarations will hopefully be in mathlib. However some results are not worth being spelled out in mathlib proper (because they are easy chainings of mathlib lemmas or because they are examples with no practical use), so there has to be a supplementary source of the Lean declarations. I see two non-mutually exclusive solutions:
- Host the extra declarations in a separate project. This will be the solution at the start, while we still figure out how to organise the whole thing.
- Host the extra declarations in the mathlib repo, either in Archive (which I think is good enough) or in a separate Textbooks folder. This is the solution I want there to be for mature textbook translations because it further allows...
... each JSON file of textbook translation gets a page in the docs with links to the in the style of the undergraduate TODO list. Further, each mathlib declaration linked in a textbook translation automatically gets a little tag in the docs sending back to the corresponding textbook translation's page.

Yaël Dillies (Feb 26 2024 at 10:04):

Christian Merten said:

if this is included in the mathlib repo it still means additional maintenance work (e.g. when renaming theorems) while a rename does not require the reference tag in the doc string to be changed.

This will become a non-problem once we have implemented Mario Carneiro's ideas on automatic migration. If it's included in the mathlib repo, it also means that CI will run on it, so it's no more work than to change any other occurrence of a theorem.

Josha Dekker (Feb 26 2024 at 10:07):

Yaël Dillies said:

I have specific ideas about how this should be set up:

Every textbook/chapter of a textbook gets a JSON file. Inside that JSON file, every maths result/definition gets an entry with

its number inside the book (not all results/definitions will have that, eg because it's an intermediate result that's worth mentioning separately)

its informal name (again, not all results/definitions will have that)

the name of the corresponding Lean declaration

Most Lean declarations will hopefully be in mathlib. However some results are not worth being spelled out in mathlib proper (because they are easy chainings of mathlib lemmas or because they are examples with no practical use), so there has to be a supplementary source of the Lean declarations. I see two non-mutually exclusive solutions:

Host the extra declarations in a separate project. This will be the solution at the start, while we still figure out how to organise the whole thing.

Host the extra declarations in the mathlib repo, either in Archive (which I think is good enough) or in a separate Textbooks folder. This is the solution I want there to be for mature textbook translations because it further allows...

... each JSON file of textbook translation gets a page in the docs with links to the in the style of the undergraduate TODO list. Further, each mathlib declaration linked in a textbook translation automatically gets a little tag in the docs sending back to the corresponding textbook translation's page.

I think this sounds very good. I have two small questions:

Do you have a sketch for how you would like the JSON file to look?
If you are using JSON rather than .lean files, how can we automatically pick up when the corresponding Lean declaration is updated?

Christian Merten (Feb 26 2024 at 10:07):

And I guess you want additional CI support to check that that the linked mathlib declarations in the JSON actually exist?

Yaël Dillies (Feb 26 2024 at 10:07):

Yaël Dillies (Feb 26 2024 at 10:09):

A good example to follow is the undergraduate TODO list. It uses YAML instead of JSON. I'm not sure how much that matters. And here's the CI step that checks the YAML contains existing Lean declarations.

Josha Dekker (Feb 26 2024 at 10:13):

Thank you! Shall I make a new project and write up something for the first section or two of Bourbaki's Topology book? I guess it might be useful to use YAML then, as that allows us to recycle the same CI step? I have no experience with setting up CI, so I won't do that yet. If anything is missing, I'll add that to a separate file and reference that as well.

Yaël Dillies (Feb 26 2024 at 10:15):

That sounds like the best path forward! Feel free to add me to the repo. Then I can set up CI.

Josha Dekker (Feb 26 2024 at 10:15):

I guess it is probably better to make a new project with Mathlib as dependency at this point, rather than branching Mathlib and working directly in docs?

Yaël Dillies (Feb 26 2024 at 10:15):

Yes, definitely

Josha Dekker (Feb 26 2024 at 10:15):

I'll do that this afternoon! Thanks!

Josha Dekker (Feb 26 2024 at 15:23):

Yaël Dillies said:

That sounds like the best path forward! Feel free to add me to the repo. Then I can set up CI.

I've sent you an invite, feel free to change what you want of course!

Josha Dekker (Feb 26 2024 at 15:28):

If I want this to be related to Mathlib at some point, I need to get it formatted in a workable way anyway!

Josha Dekker (Feb 26 2024 at 17:16):

I've also set up a minimal example in
LeanBourbaki/Chapter1/Chapter1Lean.yaml please feel free to change the structure to a format you like better/works with CI, then I'll start adding things!

Emilie (Shad Amethyst) (Feb 26 2024 at 17:52):

Do you have a link to it?

Josha Dekker (Feb 26 2024 at 18:17):

Emilie (Shad Amethyst) said:

Do you have a link to it?

It is still very much WIP, but I can send a link in a bit if you like!

Josha Dekker (Feb 26 2024 at 18:34):

The project is in here, but it is still very much WIP (read: no actual content so far...)
@Yaël Dillies will at some point add CI, and I'm not sure what format for the YAML files would be optimal, which will probably depend on the way the CI plays out.
https://github.com/JADekker/LeanBourbaki

Kyle Miller (Feb 26 2024 at 18:37):

I think it would be good to have at least some formalized textbooks in the Archive folder. For example, Topologie générale, ones that make sure that everything that's supposed to be in mathlib is actually there.

Yes, it increases the mathlib maintenance burden, but I see the Archive as being one of mathlib's test suites.

Plus, this serves as documentation for mathlib. If you are familiar with the textbook, you can find the result you want, locate that result in the Archive, and then see its (hopefully one-liner) proof. Sure, mathlib is at a different level of generality than most textbooks, but that doesn't affect whether mathlib can prove the textbook results.

Kyle Miller (Feb 26 2024 at 18:38):

I'm less certain about linking from mathlib to particular textbook. Things can look fairly different when formalized, and it can be hard to say which theorem is the textbook theorem.

I think I'd prefer if the formalized textbook mark out what's the main mathlib theorem that proves it. From that, we can generate the reverse mapping.

Kyle Miller (Feb 26 2024 at 18:40):

A random idea for how to implement it: we can create syntax to tag a theorem inside the proof with metadata saying "this is what proves the current theorem" and then have some metaprogramming read this off. That way you don't have to keep anything synchronized as the Archive is maintained (i.e., when someone makes a mathlib change and hurriedly goes through and fixes the Archive)

Yaël Dillies (Feb 27 2024 at 15:46):

I just pushed some very basic CI bodging things together from mathlib, PFR and LeanAPAP. Unsurprisingly, it doesn't work (yet), but at least I can explain that the idea is currently to follow the layout from https://github.com/leanprover-community/mathlib4/blob/master/docs/100.yaml

Yaël Dillies (Feb 27 2024 at 15:46):

@Eric Wieser, do you know why we are using YAML if the first YAML processing step is turning the files into JSON?

Josha Dekker (Feb 27 2024 at 16:14):

Yaël Dillies said:

I just pushed some very basic CI bodging things together from mathlib, PFR and LeanAPAP. Unsurprisingly, it doesn't work (yet), but at least I can explain that the idea is currently to follow the layout from https://github.com/leanprover-community/mathlib4/blob/master/docs/100.yaml

Thank you! I tried to follow that, although I’m not entirely sure what the optimal formatting would be here…

Eric Wieser (Feb 27 2024 at 16:19):

The convert-to-json step was added as part of the change from our scripts being converted from python to Lean

Patrick Massot (Feb 27 2024 at 18:23):

Yaml is nicer to read and edit for humans, json is a bit nicer for computers. And this translation step is purely because Lean 4 has a json lib but no yaml lib.

Yaël Dillies (Feb 27 2024 at 21:36):

How hard is it to write a yaml lib? Maybe I should just do that. Probably good training on Lean as a general purpose programming language.

Eric Wieser (Feb 27 2024 at 21:38):

As I understand it, yaml is a cursed format that is incredibly hard to parse correctly... Let me see if I can find the blog post I'm thinking of

Matthew Ballard (Feb 27 2024 at 21:38):

toml?

Yaël Dillies (Feb 27 2024 at 21:43):

I'm happy with any format so long as I can write a Lean library for it

Matthew Ballard (Feb 27 2024 at 21:46):

I think XML is coming back into style :)

Ruben Van de Velde (Feb 27 2024 at 21:55):

Turns out XML is also a cursed format, though maybe not as cursed as yaml

Eric Wieser (Feb 27 2024 at 22:04):

Here's one post about why yaml is so hard to parse and author: https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell

Thomas Murrills (Feb 27 2024 at 22:32):

There's also Apple's recent open-source Pkl config language which is meant to be able to convert to json/yaml/etc. while behaving like a statically-typed language (plus, it can check constraints)—but I imagine it hasn't been fully battle-tested yet.

Yakov Pechersky (Feb 27 2024 at 22:48):

yaml is cursed. i tried writing a library to parse it when we were doing porting because we had yamls describing PRs. it's an indentation sensitive language that supports gotos, basically. so you can have self-reference. i thought about using direct lean Syntax categories to define the language, but it got too crazy for me.

Yakov Pechersky (Feb 27 2024 at 22:49):

https://yaml.org/spec/1.2.2/ :cry:

Eric Wieser (Feb 27 2024 at 23:03):

I think the only reasonable Lean yaml parser project would be one that wraps libyaml

Patrick Massot (Feb 27 2024 at 23:17):

Ok, I now update my “json is a bit nicer for computers” to “json is much nicer for computers”. And then I guess a corollary is that yaml only pretends to be nicer to read and write for humans since any interpretation difference between humans and computers is a trap for humans.

Eric Wieser (Feb 27 2024 at 23:18):

Correct yaml is nice to read for humans, but it is very hard for humans to tell if yaml is correct when writing it in general.

Patrick Stevens (Feb 27 2024 at 23:33):

(Its problem is that it vibes very well no matter how correct it is - much the same as the problem with current LLMs)

Mario Carneiro (Mar 02 2024 at 03:18):

Spoiler alert, there is a TOML parser coming to lean in the near future. So that might impact your choice of human-oriented configuration language for lean tools...

Jireh Loreaux (Mar 12 2024 at 12:05):

For yet another reason why YAML is so scary: https://thenewstack.io/with-yamlscript-yaml-becomes-a-proper-programming-language/

Chris Wong (Mar 13 2024 at 04:57):

I've used TOML in Rust tooling and it's been great so far. I think the main disadvantage is that it doesn't handle deep nesting (≥ 3 layers) very well. Though, if your config is that complex, it might be better to use a Lean DSL anyway.

Last updated: May 02 2025 at 03:31 UTC