Zulip Chat Archive

Stream: general

Topic: What is the point of Lean's maths library?


Kevin Buzzard (Jul 28 2021 at 09:01):

I have been invited to give a colloquium at Berkeley's Topos Institute in a couple of weeks (talk link) and I have given them the rather provocative title "What is the point of Lean's maths library?". I could happily talk for an hour on what I think the point is, but I have been giving a lot of talks recently from the comfort of my own front room and I am beginning to wonder whether I am just starting to recycle arguments which I've already waffled on about at length on YT or in my blog post ranblings. As I learn more about this community I realise that other people have very different ideas about the whole endeavour. Michael Harris' first post in Silicon Reckoner, his new substack, shows that he has a very different view of mathlib to me.

What do you think the point of Lean's maths library is?

Oliver Nash (Jul 28 2021 at 09:57):

I'd answer this question by drawing a parallel to the informal case. Here are four different questions:

  1. What is the point of Lean's maths library?
  2. What is the point of informal mathematics?
  3. Why do I do informal mathematics?
  4. Why do I work on Mathlib?

Questions 3, 4 are easy, in each case I say "because it's there". Thus, I do formal mathematics for the same reasons that I do informal mathematics: it is unexplored terrain, I'm curious about it, and I find it satisfying and beautiful.

Question 2 is not so easy and usually includes me guessing what the person asking might regard as valuable. At various points in history I might have answered by saying trigonometry was important for navigating the oceans, or I might have said that understanding the laws of steam engines or electromagnetism was important, and these days I might say electronic communication, crypotography, complexity theory, compressed sensing, GPS, ... are important.

Oliver Nash (Jul 28 2021 at 09:57):

With that disclaimer I think I'd answer question 1 by saying the Mathlib effort:

  • Helps us learn what we need from the next generation of proof assistants
  • Enables correctness checking (like LTE)
  • Provides a guaranteed-correct unified mathematical reference
  • Unblocks a larger number of mathematicians who would formalise if only the relevant basic definitions / results in their area were already in place
  • [Speculative] May allow us to push borders of mathematics into areas otherwise too complicated for humans to work reliably

Jason Rute (Jul 28 2021 at 12:54):

I think there are two questions:

  1. What is the point of building a large library of formalized mathematics?
  2. What is the point of mathlib’s approach to this (especially in contrast to MetaMath, Isabelle, Mizar, HOL Light, and Coq)?

Jason Rute (Jul 28 2021 at 12:54):

I don’t have enough understanding of the second question to give a meaningful contribution, but the first question has many diverse interesting answers and observations.

Jason Rute (Jul 28 2021 at 12:54):

First, the desire to formalize all of mathematics is fairly old. Russell and Whitehead’s Principia Mathematics is now over 100 years old. Since then, I think a lot of mathematicians have the unfortunate take away that, while formalization is theoretically possible, it is impractically hard. (And they often erroneously cite that it took 379 pages to prove 1 + 1 = 2, as if the first chunk of the book was devoted to that theorem alone.)

Jason Rute (Jul 28 2021 at 12:54):

However, not everyone has shared this pessimistic view, especially when computers came along. I don’t know the full history here with AutoMath and Mizar and such, but I’d love to see it written down. I do know that the QED Manifesto of the 90s provides a number of diverse motivations for formalizing mathematics. Recently, Josef Urban, in the middle, around 31:30, of an interesting a podcast described the diverse viewpoints of the QED Manifesto authors, some who thought AI was too lofty and just wanted to formalize math by hand, while others saw this potential library as a source for training future AIs.

Jason Rute (Jul 28 2021 at 12:54):

I think the Internet and all that came with it is really one thing that has helped formalization take off (as well as other large scale open source software projects). First mailing lists and websites, but now Github, Zulip, Discord, etc. are invaluable for this community. Even the ability to try out Lean in the browser is key to adoption. (I know some complain that Lean has a big PR campaign, but I think PR is really important. New software languages get adopted based on how good their documentation is. And for every 1 person who comes to Lean from another ITP, another 2 come to formalization for the first time---or so I’m guessing without any data :smile:).

Jason Rute (Jul 28 2021 at 12:54):

In my talk about Lean-GPT-f at Harvard, I gave five different motivations for formalized mathematics, all of which I think apply to Mathlib:

  1. Mechanically check mathematical proofs. While it doesn’t yet make sense to have mathematicians formalize all of their work, it does make sense to formalize particularly foundational, interesting, or troublesome proofs, such as undergraduate mathematics, the independence of CH, or Scholze’s lemma in the Liquid Tensor project.
  2. Digitize mathematics. Everything nowadays is digitized, and systematically organized into databases. However, mathematics is remarkably resistant to this trend. Sure we have pdfs and LaTeX files of most articles, but none of these formats capture the mathematical content of the articles. Formalization of a large body of mathematics, whether the full proofs or just the theorem statements, would open up many possibilities for automatic retrieval and search of mathematics.
  3. Unify and archive mathematical knowledge. Right now the world of mathematics consists of a mostly disjoint collection of articles (most of them mostly true). There have been past heroic efforts from Euclid to Bourbacki to systematically organize the current mathematical knowledge of the time. More modern incarnations include Wikipedia, nLab, and the Stacks Project. Formalization provides yet another approach to this, whereby we systematically build up a library of mathematics starting at the foundations, up to the most interesting new developments.
  4. Prove the correctness of software and hardware. This hasn’t been much of a focus in Lean, although Lean 4 will be a fast language, compiled to C and maybe LLVM, with the ability to formally verify the properties of any (safe) code one writes. There have been some great success stories so far in software and hardware verification and I see a lot of potential for verified security software. It is also the most likely way to monetize ones skills in formal theorem proving. As the mathematics in formal libraries get more sophisticated, so do the types of algorithms that we can formalize.
  5. Making mathematics accessible to computers in a new way. I’ve heard it said that the one thing mathematics is good for is more mathematics. This “false truism” also holds for formalization. We don’t know the full applications of formal mathematics right now except that it will improve our ability to formalize more mathematics in the future. However, we do have hints already at the ways that one can use a large library of formal mathematics. AI projects like HOList and GPT-f are trained off of many person-years of formalization effort, and hopefully will eventually become a virtuous loop, helping users better formalize more mathematics. It’s hard to predict what we can do with a large library of formalized, computer-readable, interconnected mathematics, but it is very reasonable to believe it could have important positive future impacts on mathematics, technology, science, and education.

Johan Commelin (Jul 28 2021 at 13:22):

@Jason Rute Wow, thanks for sharing your thoughts!

Matthew Ballard (Jul 28 2021 at 13:36):

Training in a more holistic manner. Perhaps more valuable than the existence of a single formalized library itself is the cultivation of a community where the members possess the skill "ability to formalize pure mathematics."

Bryan Gin-ge Chen (Jul 28 2021 at 13:52):

One thing that I haven't seen mentioned yet is that mathlib has brought together a set of folks with rather diverse interests who most likely would not be interacting otherwise. My impression is that when people from very different communities come together there are often unexpected opportunities and results.

Yakov Pechersky (Jul 28 2021 at 14:01):

There's also the aspect from the other side of computer science. Chip designers have been formalizing their designs for a long time. But they don't need a formalization of Abel Ruffini to make the chips work. So mathlib allows us to ask meta questions - what are the various ways that expressiveness can help in transmitting information? What are useful metamathematic constructions/approaches? Can formalizing mathematics help us understand how to work with complicated structures in other fields?

Johan Commelin (Jul 28 2021 at 14:01):

I complete agree with Bryan! It's been an important factor in all the fun that I've had, that we have type theorists, topologists, meta hackers, analysts, programmers, number theorists, AI experts, UX experts, geometers, and many others interact here on zulip. It's awesome.
It also makes me sad that this probably will not scale. If we get 10x the user size, we'll see fragmentation, and people will start to interact only within their little cliques, I'm afraid. :sad:
But for now, I'm going to enjoy all the fun! :octopus: :cartwheel:

Matthew Ballard (Jul 28 2021 at 14:12):

I should say that for myself, a mathematician, experiencing and internalizing the perspectives @Johan Commelin mentions has been the most enjoyable aspect of my interaction with lean.

Filippo A. E. Nuccio (Jul 28 2021 at 16:07):

Johan Commelin said:

I complete agree with Bryan! It's been an important factor in all the fun that I've had, that we have type theorists, topologists, meta hackers, analysts, programmers, number theorists, AI experts, UX experts, geometers, and many others interact here on zulip. It's awesome.
It also makes me sad that this probably will not scale. If we get 10x the user size, we'll see fragmentation, and people will start to interact only within their little cliques, I'm afraid. :sad:
But for now, I'm going to enjoy all the fun! :octopus: :cartwheel:

I agree that the problem of scaling and fragmentation is crucial, and somewhat sad. A part from interacting with little cliques, this will also question the role of mathlib, because if we end up with dozens of libraries which don't interact with each other, basically none will be of great use.

Filippo A. E. Nuccio (Jul 28 2021 at 16:11):

That being said, and I might be too naïf, my answer is still the idea of a 21st century Bourbaki, including Serre's sentence "we never explain why we did things in such-and-such a way, we simply did them", which I read as a humble approach of letting each mathematician decide what to do of Bourbaki's books. When I speak to friends and colleagues, this is my main argument, together with the dream that in 10-15 years time to "submit a paper" will mean to submit some code which compiles in any of the several theorem-prover which will exist by then. I normally add that one big advantage will be that the editorial task done by colleagues working for journals will become somehow much more relevant, as the correctness of a result will be automatically checked and they will need to focus on its relevance.

Mario Carneiro (Jul 28 2021 at 16:23):

I agree that the problem of scaling and fragmentation is crucial, and somewhat sad. A part from interacting with little cliques, this will also question the role of mathlib, because if we end up with dozens of libraries which don't interact with each other, basically none will be of great use.

Specialization doesn't necessarily imply fragmentation. I would look to large software projects like the linux kernel for inspiration here: people work on their own domains but it all still fits into a (somewhat) unified whole

Mario Carneiro (Jul 28 2021 at 16:24):

I think there is already some considerable specialization going on in mathlib. I don't know if there is anyone who can actually follow every part of it

Kyle Miller (Jul 28 2021 at 17:23):

Jason Rute said:

often erroneously cite that it took 379 pages to prove 1 + 1 = 2

I'm curious about the "1 + 1 = 2 index" for mathlib, but I calculated the easier thing, which is how many pages mathlib would be when printed. Excluding deprecated and some supporting code like tactics, it would take about 7500 pages, 70 lines per page (methodology: wc -l of the lean files in the src directory, minus the result from a few subdirectories, and then I saw how many lines emacs fits in a page when printing in postscript format). Mathlib's tactics form another 500 page volume. For a rough comparison, Principia Mathematica is about 2000 pages with about half as many lines per page.

Johan Commelin (Jul 28 2021 at 17:32):

Did Princ.Math. have roughly the same amount of characters per line?

Mario Carneiro (Jul 28 2021 at 17:33):

they are mostly short lines, pure logical statements like A -> B -> A

Mario Carneiro (Jul 28 2021 at 17:34):

well, the syntax was weirder than that but you get the idea

Kyle Miller (Jul 28 2021 at 17:54):

Kyle Miller said:

I'm curious about the "1 + 1 = 2 index" for mathlib

Oh right, you don't need mathlib for that:

example : 1 + 1 = 2 := rfl

Mario Carneiro (Jul 28 2021 at 17:58):

If you want a closer equivalent to principia mathematica's 1+1 = 2, you should consider (1 : cardinal) + (1 : cardinal) = 2, but it still isn't a good example because the lhs is the definition of the rhs. This is why metamath uses 2 + 2 = 4 instead, but that also isn't a good example in lean for the same reason. So perhaps 3 + 1 = 4

Anne Baanen (Jul 28 2021 at 19:22):

Yakov Pechersky said:

There's also the aspect from the other side of computer science. Chip designers have been formalizing their designs for a long time. But they don't need a formalization of Abel Ruffini to make the chips work. So mathlib allows us to ask meta questions - what are the various ways that expressiveness can help in transmitting information? What are useful metamathematic constructions/approaches? Can formalizing mathematics help us understand how to work with complicated structures in other fields?

This is close to one of my main motivations: formalising maths is very much like taking it to bits and putting it back together again. Just like doing so to a physical machine, you learn a lot about how it is built and what each component does, much more than studying it from the outside.

Interestingly, for this purpose it's good that our provers are not too smart, so we can't paper over issues where the pieces are still too big to fit together. Here I'm thinking of things like docs#is_scalar_tower, which is quickly becoming a basic building block that I would never have invested so much time into, if we could transport everything in the Dedekind domains project for free to the algebraic closure.

Patrick Massot (Jul 28 2021 at 21:50):

I'm late to the party but I can still dump my standard list of reasons to push math formalization (I don't think there is anything specific to mathlib, although I very much enjoy our specific community and process), in no specific order:

  • check correctness, especially for proofs that are too big to fit in a human mind (here I'm thinking about proofs much larger than in LTE, at least 1000 pages without much intermediate reality checks, as in the really tricky bits of symplectic topology)
  • save time for referees (this is not quite the same as the first point because the first point is first and foremost from the author point of view)
  • force people to have clearer ideas, hopefully coming up with better abstractions. We still don't have a killer example here (for instance filters were unfortunately invented before proof assistants, but in the very similar context of Bourbaki). Although we didn't invent much in the LTE project apart from the Commelin complex, we can already say the project clarified the proof a bit, especially through questions we asked.
  • force consistency (especially in a monolithic mathlib, but we need consistent formalization whether the repository organization)
  • allow a much more universal access to the information. We all now subareas of math where papers are unreadable if you don't have physical access to a small group of authors and their folklore knowledge. Note this is isn't automatic with formal accounts, because we can write completely obfuscated formal math, so we also need the following items
  • allow people to write better informal accounts, using the possibility to lie as much as desired since the formal version is covering our back
  • even better, I still hope to reach a stage where we can get a single hybrid document allowing to start at a very informal level and almost continuously zoom in wherever we want until reaching the fully formal level (the Alectryon project is part of the current path towards that goal)
  • also on the science-fiction side, there is still the hope to have a really nice search tool, as was envisioned in the formal abstract project or even including the dream to ask the AI for "similar sounding" lemmas
  • teaching (see the tutorials project which is based on my undergrad course)
  • merge with computer algebra systems, so that everything we currently do with Sage or similar tools become proofs. This also includes all the big proofs that Jeremy loves so much (4 colors, Kepler's conjecture...)

And let me repeat once more that I don't mean all that to replace traditional math, I want everything, from very informal to formal.

Brandon Brown (Jul 28 2021 at 21:57):

Being able to do "data science" on mathlib has been interesting as well, e.g. visualizing the web of theorems seems like something only possible with a mathlib-like endeavor; I have no idea if that is useful beyond being moderately intriguing for its own sake. Also the educational value of mathlib has been really helpful for me as a programmer trying to self-learn more math, having all the math completely unambiguous and explicit (i.e. I can keep drilling down until I get to some inductive type I can understand or whatever) is great.

Kevin Buzzard (Jul 28 2021 at 22:10):

Thank you all very much for the wide range of views!

Johan Commelin (Jul 29 2021 at 04:42):

Patrick Massot said:

  • force people to have clearer ideas, hopefully coming up with better abstractions. We still don't have a killer example here (for instance filters were unfortunately invented before proof assistants, but in the very similar context of Bourbaki). Although we didn't invent much in the LTE project apart from the Commelin complex, we can already say the project clarified the proof a bit, especially through questions we asked.

This complex that we came up with during LTE turns out to be a reinvention as well. MacLane already defined it in 1958... but the connection with Breen-Deligne resolutions hadn't been made.

David Michael Roberts (Aug 12 2021 at 06:02):

Kyle Miller said:

Jason Rute said:

often erroneously cite that it took 379 pages to prove 1 + 1 = 2

I'm curious about the "1 + 1 = 2 index" for mathlib, but I calculated the easier thing, which is how many pages mathlib would be when printed. Excluding deprecated and some supporting code like tactics, it would take about 7500 pages, 70 lines per page (methodology: wc -l of the lean files in the src directory, minus the result from a few subdirectories, and then I saw how many lines emacs fits in a page when printing in postscript format). Mathlib's tactics form another 500 page volume. For a rough comparison, Principia Mathematica is about 2000 pages with about half as many lines per page.

so it's more on par with the Stacks Project, but that isn't formalised (though it is quite thorough..)


Last updated: Dec 20 2023 at 11:08 UTC