Zulip Chat Archive

Stream: general

Topic: LeanAPAP


Patrick Massot (Jul 04 2023 at 10:06):

Are you starting a new project in Lean 3??

Yaël Dillies (Jul 04 2023 at 10:36):

Yes, there are many QoL improvements that still haven't made it to Lean 4 (because the core developers are busy solving other issues). This includes:

  • buggy autocompletion
  • docs searches polluted by the several thousands of characters long autogenerated instances names
  • parallel compilation of declarations within a file (this is particularly important to us since we have files with very long proofs)
  • I still can't make lake exe cache get work
  • a sensible way of determining which variables are used in a declaration

Further, we have no use of the new Lean 4 features:

  • We have no looping instances
  • The project isn't that big and linear that we will feel the speed difference
  • The instant library_search would at most be a marginal improvement to our coding experience given that Bhavik and I both know basically all the lemmas we ever need

Yaël Dillies (Jul 04 2023 at 10:37):

This is without saying that we want to actually formalise this result, rather than spending the summer learning/fixing a language that still has to see its first stable release.

Yaël Dillies (Jul 04 2023 at 10:38):

Thomas Bloom, Bhavik and I sat down (around a nice meal!) and had this conversation. I presented them with all the pros and cons of using Lean 3 vs Lean 4, and we decided to go with Lean 3.

Johan Commelin (Jul 04 2023 at 11:38):

One of the cons is that you are taking a roundabout way in materialising this belief:
Yaël Dillies said:

discrete Lp norms, discrete convolution, the Marcinkiewicz-Zygmund inequality (quantitative, for deterministic functions), and I believe all those belong in mathlib.

Yaël Dillies (Jul 04 2023 at 11:43):

We know :sad:

Scott Morrison (Jul 04 2023 at 23:04):

But @Yaël Dillies, @Bhavik Mehta, @Thomas Bloom, the analysis above of Lean 3 vs Lean 4 seems to leave out by far the biggest con of using Lean 3:

  • Your project is unusable by other mathematicians, and no longer contributing to the collective project of formalisation in an integrated library.

Obviously we would like to fix the five points Yael makes above, but I can't see them as anywhere close to balancing this one.

Scott Morrison (Jul 04 2023 at 23:08):

@Yaël Dillies, could you tell us what problem you are having with lake exe cache get? I tried looking through your old posts about it, but they seem to date back to when cache was being developed. It would be good to have an up-to-date report.

Yaël Dillies (Jul 04 2023 at 23:09):

Point is: the Lean 4 cons are impediments to writing the project, while the Lean 3 con is an impediment to the integration to mathlib. If we port our project once it's over (give it 3-4 weeks), then we get the best of both worlds (kinda?)

Scott Morrison (Jul 04 2023 at 23:24):

I will admit I am perpetually baffled by people's enthusiasm for writing code that later gets abandoned. (Where I'm using "abandoned" as "neither in mathlib, nor with daily CI that bumps mathlib and a long term plan for maintenance".) Mostly I'm looking at LTE and perfectoid here. :-) Any time I see people going about projects in ways that unnecessarily increase the chances of that happening, I am simply confused.

Bhavik Mehta (Jul 04 2023 at 23:28):

Scott Morrison said:

  • Your project is unusable by other mathematicians, and no longer contributing to the collective project of formalisation in an integrated library.

I think this is a very uncharitable and unfair interpretation - there are a good number of existing Lean 3 projects which are not in mathlib, and there is no reason we couldn't port our work to Lean 4 in the very near future.

you are taking a roundabout way in materialising this belief

I'm not so sure it is roundabout - while there are speed improvements in Lean 4, compilation speed is not the limiting factor for us; instead digitising the proof in the first place is. And with the factors Yaël mentions above, this seems to be more efficient to do in Lean 3 with porting later.

To be clear, I think the situation would be very different if Lean 4 had a stable release, or if the scope of this project wasn't a groundbreaking result that we aim to formalise over a summer.

Yaël Dillies (Jul 04 2023 at 23:31):

Scott Morrison said:

I will admit I am perpetually baffled by people's enthusiasm for writing code that later gets abandoned.

Scott, if there's one thing you all can't accuse me of, it's abandoning anyone's code :laughing:

Mario Carneiro (Jul 04 2023 at 23:37):

Bhavik Mehta said:

I'm not so sure it is roundabout - while there are speed improvements in Lean 4, compilation speed is not the limiting factor for us; instead digitising the proof in the first place is.

Honestly I wouldn't even say that compilation speed improvements are the main draw of lean 4. It may have happened by the by but it wasn't really a goal. It's just a lot more expressive and scriptable than it used to be, and a lot of the IDE integration things got significantly better.

Scott Morrison (Jul 04 2023 at 23:38):

Bhavik Mehta said:

Scott Morrison said:

  • Your project is unusable by other mathematicians, and no longer contributing to the collective project of formalisation in an integrated library.

I think this is a very uncharitable and unfair interpretation

Certainly if this project happens in Lean 3, and then is promptly ported to Lean 4, and this extra step doesn't delay incorporation into mathlib, then absolutely I agree this is an unfair interpretation.

If on the other hand the project ended up not being ported to Lean 4, the interpretation would be correct, wouldn't it?

mathport is (however wonderful it has been) unpleasant to use, and the manually porting required after running it is still tedious. Why burden yourself with that extra work? I just can't imagine that the total time of "write in Lean 3, migrate to Lean 4" is less than "write in Lean 4", valid complaints about autocomplete and documentation notwithstanding.

I guess my point relies on the premise that "the finish line" is a project running in Lean 4, and an intermediate step of having it running in Lean 3 is not particularly significant anymore.

Bhavik Mehta (Jul 04 2023 at 23:46):

Scott Morrison said:

If on the other hand the project ended up not being ported to Lean 4, the interpretation would be correct, wouldn't it?

Agreed, but I think all of us don't want this to be the outcome.

I just can't imagine that the total time of "write in Lean 3, migrate to Lean 4" is less than "write in Lean 4", valid complaints about autocomplete and documentation notwithstanding.

To be frank, I can't imagine the opposite, given our familiarity with Lean 3 and formalising combinatorics therein. In addition, our dependencies are relatively small and stable, so in theory the switch to mathlib4 shouldn't be too problematic.

I guess my point relies on the premise that "the finish line" is a project running in Lean 4, and an intermediate step of having it running in Lean 3 is not particularly significant anymore.

This is understandable, although I would argue there's some significance in that it shows the proof is correct, the proof structure is sensible, our definitions (many of which haven't been formalised) are well-implemented, and gives a "blueprint" (so to speak) of how to write it in Lean 4 (under the hyper-pessimistic assumption that we have no way of auto-porting)

Scott Morrison (Jul 04 2023 at 23:48):

I'll just note here that Bhavik pointed out in a parallel private thread that this project is about a proof whose validity is still perhaps slightly uncertain -- or at least it is very new and not many people have thought about it yet --- and that part of the value of the project is just checking that it is really correct. I concede that with this motivation, my arguments about Lean 3 vs Lean are mostly moot.

Scott Morrison (Jul 04 2023 at 23:49):

(Because there would still be value even if no one else could ever use the code.)

Bhavik Mehta (Jul 04 2023 at 23:49):

Mario Carneiro said:

Bhavik Mehta said:

I'm not so sure it is roundabout - while there are speed improvements in Lean 4, compilation speed is not the limiting factor for us; instead digitising the proof in the first place is.

Honestly I wouldn't even say that compilation speed improvements are the main draw of lean 4. It may have happened by the by but it wasn't really a goal. It's just a lot more expressive and scriptable than it used to be, and a lot of the IDE integration things got significantly better.

That makes sense, but I don't believe these features would be useful for this particular project. That's not to say they're not valuable, of course, but for the context of deciding a language in which to do Yaël's summer internship, I don't think they are strong factors in either direction

Mario Carneiro (Jul 04 2023 at 23:50):

I should also mention that a fair number of academic projects using lean fall in this general camp: the goal is to show a theorem is true, not so much to integrate into a library

which you could say is a bit self-centered, but the nature of academic publishing pushes things in that direction

Bhavik Mehta (Jul 04 2023 at 23:50):

Scott Morrison said:

I'll just note here that Bhavik pointed out in a parallel private thread that this project is about a proof whose validity is still perhaps slightly uncertain -- or at least it is very new and not many people have thought about it yet

Yes, I want to say as much as possible in this thread to centralise discussion, hence the last paragraph of my last-but-one message, but I should've mentioned that sooner, thanks!

Notification Bot (Jul 05 2023 at 06:31):

20 messages were moved here from #general > Should scilib and mathlib be separate libraries? by Johan Commelin.

Kevin Buzzard (Jul 05 2023 at 06:34):

Scott Morrison said:

Mostly I'm looking at LTE and perfectoid here. :-)

Here the point was publicity and recruitment.

Johan Commelin (Jul 05 2023 at 06:37):

I think LTE helped us explore the design space for homological algebra. It taught us a lot in that regard. And the tangible result is that @Joël Riou is now PRing a well-designed library to mathlib 4. So even if LTE/for_mathlib doesn't get emptied literally, it certainly does in a figurative sense.

Kevin Buzzard (Jul 05 2023 at 06:38):

And the analogous payoff won't exist for leanAPAP because anything you learn about how best to do something in lean 3 is irrelevant in the long term.

Reid Barton (Jul 05 2023 at 06:45):

In the sense that everything is irrelevant in the long term?

Mauricio Collares (Jul 05 2023 at 06:45):

Yaël Dillies said:

Point is: the Lean 4 cons are impediments to writing the project, while the Lean 3 con is an impediment to the integration to mathlib. If we port our project once it's over (give it 3-4 weeks), then we get the best of both worlds (kinda?)

3-4 weeks?! It's amazing that a breakthrough this big and recent can be formalized in a month or two (and the Lean 3 choice makes more sense now, provided it's ported later).

Mauricio Collares (Jul 05 2023 at 07:06):

One other thing not mentioned yet is that (unless I am misunderstanding the situation, which is likely) there is no template for Lean 4 formalization projects yet, in the sense that you can't just copy a pre-existing project, fill in stuff and get a nice dependency graph

Sebastian Ullrich (Jul 05 2023 at 07:24):

Bhavik Mehta said:

To be clear, I think the situation would be very different if Lean 4 had a stable release [...]

I would like to understand what people mean by this: is it "We put a version number instead of a date on this commit" or "We put extra effort in testing this commit compared to day-to-day development in order to prevent unintended regressions since the previous release"? Because at least the official development phase of Lean 3 certainly was closer to the former. Lean 3 only became de facto stable because development slowed down.
For the current state of Lean 4 releases, please see https://github.com/leanprover/lean4/pull/2304/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R1-R2

Mac Malone (Jul 05 2023 at 09:03):

@Sebastian Ullrich I think people are psychologically predisposed towards something called a "stable release" as opposed to a "nightly release" and version numbers help provide some sense of feature-completeness. I know I feel some of this. Beyond that, another concern is nightly builds can sometimes just break in unexpected ways (e.g. unforeseen regressions like #2282, cases where release files are missing, or incomplete features because the nightly was unfortunately built between related sets of changes).

A "stable" release schedule where a commit the dev team is satisfied with is tagged as a release candidate (with version number), important ecosystem libraries (e.g., mathlib, std4) are updated to it to ensure no regressions / breakage, and then the release candidate is retagged as stable may be a good idea.

Mac Malone (Jul 05 2023 at 09:10):

For version numbers, I would suggest a naming scheme of 4.major.minor where minor is essentially just the build number from the last major and major is only bumped when there has been a large number of accumulated changes or some new feature is added the dev team particularly wants to highlight (or is likely to cause a lot of breakage).

Yaël Dillies (Jul 05 2023 at 09:10):

Mauricio Collares said:

3-4 weeks?! It's amazing that a breakthrough this big and recent can be formalized in a month or two (and the Lean 3 choice makes more sense now, provided it's ported later).

We're picking our fights carefully! Bhavik and I discussed the previous breakthrough when it came out and it was clear that it would be a big undertaking. The Kelley-Meka proof is much more accessible. Add to that that Bhavik and I are both experts of formalising analytic combinatorics and that Thomas is preparing us detailed write-ups of the proofs and you get such a short time.

Yaël Dillies (Jul 05 2023 at 09:13):

If you want to follow along, here is our blueprint. It currently only has the finite field case, so don't get too misled by the completion rate. What's missing is the integer case. It's basically the same as the finite field case except that the start and end steps now require Bohr sets to be used in place of subspaces, hence are harder (subspaces are algebraic objects, Bohr sets are analytic ones).

Bhavik Mehta (Jul 05 2023 at 10:22):

Mac said:

Sebastian Ullrich I think people are psychologically predisposed towards something called a "stable release" as opposed to a "nightly release" and version numbers help provide some sense of feature-completeness. I know I feel some of this. Beyond that, another concern is nightly builds can sometimes just break in unexpected ways (e.g. unforeseen regressions like #2282, cases where release files are missing, or incomplete features because the nightly was unfortunately built between related sets of changes).

A "stable" release schedule were a commit the dev team is satisfied with is tagged as a release candidate (with version number), important ecosystem libraries (e.g., mathlib, std4) are updated to it to ensure no regressions / breakage, and then the release candidate is retagged as stable may be a good idea.

Yes this is a pretty good summary of what I meant. Phrases like "currently being released as milestone releases towards a first stable release. Lean 3 is still the latest stable release" which were on the Lean 4 readme (I know it's now been changed, but this is what it was when we made this decision) felt to me like the developers weren't confident there would not be breaking changes in further nightly releases. That's not to say, by any means, that the wording was misleading or that the developers should have more confidence or anything like that (it's far more important to me that Lean 4 is good than that it's now!), but just that if something goes "weird" in Lean 3, there's a pretty good chance I know how to fix it, whereas in Lean 4 it might be an unforeseen regression or a deliberate breaking change or just a cryptic error message, and it's this unknown that I'd prefer to avoid.

Bhavik Mehta (Jul 05 2023 at 10:28):

Mauricio Collares said:

3-4 weeks?! It's amazing that a breakthrough this big and recent can be formalized in a month or two (and the Lean 3 choice makes more sense now, provided it's ported later).

Yaël has given a pretty good idea of why we think it can be done in a month or two, but on the other side I'd like to give some broader context and links on why the breakthrough is so big: Gil Kalai's blog post on the result, Tim Gowers' thread on the result and Quanta article on the result. We're also formalising an improved version of the result with (slightly) stronger bounds and a (slightly) easier proof than is talked about there.

Bhavik Mehta (Jul 05 2023 at 10:34):

Kevin Buzzard said:

And the analogous payoff won't exist for leanAPAP because anything you learn about how best to do something in lean 3 is irrelevant in the long term.

I don't agree here - in the same way that LTE helped guide homological algebra in Lean 3 and hence in Lean 4, I think LeanAPAP can guide the design of Bohr sets, iterative arguments in Lean 4, and all the other new tools that are needed for this proof. If nothing else, we gain an understanding of how to implement and nicely work with these definitions and techniques in dependent type theory

Bhavik Mehta (Jul 07 2023 at 16:58):

Scott Morrison said:

I'll just note here that Bhavik pointed out in a parallel private thread that this project is about a proof whose validity is still perhaps slightly uncertain -- or at least it is very new and not many people have thought about it yet --- and that part of the value of the project is just checking that it is really correct.

I just want to clarify that what I meant here wasn't to cast any doubts on the Kelley-Meka paper or the Bloom-Sisask exposition of it. Instead the version of the proof we're hoping to formalise isn't public and just hasn't been read by many people.


Last updated: Dec 20 2023 at 11:08 UTC