Zulip Chat Archive

Stream: Polynomial Freiman-Ruzsa conjecture

Topic: Blueprint is starting!


Terence Tao (Nov 15 2023 at 04:03):

Yael has set up a skeletal Blueprint for the project at https://teorth.github.io/pfr/blueprint/ as well as autodocumentation of the Lean methods in https://teorth.github.io/pfr/docs/ . I'll work on filling out the blueprint with lemmas and definitions that need to be filled in. It should be possible very soon to have doable tasks regarding Shannon entropy and jensen's inequality; I'm still figuring out how to implement conditioning a probability space to an event, but once that is done, most of the steps should be at least statable in Lean, if not provable. More soon.

Terence Tao (Nov 15 2023 at 20:47):

Thanks to Yael's tireless efforts, a partial blueprint is now available at https://teorth.github.io/pfr/blueprint/ , with an impressively complex looking dependency graph already at https://teorth.github.io/pfr/blueprint/dep_graph_document.html

Right now it mostly focuses on the Appendix A material (Shannon entropy and Ruzsa distance), but I plan to work today to fill in skeletal versions of the rest of the paper, and to create stub Lean files and stub theorems/definitions to connect more tightly with the blueprint.

There is at least one task that could already be done independently of the rest of the project: stating and proving the Ruzsa covering lemma (currently a stub at ruzsa_covering.lean). (This lemma is already on Mathlib)

Most of the rest of the project will require using the Ruzsa distance (currently defined as a stub at ruzsa_distance.lean), however it may be possible to start formalizing some parts of the argument without actually having a working definition of Ruzsa distance, by introducing a bunch of stub lemmas about the distance.

One portion of the paper that could be relatively easy to formalize is Appendix B. This is where the PFR conjecture is derived (in about a page of argument) from the entropic formulation of the PFR conjecture. The main tasks needed are

  1. To formalize the statement (if not the proof) of the entropic PFR conjecture (currently a stub in entropy_pfr.lean)
    2.Formalizing the statement of the aforementioned Ruzsa covering lemma (This is already on Mathlib)

  2. Formalizing a few simple lemmas about entropy.

Perhaps we could focus on this portion of the proof first in order to get some practice with workflow. If anyone here has prior experience with working with a collaborative formalization project, any advice here would be welcome.

Patrick Massot (Nov 15 2023 at 21:10):

In my experience, the best thing you can do to encourage collaboration is what you are already doing: writing the blueprint and stating lemmas. Then people can fill in proofs. Of course this is not a 100% reliable process. You are bound to do small mistakes in the statements (either stating slightly wrong lemmas or correct lemmas that are not exactly what you need), but this can be fixed later.

Terence Tao (Nov 15 2023 at 21:14):

Patrick Massot said:

In my experience, the best thing you can do to encourage collaboration is what you are already doing: writing the blueprint and stating lemmas. Then people can fill in proofs. Of course this is not a 100% reliable process. You are bound to do small mistakes in the statements (either stating slightly wrong lemmas or correct lemmas that are not exactly what you need), but this can be fixed later.

OK. I am seeing that I can add a lot of entropy lemmas as "sorries" to be filled in to entropy_basic.lean, which thanks to Remy is already in quite good shape. I will do that over the next few hours, already there are about four or five lemmas with sorries that people are welcome to take a crack at (I guess via pull requests would be the best option here).

Terence Tao (Nov 15 2023 at 21:15):

I suppose to avoid redundancy people should announce somewhere in this stream if they are working on anything in particular? Right now I am focusing on converting lemmas in the blueprint into stubs in the lean files (and by stubs, I really mean stubs: basically only the lemma name and description is present).

Yaël Dillies (Nov 15 2023 at 21:24):

I am going through the blueprint right now to reference lemmas that are already in mathlib.

Yaël Dillies (Nov 15 2023 at 21:25):

Also thanks to @Sky Wilshaw for squashing the bugs I couldn't squash myself.

Patrick Massot (Nov 15 2023 at 21:27):

Yes, it is important that people announce on Zulip that they want to work on something. They can create new topics in this stream.

Shreyas Srinivas (Nov 15 2023 at 21:33):

what is the preferred contribution mechanism? PRs from github forks of the repo or commits to non-master branches (since that was used in mathlib for some reason)?

Terence Tao (Nov 15 2023 at 21:40):

Shreyas Srinivas said:

what is the preferred contribution mechanism? PRs from github forks of the repo or commits to non-master branches (since that was used in mathlib for some reason)?

I don't know enough about github to express a preference, as long as it finds a way to the master repository eventually I'm happy. I imagine that it might depend on whether one is planning to do only a small localized contribution or to work on a large chunk of the files at a time?

By the way, I have finished adding stubs in entropy_basic.lean and tied them into blueprint (though it may take a while for this to propagate to the online version of the blueprint). So one set of tasks that are quite actionable right now would be to fill in the sorries (in the definitions, statements, and proofs) of the various stubs present. I'll take a break for now, and later work on adding new sections of the blueprint to cover the main proof (Sections 2-7, basically).

Shreyas Srinivas (Nov 15 2023 at 21:41):

I think it depends on the CI setup.

Shreyas Srinivas (Nov 15 2023 at 21:42):

Roughly, the stuff that automatically checks that all the i's are dotted and t's are crossed before merging commits.

Yaël Dillies (Nov 15 2023 at 21:43):

I am doing that just now.

Yaël Dillies (Nov 15 2023 at 21:44):

There's CI for both commits and PRs, so that won't be the discriminating factor.

Patrick Massot (Nov 15 2023 at 21:46):

The fact that Mathlib uses PR from branches of the main repo is a very unusual setup that comes only from our olean cache mechanism (the thing you use with lake exe cache get). There is no reason to do that unless the project becomes so huge that you need a way to distribute a precompiled version using the lake exe cache get mechanism.

Johan Commelin (Nov 16 2023 at 09:57):

It is really great to see how fast this project is moving. The dependency graph indeed looks impressive.
One silly suggestion by an enthusiastic bystander: There are many outgoing edges from entropy-def and ruz-dist-def. It might be helpful to apply a bit of transitive reduction to the graph, so that it will look less cluttered, and become easier to grok.

Yaël Dillies (Nov 16 2023 at 10:36):

I was thinking this as well. I was going to message Terence with two remarks but let me put them here:

  • A blueprint is useful when there aren't too many items. We may want to not make one blueprint item per theorem/lemma if the results are easy. Typically, I think we could state the mutual information inequalities as corollaries of the corresponding entropy ones.
  • The labels should be quite meaningful since it's the first way someone will interact with the blueprint.

Yaël Dillies (Nov 16 2023 at 10:37):

I have a few ideas how to improve the blueprint from the software point of view:

  1. We should automatically eliminate transitive edges (and maybe throw a warning). We need to be careful so as to what counts as transitive, since there are two types of edges.
  2. We should throw a warning when a theorem is marked as proved but not stated (this will disappear if we use better macros.
  3. We should record what part of the blueprint an item is coming from, either manually by user input in the macros or by looking at the section numper (could make that a default behavior). This will help scale the blueprint because we could then allow several filtered views of the online blueprint.

Yaël Dillies (Nov 16 2023 at 10:39):

@David Thrane Christiansen, how does that fit with your plans? (and how long will it take until I can play with your documentation things?)

Terence Tao (Nov 16 2023 at 15:54):

Yaël Dillies said:

I was thinking this as well. I was going to message Terence with two remarks but let me put them here:

  • A blueprint is useful when there aren't too many items. We may want to not make one blueprint item per theorem/lemma if the results are easy. Typically, I think we could state the mutual information inequalities as corollaries of the corresponding entropy ones.
  • The labels should be quite meaningful since it's the first way someone will interact with the blueprint.

I agree. To be honest I've been spending almost all my time dumping content into the blueprint, and have put almost zero effort into making it presentable or cleaning it up. In particular I have no particular attachment to the labels or the Lean method names, these were placeholders that I threw together quickly in order to proceed with the mathematical content.

I had assumed that transitive reduction was already taking place so I tried to be maximalist with what definitions were needed to formulate results. But I have a question: a number of results whose statement requires only definition X, but whose proof requires lemma Y that use definition X. In such cases should I still put \uses{X} in the statement, even if the proof will have uses{Y} and thus, by transitivity, uses{X}? I ask because in the intermediate stage in which the lemma is stated but not proved, it only needs X, but I am not sure whether the dependency graph relies on both the \uses statements in the statement, or in both the statement and the proof.

I had similarly taken a maximalist approach to stating all the lemmas used, even the trivial corollaries, in the blueprint, but again I am not attached to this practice. Is it possible to tie multiple lean lemmas to one blueprint lemma? In that case I could concatenate several closely related statements into one blueprint lemma while still keeping all the lean lemmas separate. For instance a blueprint lemma might say "If hypothesis H is true, the conclusions C1, C2, and C3 are true" which would be linked to three lean lemmas H_implies_C1, H_implies_C2, H_implies_C3.

I guess the broader philosophical question is where the blueprint stands on the spectrum between the human-readable proof and the Lean-compiling proof. I had assumed it was closer to the latter than the former so I tried to make it be roughly in one-to-one correspondence with the Lean code, but if something more intermediate is desirable, I can certainly work towards that instead.

Patrick Massot (Nov 16 2023 at 15:58):

I could work on the graph infrastructure to do transitive reduction if needed (I wrote the Lean blueprint tool). You can already attach multiple Lean declaration to a single LaTeX statement. See an example here.

Terence Tao (Nov 16 2023 at 16:00):

Patrick Massot said:

I could work on the graph infrastructure to do transitive reduction if needed (I wrote the Lean blueprint tool). You can already attach multiple Lean declaration to a single LaTeX statement. See an example here.

OK, good to know. Once I finish dumping in all the mathematical content into the blueprint (there's just one section of the paper to go, namely the "Endgame"), and also get the blueprint to compile (I've been having significant technical issues with that, but that's another story), I can try to streamline the blueprint by concatenating closely related results into single lemmas, and maybe demoting some trivial observations to remarks (and moving their links to lean methods to some other logical location).

Yaël Dillies (Nov 16 2023 at 16:03):

Terence Tao said:

To be honest I've been spending almost all my time dumping content into the blueprint, and have put almost zero effort into making it presentable or cleaning it up.

I think you should continue like that. The software has to get smarter, rather than the human working harder.

David Thrane Christiansen (Nov 17 2023 at 08:49):

Yaël Dillies said:

David Thrane Christiansen, how does that fit with your plans? (and how long will it take until I can play with your documentation things?)

As far as concrete features in the blueprint system go, my goal is to make it easy for you to build them, rather than to build them myself - that way, different users with different needs can all be well-served. If you'd like to set up a call to go over details and make sure everything important is on my radar, then let's do so!

I hope to have a release ready for people to play with and test by the end of the year, but I expect large improvements early next year too.

Kevin Buzzard (Nov 18 2023 at 11:24):

Has the graph at https://teorth.github.io/pfr/blueprint/dep_graph_document.html gone missing for everyone or just me?

Rémy Degenne (Nov 18 2023 at 11:26):

Missing for me.

Yaël Dillies (Nov 18 2023 at 11:46):

Uh oh. Let me see.

Yaël Dillies (Nov 18 2023 at 11:55):

There's a bad interaction between the HTML and one of the item names in the blueprint. A missing ) bracket, but I can't see it yet.

Mauricio Collares (Nov 18 2023 at 12:16):

The troublesome item name is uniform-entropy' (the single quote in particular).

Terence Tao (Nov 18 2023 at 17:40):

I don't have a local copy of blueprint and so can't diagnose the problem, but for now I just replaced uniform-entropy' with uniform-entropy-II throughout and hopefully we at least get the blueprint built again while we wait for a proper explanation of what is going on. [EDIT: I see the graph again!]

Utensil Song (Nov 20 2023 at 05:07):

The issue is with the dot graph source string. In dot, double quotes " are used, but when it's emitted to JavaScript by Python, it's a single quote ' string in JS, unescaped, causing the parse error in JS. I tried a naive fix to replace all ' to \' but it didn't work, so I pushed a fix here using backtick quote string which doesn't interfere with both " in dot and ' in theorem name.

Yaël Dillies (Nov 20 2023 at 08:25):

I just added instructions to the README to build the blueprint locally. Please tell me if they don't work for you!

Johan Commelin (Nov 20 2023 at 17:35):

https://teorth.github.io/pfr/blueprint/dep_graph_document.html suggests that the project is ~50% done. Is that a reasonable guesstimate?

Yaël Dillies (Nov 20 2023 at 17:40):

The blueprint is a bit inflated towards the start by the fact that every entropy lemma is its own item.

Terence Tao (Nov 20 2023 at 17:43):

On the other hand, many of the later theorems are basically proven by applying a sequence of a half-dozen entropy lemmas and doing some basic algebra. Modulo technical issues (e.g. right now I am struggling slightly to get from joint independence of three variables to pairwise independence of any two of them, the issue being techical due to manipulation of a vector type) I could imagine the latter part being rather straightforward and also quite crowdsourceable, though the low-level implementation details may well be a stumbling block.


Last updated: Dec 20 2023 at 11:08 UTC