Zulip Chat Archive

Stream: Polynomial Freiman-Ruzsa conjecture

Topic: Welcome!


Terence Tao (Nov 13 2023 at 20:31):

Hi everyone. I am thinking of starting a project to formalize in Lean4 the recent proof of Timothy Gowers, Ben Green, Freddie Manners, and myself of the Polynomial Freiman-Ruzsa (PFR) conjecture (though, despite the name, it was actually first proposed by Katalin Marton). The paper is at https://arxiv.org/abs/2311.05762 . I have done one formalization project already (an elementary twelve-page paper); this paper is longer (28 pages) but actually I think a large part of it will be quite routine to formalize; the most challenging part in my opinion is actually Appendix A, in which the Shannon entropy inequalities and some of their consequences in additive combinatorics are established. In particular some thought may need to be given on how to formalize probability spaces and random variables for the purposes of establishing entropy inequalities; see the thread https://leanprover.zulipchat.com/#narrow/stream/217875-Is-there-code-for-X.3F/topic/Should.20there.20be.20a.20ProbabilitySpace.20class.3F for more discussion.

I've started a rudimentary GitHub repository for this project at https://github.com/teorth/pfr , though there is currently not much content there.

I would be interested in opening up this project to be more collaborative than my previous project, which would require a bit more formal organization. In particular I am interested in trying to set up a blueprint for this proof as an early step, which from what I gather from https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Lean.204.20project.20blueprint/near/398052467 is now available (at least in some beta form) for Lean4. Will be happy to accept volunteers to contribute to this project in whatever capacity they feel able to. In particular I would appreciate any advice on how to make a Lean collaboration work as smoothly and efficiently as possible.

Anyway, I guess this thread in particular can be used for general questions and discussions about the project. I don't have a particular timeframe in mind; the important thing for me is not so much to get the formalization done (we went over the paper pretty carefully, I'm quite confident it is correct already :wink:) but to understand the formalization process better, particularly with regards to the collaborative aspect. There is also the possibility that some of the lemmas and theorems formalized in this project could somehow get converted into something suitable for mathlib, but I will defer to the mathlib experts as to whether this is realistic (or even desirable).

Yaël Dillies (Nov 13 2023 at 22:48):

Very happy to participate! I was actually going to ask Tim whether he was interested in a formalisation, because this is exactly the kind of thing I do.

Yaël Dillies (Nov 13 2023 at 22:50):

The worst thing about you taking up Lean is that I don't get to formalise your breakthroughs anymore. :grinning:

Yaël Dillies (Nov 13 2023 at 22:51):

I can set the project up for you (mostly thinking about the blueprint), given that I very recently did it for LeanAPAP.

Bhavik Mehta (Nov 13 2023 at 23:07):

A couple of comments about the Shannon entropy stuff. I formalised some of the cases of the standard conditional entropy lemmas over finite probability spaces, but I held off on PRing it because mathlib probably ought to have entropy defined on a general probability space instead. (This was to formalise https://arxiv.org/abs/2211.09055 and the exponential ramsey project). I'm entirely ambivalent on which should be in this project. Doing it in a special case would be easier, but doing it in the general case would make moving to mathlib smoother.

Terence Tao (Nov 13 2023 at 23:12):

Bhavik Mehta said:

A couple of comments about the Shannon entropy stuff. I formalised some of the cases of the standard conditional entropy lemmas over finite probability spaces, but I held off on PRing it because mathlib probably ought to have entropy defined on a general probability space instead. (This was to formalise https://arxiv.org/abs/2211.09055 and the exponential ramsey project). I'm entirely ambivalent on which should be in this project. Doing it in a special case would be easier, but doing it in the general case would make moving to mathlib smoother.

The way I see it there is a hierarchy of generality for Shannon entropy stuff:

  1. Random variables on finite probability spaces, taking values in a finite set.
  2. Random variables on arbitrary probability spaces, taking values in a finite set.
  3. Random variables on arbitrary probability spaces, taking values in a finite subset of an infinite range.
  4. Random variables on arbitrary probability spaces, taking values in an at most countable subset of an infinite range.

(There is also the theory of continuous (or "differential") entropy, intended for random variables with an absolutely continuous probability distribution measure, but that's somewhat distinct from this more discrete theory.)

For this particular project we could in fact get away with just 1., but it seems straightforward to formalize 2., and not much harder to do 3. Level 4 means a bit more fiddling around with infinite series and having to add assumptions all over the place that entropies are finite, but it is certainly doable, if totally unnecessary for the current project. Perhaps if we write all the methods in a sufficiently "object oriented" fashion it might be possible to port over from, say, 2. to 4. in a manner that won't seriously break any code that relies on 2?
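
For illustration, here is a minimal Lean 4 sketch of what level 2 could look like: a random variable on an arbitrary probability space, taking values in a finite type. The name `entropy` and the exact shape of the definition are placeholders rather than a proposed design.

```lean
import Mathlib

open MeasureTheory BigOperators

/-- Sketch of "level 2" Shannon entropy: `X` is a random variable on an arbitrary
probability space `(Ω, μ)`, taking values in a finite type `S`.  Since
`Real.log 0 = 0` in Mathlib, outcomes of probability zero contribute nothing. -/
noncomputable def entropy {Ω S : Type*} [MeasurableSpace Ω] [MeasurableSpace S]
    [Fintype S] (μ : Measure Ω) [IsProbabilityMeasure μ] (X : Ω → S) : ℝ :=
  ∑ s : S, -(((μ.map X) {s}).toReal * Real.log ((μ.map X) {s}).toReal)
```

Level 4 would replace the finite sum by a `tsum` and force finiteness hypotheses into many of the lemmas, which is one reason to stop at 2 or 3 here.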

Bhavik Mehta (Nov 13 2023 at 23:15):

That sounds reasonable to me (and by your hierarchy the stuff I have is all 1). But I'd like to hear from the probability experts of mathlib for the differences expected between 2, 3, and 4. @Rémy Degenne, perhaps?

Terence Tao (Nov 13 2023 at 23:22):

Bhavik Mehta said:

That sounds reasonable to me (and by your hierarchy the stuff I have is all 1). But I'd like to hear from the probability experts of mathlib for the differences expected between 2, 3, and 4. Rémy Degenne, perhaps?

My main concerns are how to smoothly implement the following operations:

  1. Deriving conditional entropy inequalities (or entropic sumset inequalities) from their unconditional counterparts, e.g., deriving the conditional Shannon inequality ${\bf H}(X,Y|Z) \leq {\bf H}(X|Z) + {\bf H}(Y|Z)$ from the unconditional version ${\bf H}(X,Y) \leq {\bf H}(X) + {\bf H}(Y)$. I had in mind working with some sort of ProbabilitySpace class that supports a conditioning operation (and I think I now know how to handle conditioning to empty events, by allowing the probability measure to in fact be identically zero). But the sentiment I got from others is that this may be unnecessarily complicated, and there may be other approaches.

  2. We will often need to take independent trials of one or two random variables, and in one place (Lemma A.1) we need to take conditionally independent trials. Because every random variable will have a finitely supported probability distribution, one could build these trials by hand (with a custom sample space designed just to support those trials and no other variables), but in the interests of generality (particularly if working at hierarchy level 4) one may want to proceed using some of the measure theory machinery already in Mathlib. (But the full Kolmogorov extension theorem will not be required in this project.)

  3. We will be dealing with moderately complicated entropic quantities such as ${\bf I}(Y_1 + Y_2 : Y_1 + Y_3 \,|\, Y_1 + Y_2 + Y_3 + Y_4)$, and I would like the notation to not get out of control (in particular I would like to hide the ambient probability measure as much as possible). This will be a particular concern if we ever want to generalize to higher $p$, where for instance we will often need to consider entropic quantities of tuples of random variables (not just individual random variables), e.g., ${\bf H}(X_1,\dots,X_p)$.
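
To make point 3 concrete, here is one hypothetical Lean 4 sketch of how the ambient measure could be hidden behind notation by fixing a `MeasureSpace` instance once and for all; the names `entropy`, `condEntropy`, `condMutualInfo` and the bracket notations are placeholders, and the definitions are left as `sorry`.

```lean
import Mathlib

open MeasureTheory

-- Fix the sample space and its measure (`volume`) once, so that neither appears
-- in the notation.  All names and notations below are illustrative placeholders.
variable {Ω G : Type*} [MeasureSpace Ω] [Fintype G] [MeasurableSpace G]

noncomputable def entropy (X : Ω → G) : ℝ := sorry
noncomputable def condEntropy (X Y : Ω → G) : ℝ := sorry
noncomputable def condMutualInfo (X Y Z : Ω → G) : ℝ := sorry

notation "H[" X "]" => entropy X
notation "H[" X " | " Y "]" => condEntropy X Y
notation "I[" X " : " Y " | " Z "]" => condMutualInfo X Y Z

-- The displayed quantity from point 3 then reads almost exactly as in the paper:
example [AddCommGroup G] (Y₁ Y₂ Y₃ Y₄ : Ω → G) : ℝ :=
  I[Y₁ + Y₂ : Y₁ + Y₃ | Y₁ + Y₂ + Y₃ + Y₄]
```

This only illustrates the notational side of point 3; point 1 (a ProbabilitySpace class supporting conditioning) and point 2 (independent trials) would need separate design decisions.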

Shreyas Srinivas (Nov 13 2023 at 23:33):

I would love to participate if I can be useful. Most of the paper itself (by which I mean I am taking the external citations on faith) doesn't seem to use math that I haven't learnt or encountered in theoretical CS. I cannot be 100% certain about it until I read enough to grasp the key arguments. I'd be grateful for any guidance in this matter.

Terence Tao (Nov 13 2023 at 23:38):

Shreyas Srinivas said:

I would love to participate if I can be useful. Most of the paper itself (by which I mean I am taking the external citations on faith) doesn't seem to use math that I haven't learnt or encountered in theoretical CS. I cannot be 100% certain about it until I read enough to grasp the key arguments. I'd be grateful for any guidance in this matter.

It's almost entirely self-contained. The Shannon inequalities make one or two appeals to Jensen's inequality, but this can be taken as a black box and is almost formalized already, both in the repo and in a couple of other places. There is one small lemma from a previous paper of mine that is needed (Lemma 2.2, characterizing random variables of zero doubling), but this is a relatively easy lemma (the main tool needed will be a converse to Jensen's inequality that indicates when equality holds in the Shannon inequalities). The derivations of consequences of PFR, such as Theorem 1.3, are contained in other papers; one could imagine formalizing these applications in addition to the core Theorem 1.1 if we have enough capacity.

Shreyas Srinivas (Nov 13 2023 at 23:42):

I am familiar with at least some forms of Jensen's inequality, Shannon entropy, etc. It came up in last semester's ML theory reading group that I organized, among other places.

Bhavik Mehta (Nov 13 2023 at 23:42):

Shreyas Srinivas said:

I would love to participate if I can be useful. Most of the paper itself (by which I mean I am taking the external citations on faith) doesn't seem to use math that I haven't learnt or encountered in theoretical CS. I cannot be 100% certain about it until I read enough to grasp the key arguments. I'd be grateful for any guidance in this matter.

In principle, a blueprint should help a huge amount, especially for people in your position. With a well-written blueprint, someone can contribute without needing to understand the key arguments (and this happened a fairly large amount in LTE and sphere eversion, in my understanding)

Kevin Buzzard (Nov 13 2023 at 23:45):

With LTE the blueprint kind of evolved at the same time as the formalisation. Really the key thing which made LTE parallelisable was the nice collection of sorrys in the repo at any given stage, which people could claim as their own and then they'd have a well-defined task to work on and wouldn't need to understand the entire argument.

Shreyas Srinivas (Nov 13 2023 at 23:45):

Bhavik Mehta said:

Shreyas Srinivas said:

I would love to participate if I can be useful. Most of the paper itself (by which I mean I am taking the external citations on faith) doesn't seem to use math that I haven't learnt or encountered in theoretical CS. I cannot be 100% certain about it until I read enough to grasp the key arguments. I'd be grateful for any guidance in this matter.

In principle, a blueprint should help a huge amount, especially for people in your position. With a well-written blueprint, someone can contribute without needing to understand the key arguments (and this happened a fairly large amount in LTE and sphere eversion, in my understanding)

This paper feels close enough to the kind of stuff we do in TCS.

Shreyas Srinivas (Nov 13 2023 at 23:45):

I would even say it's much more elegant. It's something I can understand, at least at a high level with some black boxes, with reasonable effort.

Terence Tao (Nov 13 2023 at 23:46):

Kevin Buzzard said:

With LTE the blueprint kind of evolved at the same time as the formalisation. Really the key thing which made LTE parallelisable was the nice collection of sorrys in the repo at any given stage, which people could claim as their own and then they'd have a well-defined task to work on and wouldn't need to understand the entire argument.

How early in the LTE process were the key definitions formalized, so that one could even have sorries in the first place?

Adam Topaz (Nov 13 2023 at 23:49):

Many definitions were either completely or partially made with sorry initially in LTE. The definitions that went into the first target were done very early on, but things like Ext and even the definition of condensed sets themselves didn't exist for a long time. In some sense you can write definitions using sorry, and state some lemmas that explain how the definitions should behave (also initially proved with sorry, for obvious reasons). But then you can use those lemmas without ever relying on any implementations.
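
Concretely, this pattern might look something like the following Lean 4 sketch, where the names `gadget` and `gadget_nonneg` are invented purely for illustration:

```lean
import Mathlib

/-- A definition whose implementation is deferred. -/
noncomputable def gadget (n : ℕ) : ℝ := sorry

/-- An interface lemma specifying how `gadget` should behave, itself proved with
`sorry` for now. -/
lemma gadget_nonneg (n : ℕ) : 0 ≤ gadget n := sorry

-- Downstream results can already be proved against the interface, without ever
-- looking at the eventual implementation.
example (n : ℕ) : 0 ≤ 2 * gadget n := by
  have h := gadget_nonneg n
  linarith
```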

Terence Tao (Nov 13 2023 at 23:56):

Adam Topaz said:

Many definitions were either completely or partially made with sorry initially in LTE. The definitions that went into the first target were done very early on, but things like Ext and even the definition of condensed sets themselves didn't exist for a long time. In some sense you can write definitions using sorry, and state some lemmas that explain how the definitions should behave (also initially proved with sorry, for obvious reasons). But then you can use those lemmas without ever relying on any implementations.

That's good to know. A large portion of the proof (pretty much everything outside the appendices, actually) could proceed in such a sorry fashion, once one fixes at least the syntax for the key concepts such as conditional Shannon entropy ${\bf H}[X|Y]$, conditional mutual information, and Ruzsa entropic distance (and its conditional versions), which are defined in terms of Shannon entropy and also independent trials. There will be a dozen or more small lemmas about entropy (including "obvious" things like ${\bf H}[X,Y] = {\bf H}[Y,X]$ that we often don't bother to state explicitly), but I could imagine an early stage of the formalization process identifying all these lemmas and adding them to a large pile of sorries for the portion of the project roughly corresponding to Appendix A of the paper, which I anticipate to be the hardest part (though the whole project doesn't look too scary to me, as formalization projects go).
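
For instance, such "obvious" facts could be collected early as sorried lemmas along the following lines (again, `entropy` and the lemma name are hypothetical placeholders, not project code):

```lean
import Mathlib

open MeasureTheory

-- Placeholder entropy of a finitely-valued random variable on a fixed ambient
-- probability space; the real definition would come later.
variable {Ω G : Type*} [MeasureSpace Ω] [Fintype G] [MeasurableSpace G]

noncomputable def entropy {S : Type*} [Fintype S] [MeasurableSpace S]
    (X : Ω → S) : ℝ := sorry

/-- One of the "obvious" lemmas: entropy of a pair is symmetric, i.e.
`H[X, Y] = H[Y, X]` in the notation of the paper. -/
lemma entropy_pair_comm (X Y : Ω → G) :
    entropy (fun ω => (X ω, Y ω)) = entropy (fun ω => (Y ω, X ω)) := sorry
```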

Terence Tao (Nov 13 2023 at 23:58):

One thing to keep in mind is that different random variables may be defined on different sample spaces (in which case the concept of independence makes no sense, but some other statistics, such as Ruzsa distance, are still well defined), and one will have to pay a bit of attention to this when formalizing even the statements of most of the steps in the paper.
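
One way to see that Ruzsa distance is unaffected: it depends only on the distributions, via independent copies that can always be realised on a product space. A rough Lean 4 sketch, with `entropy` and `rdist` as hypothetical placeholder names and the product measure standing in for the independent copies:

```lean
import Mathlib

open MeasureTheory

variable {Ω Ω' G : Type*} [MeasureSpace Ω] [MeasureSpace Ω']
  [Fintype G] [MeasurableSpace G] [AddCommGroup G]

-- Placeholder entropy of a random variable on an arbitrary ambient probability space.
noncomputable def entropy {Ω'' : Type*} [MeasureSpace Ω''] (X : Ω'' → G) : ℝ := sorry

/-- Ruzsa distance between random variables on *different* sample spaces:
`d[X ; Y] = H[X' - Y'] - H[X]/2 - H[Y]/2`, where the independent copies `X', Y'`
are realised here on the product space `Ω × Ω'` with the product measure. -/
noncomputable def rdist (X : Ω → G) (Y : Ω' → G) : ℝ :=
  entropy (fun p : Ω × Ω' => X p.1 - Y p.2) - entropy X / 2 - entropy Y / 2
```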


Last updated: Dec 20 2023 at 11:08 UTC