Zulip Chat Archive

Stream: Polynomial Freiman-Ruzsa conjecture

Topic: mathlib-only approach to entropy


Rémy Degenne (Nov 15 2023 at 11:36):

I saw that the PFR repository contains definitions of new classes like ProbabilitySpace, and I wanted to see what an implementation of entropy would look like without those, with only the classes defined in mathlib. The goal is to have a baseline against which we can compare the new classes, to see what they bring.

The code is here: https://github.com/leanprover-community/mathlib4/compare/RD_entropy (on branch#RD_entropy). This is a branch of mathlib and not of the pfr project because that code does not use anything from the project, so it was faster for me that way.

What the file contains:

  • several auxiliary lemmas about x * log x and ProbabilityTheory.cond. Ignore those.
  • a definition of the entropy of a measure.
  • a definition of the entropy of a random variable (as entropy of its law)
  • a definition of conditional entropy
  • a proof of the chain rule, to check that the definitions can be used together
  • one sorry, but it is not on the path to the chain rule and corresponds to something already proved in the pfr project

I implemented the entropy of a measure as an intermediate step because I wanted to get rid of all the Y ⁻¹' {y} when possible.
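
For readers who have not opened the branch, here is a minimal sketch of the rough shape such definitions could take for random variables valued in a `Fintype`. The names (`negMulLog`, `measureEntropy`, `entropy`, `condEntropy`) and the details are illustrative assumptions, not necessarily what the branch does; in particular the branch presumably also normalises so that non-probability measures are handled sensibly, which is glossed over here.

```lean
import Mathlib

open MeasureTheory ProbabilityTheory BigOperators

variable {Ω S T : Type*} [MeasureSpace Ω]
  [Fintype S] [MeasurableSpace S] [MeasurableSingletonClass S]
  [Fintype T] [MeasurableSpace T] [MeasurableSingletonClass T]

/-- `x ↦ -(x * log x)`; it vanishes at `0` since `Real.log 0 = 0`. -/
noncomputable def negMulLog (x : ℝ) : ℝ := -(x * Real.log x)

/-- Entropy of a measure on a finite type: sum of `negMulLog` over the singleton masses. -/
noncomputable def measureEntropy (μ : Measure S) : ℝ :=
  ∑ s, negMulLog (μ {s}).toReal

/-- Entropy of a random variable, defined as the entropy of its law. -/
noncomputable def entropy (X : Ω → S) : ℝ :=
  measureEntropy (volume.map X)

/-- Conditional entropy: the average, over the law of `Y`, of the entropy of the law of `X`
under the measure conditioned on `Y = y`. -/
noncomputable def condEntropy (X : Ω → S) (Y : Ω → T) : ℝ :=
  ∫ y, measureEntropy ((ℙ[|Y ⁻¹' {y}]).map X) ∂(volume.map Y)
```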

Some observations on this mathlib-only approach:

  • overall, it works well. In particular there is no type theory hell
  • writing volume.map X for the law of X is not ideal, we need some notation
  • the toReal everywhere are annoying, as are all the rw steps to juggle with ENNReal and Real
  • all measures coming from conditioning are written explicitly in proofs, which is not very readable. We see things like (ℙ[|Y ⁻¹' {y}]).map X {x} (this is the probability that X = x given Y = y)

Terence Tao (Nov 15 2023 at 15:09):

Rémy Degenne said:

I saw that the PFR repository contains definitions of new classes like ProbabilitySpace, and I wanted to see what an implementation of entropy would look like without those, with only the classes defined in mathlib. The goal is to have a baseline against which we can compare the new classes, to see what they bring.

Thanks for this! I quite like this approach, and it is good for me to see how the Mathlib methods (and style) can be used to address the problems I was facing (in particular, the need to consider "probability spaces" with zero measure). I particularly liked:

  • The use of Mathlib naming conventions (e.g., negMulIdLog to denote the function $x \mapsto -x \log x$, which I had called h)
  • The separate introduction of a measureEntropy function (which also makes sense mathematically as a useful concept), and how it can handle arbitrary measures (including infinite measures), not just probability measures and the zero measure
  • Use of existing Mathlib methods and classes like MeasureTheory.Measure.map, integral_nonneg and MeasurableSingletonClass (and the existing conditional probability notation)

It's a shame that all the ENNReals come back now that one is going through the Mathlib Measure theory (my initial thought was to develop a restricted subset of that theory for ProbabilitySpace that only uses NNReal) but I can live with this. Perhaps all that is needed is a separate suite of helper lemmas relating ENNReal with Real (though once one gets out of the foundations of conditioning, the ENNReals should hopefully become invisible). This seems more like an issue concerning further development of the MeasureTheory section of Mathlib than for this specific project, though.
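
For illustration, here are a couple of the ENNReal/Real glue lemmas that mathlib already provides:

```lean
import Mathlib

open scoped ENNReal

-- Existing ENNReal ↔ Real glue lemmas in mathlib:
example (a b : ℝ≥0∞) (ha : a ≠ ∞) (hb : b ≠ ∞) :
    (a + b).toReal = a.toReal + b.toReal := ENNReal.toReal_add ha hb

example (a : ℝ≥0∞) (ha : a ≠ ∞) : ENNReal.ofReal a.toReal = a := ENNReal.ofReal_toReal ha
```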

Using more conventional Mathlib notation like entropy X or condEntropy X Y in place of H[ X ] or H[ X | Y ] works for me. Mathematically one would typically use something like $\mu_X$ to denote the law volume.map X of X, so perhaps some notation to that effect could be used. I initially didn't like (fun ω ↦ (X ω, Y ω)) so much, which is why I sought a more compact notation (X,Y) for it, but I think I can get used to it.

The one mild concern I have going forward is how to deduce conditional results quickly from unconditional results. A typical example would be starting with the unconditional Shannon inequality entropy (fun ω ↦ (X ω, Y ω)) ≤ entropy X + entropy Y and deducing as a corollary the conditional Shannon inequality condEntropy (fun ω ↦ (X ω, Y ω)) Z ≤ condEntropy X Z + condEntropy Y Z. My old approach was designed with the intention to facilitate this type of deduction (which occurs frequently in the paper) by applying the unconditional result to conditioned random variables $(X|Z=z), (Y|Z=z)$ for all z, and then integrating over the law of Z. As I now realize, this generates a certain amount of "dependent type hell". In your current writeup I notice that you don't directly use the unconditional lemmas to prove the conditional ones, but instead adapt the proofs of the former to establish the latter. This is fine at this level, but later in the paper there will be unconditional entropy inequalities that have rather complicated proofs, and the quickest way to derive their conditional analogues will be to apply the unconditional inequalities (as a "black box") to conditioned variables and then integrate. Do you expect your framework to be able to support this type of argument? If so, then I am 100% sold on this approach and would be happy to accept a PR that replaces the probability space and entropy stuff I had already written with your version.

Rémy Degenne (Nov 15 2023 at 15:17):

I think my approach needs one more definition: def entropy' (μ : Measure Ω) (X : Ω → S) := measureEntropy (μ.map X).
Then the unconditional definition is an example of this and the conditional definition is simply the expectation of another example of this. That way I should be able to avoid most duplication of proofs.
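
Continuing the hypothetical sketch from earlier (same caveat about names), the refactor would look roughly like this:

```lean
-- Measure-parametrized entropy; both the unconditional and the conditional versions
-- become instances of it (hypothetical names, reusing `measureEntropy` from the sketch above).
noncomputable def entropy' (μ : Measure Ω) (X : Ω → S) : ℝ :=
  measureEntropy (μ.map X)

-- Unconditional entropy is `entropy'` at the ambient measure…
noncomputable def entropy (X : Ω → S) : ℝ := entropy' volume X

-- …and conditional entropy is the expectation of `entropy'` at the conditional measures.
noncomputable def condEntropy (X : Ω → S) (Y : Ω → T) : ℝ :=
  ∫ y, entropy' (ℙ[|Y ⁻¹' {y}]) X ∂(volume.map Y)
```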

Rémy Degenne (Nov 15 2023 at 15:18):

Also, I added notations to my code in the last few minutes:

* `Hm[μ] = measureEntropy μ`
* `H[X] = entropy X`
* `H[X | Y ← y] = Hm[(ℙ[|Y ⁻¹' {y}]).map X]`
* `H[X | Y] = condEntropy X Y`, such that `H[X | Y] = (volume.map Y)[fun y ↦ H[X | Y ← y]]`
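
In Lean these could be declared along roughly these lines (a sketch using the hypothetical definitions from earlier; the precise strings and precedences in the branch may differ):

```lean
notation "Hm[" μ "]" => measureEntropy μ
notation "H[" X "]" => entropy X
notation "H[" X " | " Y " ← " y "]" => measureEntropy ((ℙ[|Y ⁻¹' {y}]).map X)
notation "H[" X " | " Y "]" => condEntropy X Y
```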

Rémy Degenne (Nov 15 2023 at 15:22):

I tried to introduce a notation for volume.map X, but when I use the notation lean apparently cannot guess what measure space I am talking about. I am not very experienced with notations so the problem might come from me.

Terence Tao (Nov 15 2023 at 15:58):

Rémy Degenne said:

I tried to introduce a notation for volume.map X, but when I use the notation lean apparently cannot guess what measure space I am talking about. I am not very experienced with notations so the problem might come from me.

Yeah, I encountered this problem a lot and had to manually fill in the MeasureSpace type instances using @ notation. Also I discovered that all this notation is unfolded anyway in the Infoview, so it was less useful than I expected. I guess the path of least resistance is to not use much notation and just use the default Mathlib syntax most of the time.

I have backed up my now obsolete versions of probability_space.lean, entropy_basic.lean, etc. and will try not to touch them for a while (I need to work on the blueprint, which is not building :-(), so feel free to work on a merge. (I guess the intent is to update the Mathlib files and then have the PFR code just refer to the updated Mathlib file?)

One thing looking beyond this project: for this project, it is enough to work with entropies on a Fintype, in which case the entropies are Real (or NNReal). In principle one could eventually consider entropies on a MeasurableSingletonClass in which case they would take values instead in our friend ENNReal. But developing entropy for this project in the ENNReal category brings back all these lovely casting issues. If we wanted to eventually move the entropy stuff to mathlib, would this be implemented using two different versions of entropy and some coercion lemmas?

Rémy Degenne (Nov 15 2023 at 16:02):

Rémy Degenne said:

I think my approach needs one more definition: def entropy' (μ : Measure Ω) (X : Ω → S) := measureEntropy (μ.map X).
Then the unconditional definition is an example of this and the conditional definition is simply the expectation of another example of this. That way I should be able to avoid most duplication of proofs.

To go further in that direction: I think that the clean way to proceed is to work only with definitions parametrized by a measure. That is, instead of entropy with respect to the volume, use entropy wrt some arbitrary measure. Same for the conditional entropy. Then when you want a result about a conditional quantity you apply the generic result to the conditional measure (and integrate).
I tried to work with MeasureSpace because I did not want to depart too much from what was in the PFR repository, but imposing a preferred measure is restrictive.
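
For illustration, the conditional Shannon inequality discussed above would follow from the unconditional one by exactly this pattern, writing $p_Z(z)$ for the mass of $z$ under the law of $Z$:

$$H[X, Y \mid Z] = \sum_z p_Z(z)\, H[X, Y \mid Z = z] \le \sum_z p_Z(z)\,\bigl(H[X \mid Z = z] + H[Y \mid Z = z]\bigr) = H[X \mid Z] + H[Y \mid Z].$$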

Rémy Degenne (Nov 15 2023 at 16:05):

Terence Tao said:

I have backed up my now obsolete versions of probability_space.lean, entropy_basic.lean, etc. and will try not to touch them for a while (I need to work on the blueprint, which is not building :-(), so feel free to work on a merge. (I guess the intent is to update the Mathlib files and then have the PFR code just refer to the updated Mathlib file?)

The intent is not necessarily to merge to mathlib, no. That would mean deciding on the correct generality beyond Fintype, etc. I wrote that code in a mathlib branch purely because that's my usual way to code. I'll make a PR to the PFR repository.

Terence Tao (Nov 15 2023 at 16:05):

Rémy Degenne said:

To go further in that direction: I think that the clean way to proceed is to work only with definitions parametrized by a measure. That is, instead of entropy with respect to the volume, use entropy wrt some arbitrary measure. Same for the conditional entropy. Then when you want a result about a conditional quantity you apply the generic result to the conditional measure (and integrate).
I tried to work with MeasureSpace because I did not want to depart too much from what was in the PFR repository, but imposing a preferred measure is restrictive.

Is there some way to have the best of both worlds - an entropy functional defined for arbitrary measures, but some way to automatically use the default measure when working in a MeasureSpace? If one has to specify the measure every time we use entropy then this will create clutter when proving the main theorems.

Rémy Degenne (Nov 15 2023 at 16:07):

There is volume_tac for definitions. For example Integrable f is short for Integrable f volume thanks to that tactic. I don't know how it interacts with notations.

Rémy Degenne (Nov 15 2023 at 16:07):

See docs#MeasureTheory.Integrable (it's not clear in the doc, but you see volume_tac if you look at the source)
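
For illustration: `Integrable` really is declared with a default measure argument filled in by `volume_tac`, so inside a `MeasureSpace` the measure can be omitted; a hypothetical entropy definition could follow the same pattern (the commented signature below is an assumption, not mathlib code):

```lean
import Mathlib

open MeasureTheory

-- `Integrable` takes its measure as a default argument filled in by `volume_tac`,
-- so inside a `MeasureSpace` the measure can be left implicit:
example {Ω : Type*} [MeasureSpace Ω] (f : Ω → ℝ) :
    Integrable f ↔ Integrable f volume := Iff.rfl

-- A hypothetical entropy definition could use the same default-argument pattern:
-- noncomputable def entropy (X : Ω → S) (μ : Measure Ω := by volume_tac) : ℝ :=
--   measureEntropy (μ.map X)
```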

Terence Tao (Nov 15 2023 at 16:09):

There is potentially a minor issue when performing double conditioning, e.g., moving from condEntropy X Y ≤ entropy X to condEntropy X (fun ω ↦ (Y ω, Z ω)) ≤ condEntropy X Z. The issue is that conditioning is currently defined only for probability measures, and when one conditions on Z one could potentially be conditioning on an empty event, in which case one no longer gets a probability measure. One may possibly need to extend condEntropy to non-probability measures the same way it is already done for entropy.

Oh wait, never mind, you don't actually assume you have a probability measure when defining condEntropy.

Johan Commelin (Nov 15 2023 at 16:18):

Also I discovered that all this notation is unfolded anyway in the Infoview, so it was less useful than I expected.

This means that some "delaborator" is missing. I still don't know how to write those, but there are plenty of people here who do. So I don't think you have to give up on nice notation yet (-;

Adam Topaz (Nov 15 2023 at 16:19):

Which notation is missing a delaborator?

Johan Commelin (Nov 15 2023 at 16:25):

Probably the notation in the file that Remy wrote, like here: https://github.com/leanprover-community/mathlib4/compare/RD_entropy#diff-c6edb06aa151c687d884bd60c05ef9525002a8c7ea5c8d8b9bd076b4f5d3419aR222

Adam Topaz (Nov 15 2023 at 16:26):

Aha ok.

Rémy Degenne (Nov 15 2023 at 16:26):

I don't know if it's that one. The simple notations I introduced here look fine in the infoview on my end.

Terence Tao (Nov 15 2023 at 16:28):

In my (now obsolete) build, notation like P[E] is automatically unfolded for some reason to (fun s ↦ ENNReal.toNNReal (↑↑↑(finiteMeasure Ω) s)) E in the infoview.

Terence Tao (Nov 15 2023 at 16:29):

I suspect it has something to do with the finiteMeasure class, which seems like it is held together with Lean's equivalent of duct tape

Shreyas Srinivas (Nov 15 2023 at 16:35):

subtype or coe? coe is a very common duct tape in lean

Kyle Miller (Nov 15 2023 at 16:36):

@Rémy Degenne If you change notation to notation3, I just checked that it makes your H[X | Y ← y] notation pretty printable. Right now, mathlib's notation3 command tends to be better at making pretty printable notations.

Terence Tao (Nov 15 2023 at 16:37):

Terence Tao said:

Is there some way to have the best of both worlds - an entropy functional defined for arbitrary measures, but some way to automatically use the default measure when working in a MeasureSpace? If one has to specify the measure every time we use entropy then this will create clutter when proving the main theorems.

A little more specifically: I estimate about 90% of the arguments in the paper (outside of the Shannon entropy appendix) take place in a MeasureSpace with a preferred measure, and then there are 10% that proceed by saying "by applying the previous unconditional inequality to a conditioned measure and integrating, we obtain the following conditional inequality as a corollary". So a "best of both worlds" approach would be preferable in which we can work with a preferred measure by default, but make it easy to apply the conditioning argument whenever necessary.

Adam Topaz (Nov 15 2023 at 16:38):

@Kyle Miller notation3 doesn't properly generate the unexpander for P[ E ] unfortunately in the way it's currently defined.

Adam Topaz (Nov 15 2023 at 16:38):

(deleted)

Adam Topaz (Nov 15 2023 at 16:39):

It looks like all the coercions there might be throwing something off.

Kyle Miller (Nov 15 2023 at 16:41):

If you point me to the line I can take a look

Adam Topaz (Nov 15 2023 at 16:43):

https://github.com/teorth/pfr/blob/e9e89403c9cf0f3b9e07cb4bae8a3189cb28ccb7/PFR/probability_space.lean#L54

Adam Topaz (Nov 15 2023 at 16:43):

The notation is used here, for example: https://github.com/teorth/pfr/blob/e9e89403c9cf0f3b9e07cb4bae8a3189cb28ccb7/PFR/probability_space.lean#L61C6-L61C6

Adam Topaz (Nov 15 2023 at 16:44):

I just tried changing that to notation3 and there is no change in behavior in the infoview.

Rémy Degenne (Nov 15 2023 at 16:44):

Terence Tao said:

Terence Tao said:

Is there some way to have the best of both worlds - an entropy functional defined for arbitrary measures, but some way to automatically use the default measure when working in a MeasureSpace? If one has to specify the measure every time we use entropy then this will create clutter when proving the main theorems.

A little more specifically: I estimate about 90% of the arguments in the paper (outside of the Shannon entropy appendix) take place in a MeasureSpace with a preferred measure, and then there are 10% that proceed by saying "by applying the previous unconditional inequality to a conditioned measure and integrating, we obtain the following conditional inequality as a corollary". So a "best of both worlds" approach would be preferable in which we can work with a preferred measure by default, but make it easy to apply the conditioning argument whenever necessary.

I just pushed an update to the file to not use MeasureSpace, but use volume_tac. At the end of the file, I added an example in a MeasureSpace, to show that we don't need to write the measure explicitly.

Yakov Pechersky (Nov 15 2023 at 16:51):

Regarding variadic notation that handles a proof obligation (using a tactic) based on the inputs, there is the c[1, 2, 3, 4] cycle notation that only works if the inputs are nodup:
https://github.com/leanprover-community/mathlib4/blob/3fbcf0e064d06994e24f42a8b5c0245ddc9115d0/Mathlib/GroupTheory/Perm/Cycle/Concrete.lean#L521C1-L522
Implemented thanks to Kyle and Patrick

Terence Tao (Nov 15 2023 at 17:05):

Rémy Degenne said:

I just pushed an update to the file to not use MeasureSpace, but use volume_tac. At the end of the file, I added an example in a MeasureSpace, to show that we don't need to write the measure explicitly.

Looks good. This does seem like the "best of both worlds" I was looking for. (I also assume it would be a trivial matter to take a result proven for a MeasureSpace and conclude the analogous result for an arbitrary measure, by building the MeasureSpace associated with the measure.)

One tangentially related thing. I chose 100 rather arbitrarily as the precedence for notation like P[ E ] because I didn't really understand this precedence concept (in particular, I didn't find a way to easily look up the precedence for other notations such as the arithmetic operations). Are there any conventions on this? One thing I noted was that (P[ E ])⁻¹ required parentheses no matter what precedence I chose. It's a very minor thing, but it would have been nice to not have to need these extra parentheses.

Rémy Degenne (Nov 15 2023 at 17:06):

I copied 100 from your code :sweat_smile:. I don't know what the precedences of the usual things are. Does anyone know if we have documentation about that somewhere?

Kyle Miller (Nov 15 2023 at 17:08):

I'm not sure you need precedence for a notation that has enclosed brackets like this -- you probably can omit giving one.

Kyle Miller (Nov 15 2023 at 17:12):

Here are the basic arithmetic operators, which show their precedences. Unfortunately there's no central documentation with all Lean syntax and precedences yet.
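
For reference, they are declared roughly as follows in Lean core (with `⁻¹` coming from Mathlib):

```lean
infixl:65 " + " => HAdd.hAdd
infixl:65 " - " => HSub.hSub
infixl:70 " * " => HMul.hMul
infixl:70 " / " => HDiv.hDiv
infixl:70 " % " => HMod.hMod
infixr:75 " ^ " => HPow.hPow
postfix:max "⁻¹" => Inv.inv
```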

Kyle Miller (Nov 15 2023 at 17:20):

Terence Tao said:

notation like P[E] is automatically unfolded for some reason to (fun s ↦ ENNReal.toNNReal (↑↑↑(finiteMeasure Ω) s)) E in the infoview.

If you hand-simplify what it unfolds to, beta reducing it, then at least this pretty prints:

notation3 "P[ " E " ]" => ((finiteMeasure _).toMeasure.measureOf E).toNNReal

It seems to be docs#MeasureTheory.FiniteMeasure.instCoeFun that's introducing this non-beta-reduced term.

Similarly, there's

notation3 "P[ " E " ; " X " ]" => ((@finiteMeasure _ X).toMeasure.measureOf E).toNNReal

but then this overrides the first notation when pretty printing (e.g. P[E] comes out as P[ E ; inst✝¹ ]).

The notation3 command should be able to handle funs, but that's a new feature and hasn't been tested in practice.

Terence Tao (Nov 15 2023 at 17:29):

Kyle Miller said:

I'm not sure you need precedence for a notation that has enclosed brackets like this -- you probably can omit giving one.

Right now, omitting the parentheses to give P[ E ]⁻¹ returns the error unexpected token '⁻¹'; expected ':=', 'where' or '|'. It's moot because this notation is now obsolete, but I am still curious as to what the issue is. It doesn't seem to be a whitespace issue; adding or removing spaces doesn't fix the problem.

Kyle Miller (Nov 15 2023 at 17:35):

I guess you need max precedence to meet the precedence of ⁻¹. The barest of testing makes me think this doesn't lead to any other problems.

Kyle Miller (Nov 15 2023 at 17:36):

(max is a constant that stands for 1024, which is the precedence for function application)
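
Concretely, the fix would presumably look something like this, reusing the notation body suggested above (a sketch, not tested against the project; `finiteMeasure` is the project's class):

```lean
-- Declaring the notation at `max` precedence lets `P[ E ]⁻¹` parse without extra parentheses:
notation:max "P[ " E " ]" => ((finiteMeasure _).toMeasure.measureOf E).toNNReal
```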

