Zulip Chat Archive
Stream: Polynomial Freiman-Ruzsa conjecture
Topic: Entropy
Yaël Dillies (Nov 15 2023 at 22:29):
I have now stated most of the basic lemmas about entropy, and I have also proved most of them.
Terence Tao (Nov 15 2023 at 23:26):
That's great progress already! I put in a whole bunch of stubs about Ruzsa distance and related concepts in `ruzsa_distance.lean` that might be within reach now that we have at least formalizations of all the entropy lemmas. One milestone is the Ruzsa triangle inequality, which is the first slightly nontrivial statement about that distance. The most advanced one is the entropic Balog-Szemerédi-Gowers lemma, which may take some time to prove as it basically uses the full suite of entropy tools.
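For reference, the entropic Ruzsa distance these stubs revolve around is (following the blueprint, stated informally)

$$d[X;Y] \;:=\; \mathbf{H}[X' - Y'] - \tfrac{1}{2}\mathbf{H}[X] - \tfrac{1}{2}\mathbf{H}[Y], \qquad d[X;Z] \;\le\; d[X;Y] + d[Y;Z],$$

where $X', Y'$ are independent copies of $X, Y$; the second identity is the triangle inequality mentioned above.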
Shreyas Srinivas (Nov 15 2023 at 23:43):
There is a spurious `end` in the latest commit
Shreyas Srinivas (Nov 15 2023 at 23:43):
in `ruzsa_distance.lean`
Shreyas Srinivas (Nov 15 2023 at 23:44):
```
'dist_nonneg' has already been declared

dist_nonneg.{u} {α : Type u} [inst✝ : PseudoMetricSpace α] {x y : α} : 0 ≤ Dist.dist x y
import Mathlib.Topology.MetricSpace.PseudoMetric
```
Terence Tao (Nov 16 2023 at 00:51):
Shreyas Srinivas said:
'dist_nonneg' has already been declared: `dist_nonneg.{u} {α : Type u} [inst✝ : PseudoMetricSpace α] {x y : α} : 0 ≤ Dist.dist x y` (import Mathlib.Topology.MetricSpace.PseudoMetric)
Presumably it will make sense to put all the Ruzsa distance lemmas in a single namespace to avoid this issue. In any case, renamed for now. (Feel free to also supply these sorts of corrections through pull requests, possibly in batches if there are a lot of them.)
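A minimal sketch of that namespacing fix (hypothetical: `PFR` as the namespace name, and a trivial stub standing in for the real lemma):

```lean
import Mathlib.Topology.MetricSpace.PseudoMetric

namespace PFR

-- Inside the namespace the full name is `PFR.dist_nonneg`, so it no longer
-- collides with Mathlib's root-level `dist_nonneg`; the statement here is a
-- trivial stub, just to show the mechanism.
theorem dist_nonneg : True := trivial

end PFR

#check @_root_.dist_nonneg -- Mathlib's pseudometric-space lemma
#check @PFR.dist_nonneg    -- the project's version, now unambiguous
```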
Rémy Degenne (Nov 16 2023 at 08:07):
I opened a PR with a few entropy lemmas. I'm stopping here for today: I have to go to work.
About the entropy definition: one way to generalize entropy and conditional entropy, to prove everything for one definition and get both as a consequence, would be to do something similar to what I did for independence in mathlib, which is to use Markov kernels. We would have entropy of a random variable with respect to a kernel and a measure.
```lean
def kernel.entropy (X : Ω → S) (κ : kernel T Ω) (μ : Measure T) :=
  μ[fun t ↦ measureEntropy ((κ t).map X)]
```
Then entropy is this with a constant kernel from a singleton space and a Dirac measure, and conditional entropy of X wrt Y is this with `μ = volume.map Y` and `κ = fun y ↦ ℙ[|Y ⁻¹' {y}]` (or replace volume with any measure on Ω). That way, we don't even have the integration step when we want to deduce the conditional result: both conditional and unconditional are particular cases of this object.
I won't try that change, as I think the current setup we have is good enough for now, but that's an approach I would be tempted to use if adding all of this to mathlib.
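A rough sketch of how both notions could then fall out of one definition (illustrative only: `measureEntropy` is stubbed here, and all names are hypothetical rather than the repo's):

```lean
import Mathlib.Probability.Kernel.Basic
import Mathlib.Probability.Notation

open MeasureTheory ProbabilityTheory

variable {Ω T S : Type*} [MeasurableSpace Ω] [MeasurableSpace T] [MeasurableSpace S]

-- Stand-in for the repo's `measureEntropy` (the real one sums -p log p over
-- the atoms of the normalized measure).
noncomputable def measureEntropy (ν : Measure S) : ℝ := sorry

-- Entropy of X relative to a kernel κ and a measure μ on the kernel's source.
noncomputable def kernelEntropy (X : Ω → S) (κ : kernel T Ω) (μ : Measure T) : ℝ :=
  μ[fun t ↦ measureEntropy ((κ t).map X)]

-- Unconditional entropy: a constant kernel from a one-point space, integrated
-- against a Dirac mass.
noncomputable def entropy (X : Ω → S) (μ : Measure Ω) : ℝ :=
  kernelEntropy X (kernel.const Unit μ) (Measure.dirac ())

-- Conditional entropy H[X | Y] would take μ = volume.map Y and the kernel
-- y ↦ ℙ[·|Y ⁻¹' {y}]; packaging that map as a `kernel` needs a measurability
-- proof, so it is omitted here.
```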
Yaël Dillies (Nov 16 2023 at 08:26):
Once all the basic entropy lemmas are proved, I'll move `PFR.entropy_basic` to `PFR.Mathlib.Probability.Entropy` with a view to PRing it to mathlib. Then I'll try this kernel refactor.
Bhavik Mehta (Nov 16 2023 at 22:34):
This isn't directly entropy, but I'd like to claim any remaining lemmas about h(x) = -x log x, since I proved many of them (in slightly more generality) for the exponential Ramsey project.
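For reference, a minimal self-contained sketch of that function and two of the easy facts about it (illustrative names, not the ones from either project):

```lean
import Mathlib.Analysis.SpecialFunctions.Log.Basic
import Mathlib.Tactic

open Real

-- The entropy summand h(x) = -x log x (illustrative definition).
noncomputable def h (x : ℝ) : ℝ := -x * log x

-- h vanishes at both endpoints of [0, 1] (Mathlib's convention is log 0 = 0).
example : h 0 = 0 := by simp [h]
example : h 1 = 0 := by simp [h]

-- h is nonnegative on [0, 1].
example {x : ℝ} (h0 : 0 ≤ x) (h1 : x ≤ 1) : 0 ≤ h x := by
  have hl : log x ≤ 0 := log_nonpos h0 h1
  unfold h
  nlinarith
```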
Terence Tao (Nov 17 2023 at 01:33):
Great! I think the main remaining lemma to prove about -x log x is the converse to Jensen's inequality https://teorth.github.io/pfr/blueprint/sect0001.html#converse-jensen . If you get that, you may be able to conclude https://teorth.github.io/pfr/blueprint/sect0002.html#conditional-vanish readily (though we already have https://teorth.github.io/pfr/blueprint/sect0002.html#vanish-entropy so that is another route). You should also get https://teorth.github.io/pfr/blueprint/sect0002.html#uniform-entropy as a quick consequence.
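For concreteness, one possible finite-setting phrasing of that converse (a statement-only sketch with illustrative names and `sorry` for the proof; the blueprint's formulation may differ):

```lean
import Mathlib.Analysis.Convex.Jensen

open Finset

-- If f is strictly concave and Jensen's inequality holds with equality for a
-- convex combination with strictly positive weights, then all points coincide.
example {ι : Type*} (s : Finset ι) (f : ℝ → ℝ) (hf : StrictConcaveOn ℝ Set.univ f)
    (w x : ι → ℝ) (hw : ∀ i ∈ s, 0 < w i) (hw1 : ∑ i in s, w i = 1)
    (heq : ∑ i in s, w i * f (x i) = f (∑ i in s, w i * x i)) :
    ∀ i ∈ s, ∀ j ∈ s, x i = x j := by
  sorry
```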
Terence Tao (Nov 17 2023 at 02:25):
Also, if you happen to have already proven other interesting facts about x log x that can be easily ported into the `neg_xlogx.lean` file, I don't see much harm in adding them there even if they aren't directly used in the project. It's conceivable that the `neg_xlogx.lean` and `entropy.lean` files eventually get converted into Mathlib submissions, so if you have results that are potentially of wider value, it's worth putting them in.
Heather Macbeth (Nov 17 2023 at 04:03):
I couldn't find the equality case of Jensen in the library, so I just wrote a proof: https://github.com/teorth/pfr/pull/16
Heather Macbeth (Nov 17 2023 at 04:08):
(An important moment in a project's life: the beginning of the `ForMathlib` folder! Some projects' `ForMathlib` folders become notorious ...)
Johan Commelin (Nov 17 2023 at 07:06):
Off-topic aside: I just want to point out that x log x also features in a no-go result concerning the liquid analytic ring structure on ℝ. See https://math.commelin.net/files/liquid_example.pdf for a semi-detailed write-up. All ideas are due to Clausen–Scholze. All errors are mine.
Heather Macbeth (Nov 17 2023 at 19:45):
Terence Tao said:
You should also get https://teorth.github.io/pfr/blueprint/sect0002.html#uniform-entropy as a quick consequence.
I did this (https://github.com/teorth/pfr/pull/19), modulo working out which of the degenerate cases (zero measure, infinite measure, empty type, ...) are valid ... would any measure theory expert like to think about those?
Sebastien Gouezel (Nov 17 2023 at 19:58):
I'll have a look tomorrow, but the only interesting case is clearly the one you've already done!
Rémy Degenne (Nov 19 2023 at 08:10):
Rémy Degenne said:
I opened a PR with a few entropy lemmas. I'm stopping here for today: I have to go to work.
About the entropy definition: one way to generalize entropy and conditional entropy, to prove everything for one definition and get both as a consequence, would be to do something similar to what I did for independence in mathlib, which is to use Markov kernels. We would have entropy of a random variable with respect to a kernel and a measure.
```lean
def kernel.entropy (X : Ω → S) (κ : kernel T Ω) (μ : Measure T) :=
  μ[fun t ↦ measureEntropy ((κ t).map X)]
```
Then entropy is this with a constant kernel from a singleton space and a Dirac measure, and conditional entropy of X wrt Y is this with `μ = volume.map Y` and `κ = fun y ↦ ℙ[|Y ⁻¹' {y}]` (or replace volume with any measure on Ω). That way, we don't even have the integration step when we want to deduce the conditional result: both conditional and unconditional are particular cases of this object.
I won't try that change, as I think the current setup we have is good enough for now, but that's an approach I would be tempted to use if adding all of this to mathlib.
Although I wrote that I would not do it, I finally defined a notion of entropy for kernels (see the open PR on the pfr github). It's not exactly the definition I wrote above, but the simpler `μ[fun t ↦ measureEntropy (κ t)]`.
I was motivated by the proof of the chain rule for the conditional entropy: I needed to prove something that was basically a result about disintegration of kernels, so I figured I might as well write it for kernels.
I used the kernel definition to prove the conditional chain rule and the submodularity of entropy.
The chain rule for the entropy of the composition-product of kernels is rather cute:
```lean
Hk[κ ⊗ₖ η, μ] = Hk[κ, μ] + Hk[η, μ ⊗ₘ κ]
```
`⊗ₖ` is the composition-product of kernels defined in mathlib, and `⊗ₘ` is the similar operation which takes `μ : Measure α` and `κ : kernel α β` and creates `μ ⊗ₘ κ : Measure (α × β)`.
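In classical notation this is the analogue of H[X, Y] = H[X] + H[Y | X]. A sketch of the statement shape against Mathlib (stand-in definitions, indicative side conditions, proof omitted; import paths may have drifted between versions):

```lean
import Mathlib.Probability.Kernel.Composition
import Mathlib.Probability.Kernel.MeasureCompProd
import Mathlib.Probability.Notation

open MeasureTheory ProbabilityTheory

variable {α β γ : Type*} [MeasurableSpace α] [MeasurableSpace β] [MeasurableSpace γ]

-- Stand-in for the repo's entropy of a single measure.
noncomputable def measEnt (ν : Measure β) : ℝ := sorry

-- Stand-in for the repo's kernel entropy `Hk[κ, μ]`.
noncomputable def kerEnt (κ : kernel α β) (μ : Measure α) : ℝ :=
  μ[fun a ↦ measEnt (κ a)]

-- Shape of the chain rule for the composition-product.
example (κ : kernel α β) [IsMarkovKernel κ] (η : kernel (α × β) γ) [IsMarkovKernel η]
    (μ : Measure α) [IsProbabilityMeasure μ] :
    kerEnt (κ ⊗ₖ η) μ = kerEnt κ μ + kerEnt η (μ ⊗ₘ κ) := by
  sorry
```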
Terence Tao (Nov 19 2023 at 09:40):
Rémy Degenne said:
Although I wrote that I would not do it, I finally defined a notion of entropy for kernels (see the open PR on the pfr github). It's not exactly the definition I wrote above, but the simpler `μ[fun t ↦ measureEntropy (κ t)]`.
I was motivated by the proof of the chain rule for the conditional entropy: I needed to prove something that was basically a result about disintegration of kernels, so I figured I might as well write it for kernels.
I used the kernel definition to prove the conditional chain rule and the submodularity of entropy.
Nice! I've always been fond of probability kernels - they feel like the "right" way to do probability theory from a fully relativized (topos-theoretic?) point of view, but I've rarely had need to use them (except to get a concrete representation of conditional expectation for some ergodic theory applications). I'm a little surprised, though, that you can develop this theory without getting bogged down in measure-theoretic minutiae such as Radon measures and the Riesz representation theorem (though of course in the finite setting of this project those details are all trivial anyway). One can imagine that this formalism would also allow one to work with differential entropy relatively easily, though again that's not needed at all for the PFR project.
Rémy Degenne (Nov 19 2023 at 09:43):
Now I'm looking at the Ruzsa distance and thinking that it should get the same kernel treatment :)
Rémy Degenne (Nov 19 2023 at 09:44):
In mathlib I added results about disintegration of measures in a more general case, but not yet disintegration of kernels, precisely because there are a lot of measure-theoretic subtleties. Here in the finite case all is much easier.
Terence Tao (Nov 19 2023 at 10:03):
Rémy Degenne said:
Now I'm looking at the Ruzsa distance and thinking that it should get the same kernel treatment :)
That could well be the case. I've been avoiding thinking about exactly how to define conditional Ruzsa distance in a clean fashion; it's possible that the kernel approach will be nicer than just conditioning on each singleton at a time (though in this finite setting I'm sure pretty much any approach would work eventually).
I noticed that you no longer need `MeasurableSingletonClass` much in the theory now. Again, not an issue in this finite setting, but it does seem like the more natural framework if one wants to extend the theory into continuous settings.
Rémy Degenne (Nov 19 2023 at 10:05):
I do need `MeasurableSingletonClass` everywhere: the reason these are mostly deleted from the statements is that I put them in the `variable` declaration at the top of the file.
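Concretely, the pattern is a file-level `variable` command along these lines (an illustrative sketch, not the repo's exact line):

```lean
import Mathlib.MeasureTheory.MeasurableSpace.Basic

-- File-level hypotheses: every declaration below picks these up automatically,
-- so individual statements no longer need to repeat `MeasurableSingletonClass`.
variable {S : Type*} [MeasurableSpace S] [MeasurableSingletonClass S]
```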
Rémy Degenne (Nov 19 2023 at 10:08):
(deleted, was a wrong definition)
Last updated: Dec 20 2023 at 11:08 UTC