Zulip Chat Archive

Stream: Polynomial Freiman-Ruzsa conjecture

Topic: Pairing together two random variables


Terence Tao (Nov 14 2023 at 15:35):

The Shannon entropy notation used in the paper frequently involves pairs X,Y X, Y of random variables, e.g., H[X,Y]. H[ X, Y ]. Here X,Y X, Y live on the same probability space Ω \Omega but could take values in different ranges S,T S, T . The way I was thinking of implementing this in Lean was to have an operation pair that pairs together two random variables, something like this,

import Mathlib

def pair {Ω : Type*} {S : Type*} {T : Type*} ( X : Ω  S ) ( Y : Ω  T ) (ω : Ω) : S × T := ( X ω, Y ω)

notation:100 "⟨ " X " , " Y " ⟩" => pair X Y

example  {Ω : Type*} {S : Type*} {T : Type*} ( X : Ω  S ) ( Y : Ω  T ) ( Z : Ω  S × T ) (h : Z =  X, Y  ) (ω : Ω) : Z ω = (X ω, Y ω) := by
  rw [h]
  rfl

and now any notation for single random variables, e.g., Shannon entropy H[ X ], automatically extends to pairs as H[ ⟨ X , Y ⟩ ]. (Well, ok, one needs a little lemma that if Xand Y are measurable, then ⟨ X , Y ⟩ is also, but this is straightforward.)

I was wondering if (a) anyone knew of some existing methods in MathLib that duplicate this operation, and (b) if they have better suggestions for notation than ⟨ X , Y ⟩ (I guess it's not safe to overload the parenthesis pairing operation, especially since it is actually used in the definition of pair!)

Rémy Degenne (Nov 14 2023 at 15:46):

Your pair is called prod in mathlib lemmas. See docs#Measurable.prod_mk for the statement that this product is measurable. There is already parenthesis notation for it (see lemmas around the one I linked)

Adam Topaz (Nov 14 2023 at 15:49):

With some clever macros, it should be possible to arrange for H[...] to work with ... being an arbitrary (positive) number of terms.

Rémy Degenne (Nov 14 2023 at 15:50):

Oh sorry, I may have misunderstood. Is the goal of your code to get rid of fun ω ↦ (X ω, Y ω) in favor of something like ⟨ X, Y ⟩ ?

Adam Topaz (Nov 14 2023 at 15:51):

import Mathlib


def prod (f : α  β) (g : α  γ) : α  (β × γ) := fun x => (f x, g x)

syntax "H[" term,+ "]" : term

macro_rules
  | `(H[ $x ]) => `($x)
  | `(H[ $x, $xs,* ]) => `(prod $x H[ $xs,* ])

def foo :    := sorry

#check H[foo, foo, foo]

Adam Topaz (Nov 14 2023 at 15:51):

This H is of course not anything interesting, but just to illustrate the idea

Rémy Degenne (Nov 14 2023 at 15:59):

This approach for H is likely to have issues with the fact that (a,(b,c)) does not have the same type as ((a,b),c), right?

Adam Topaz (Nov 14 2023 at 16:00):

right, I'm associating everything to the right, as is the default for products in lean.

Terence Tao (Nov 14 2023 at 16:01):

Rémy Degenne said:

Oh sorry, I may have misunderstood. Is the goal of your goal to get rid of fun ω ↦ (X ω, Y ω) in favor of something like ⟨ X, Y ⟩ ?

Yeah, that's the intention. In the paper there are moderately complicated expressions such as I[ X, Y : Z, W | U, V ] appearing (the conditional mutual information of the pair X, Y and the pair Z, W relative to the pair U, V and it would be a lot of clutter to insert a large number of fun ω ↦ ... phrases into the notation. Having support for arbitrary tuples than just pairs would be nice, but for this paper pairs would suffice, and given the large number of entropy type functionals we consider (entropy, conditional entropy, mutual information, and conditional mutual information, as well as entropic Ruzsa distance) it might be too much to set up notation for each one separately. (If one were to generalize PFR to higher characteristic, though, a tuple version of this notation would be good, but that we can work out later.)

Incidentally there are an unusually large number of delimiters that coexist in this paper. In addition to the comma , used to pair random variables, there is the bar | used to denote conditioning, e.g., H[ X | Y = y ] and also H[ X | Y ] (the two have different meanings but are related by the formula H[ X | Y ] = ∑ y, P[ Y = y] * H[ X | Y = y ]), and the colon : to denote mutual information I [ X : Y ]. We had to introduce yet another delimiter, the semicolon ;, for Ruzsa distance d[ X ; Y ], in order to avoid collision with the other delimeters (for instance, we also have a conditional version d[ X | Z ; Y | W] of the Ruzsa distance concept). It can be a bit confusing unfortunately, but the notation is pretty standard in information theory (except for the semicolon).

Terence Tao (Nov 14 2023 at 16:04):

Rémy Degenne said:

This approach for H is likely to have issues with the fact that (a,(b,c)) does not have the same type as ((a,b),c), right?

There is going to be a basic lemma at some point that if one random variable Y is the composition of another X with an injection f, then X and Y have the same entropy (and similarly for other basic entropy functionals). So as long as there is a canonical injection (or bijection) relating (X,(Y,Z)) with ((X,Y),Z), I don't see this as more than a minor inconvenience. (In this particular paper, we can probably avoid using triples altogether, and certainly don't need quadruples or higher arity tuples, but I guess it's good to think about future uses also.) Certainly we will need the canonical bijection between (X,Y) and (Y,X) to prove basic statements such as H[ X, Y ] = H [ Y, X ] which are implicitly used in the paper.

Rémy Degenne (Nov 14 2023 at 16:08):

The sort of issue I was alluding to is not mathematical. I was thinking about the code getting cluttered with equivalences between products in every statement. But if there are rarely more than 2 random variables that's not a big issue.

Terence Tao (Nov 14 2023 at 16:13):

Rémy Degenne said:

The sort of issue I was alluding to is not mathematical. I was thinking about the code getting cluttered with equivalences between products in every statement. But if there are rarely more than 2 random variables that's not a big issue.

I don't see any way around this, other than to work with arbitrary tuple notation (xα)αI(x_\alpha)_{\alpha \in I} rather than try to build things up from the pair notation. I doubt one can make (a,(b,c)) definitionally equivalent to ((a,b),c) without breaking some important things.

Adam Topaz (Nov 14 2023 at 16:14):

((a,b),c) and (a,(b,c)) don't even have the same type :)

Adam Topaz (Nov 14 2023 at 16:15):

OTOH, I do think it would be quite easy to make some automation that can automatically create an equivalence between iterated products, and possibly even invisibly insert that equivalence as needed.

Terence Tao (Nov 14 2023 at 16:16):

(Of course, working with arbitrary tuples and with pairs just pushes the problem one level down, in that at some point one needs the canonical equivalence between (xα)αrange 2(x_\alpha)_{\alpha \in \mathrm{range}\ 2} and (x0,x1)(x_0,x_1), but somehow mathematicians manage to completely ignore the subtleties of this equivalence in their work.)

Yaël Dillies (Nov 14 2023 at 16:20):

I don't get this (a, (b, c)) vs ((a, b), c) business. We need to write something, right? And that something is the right-associated version that Adam's notation uses. We can provide an API for iterated products that makes it possible to consider them as left-associated even though they are not, but we need to choose an association for the notation itself.

Rémy Degenne (Nov 14 2023 at 16:30):

I don't object to the way the products associate. My point was simply this: if we don't need to consider products of many things (not more than 3), using these products of two variables and nesting them is fine. Otherwise, especially if we want to insert or remove variables in the middle, it will be very inconvenient very fast and we should do something else we usually do in these situations, like index all random variables and let H take a finset as input (like Finset.sum for example). But apparently there is no need for large products in this project, so the first approach is fine.

Yaël Dillies (Nov 14 2023 at 18:43):

Ah sure. We certainly can have a pi type or finset version of the notation and relate them to the nested product.

Adam Topaz (Nov 22 2023 at 02:42):

Adam Topaz said:

OTOH, I do think it would be quite easy to make some automation that can automatically create an equivalence between iterated products, and possibly even invisibly insert that equivalence as needed.

This is probably no longer relevant to the project, but I'll mention it just in case it's helpful. Motivated by this discussion, I wrote a little bit of automation that automatically creates equivalences between iterated products, and it was just merged into mathlib. With a fresh copy of mathlib, it's now possible to write:

variable {α β γ δ : Type*}

example : (α × β) × (γ × δ)  α × (β × γ) × δ := prod_assoc%
example : (α × β) × (γ × δ)  α × β × γ × δ := prod_assoc%

example (x : α) (y : β) (z : γ) (w : δ) :
    prod_assoc% ((x,y),(z,w)) = (x,(y,z),w) :=
  rfl

example (x : α) (y : β) (z : γ) (w : δ) :
    prod_assoc% ((x,y),(z,w)) = (x,y,z,w) :=
  rfl

Last updated: Dec 20 2023 at 11:08 UTC