Zulip Chat Archive

Stream: mathlib4

Topic: Rota's Entropy Theorem And Feedback Before A PR


Essam Abadir (Aug 12 2025 at 21:33):

I am writing to share a Lean formalization of work by the late MIT Professor Gian-Carlo Rota, which I call Rota's Entropy Theorem (RET). I hope I do it justice and would appreciate your advice in that regard. My sole role has been to formalize it in Lean and to provide a proof that stays as close to his original as possible.

Around the mid-1970s, Rota proved an equivalence between discrete Shannon entropy and all the continuous physics entropies - a result that would be strongly disputed even today without the kind of strong validation that Lean can provide.

a) Guidance before doing a PR?
Information on the proper way to contribute would be greatly appreciated, especially as it relates to library structuring, namespacing conventions, etc. Given the complexity of the codebase supporting the proof, the code below is an exemplary portion of a "Minimum Viable Proof" with all detailed code replaced by sorry. As a reference for those kind enough to give guidance, the sketch is intended to display some of the dependencies and the argument structure.

The full proof is available in the EGPT codebase (https://github.com/eabadir/PprobablyEqualsNP/blob/main/EGPT/Entropy/RET.lean and NumberTheory/Analysis.lean).

b) Background & Integration Questions:
What are the responsibilities of the contributor? Will the PR process identify how to restructure or re-implement for best practices, or is it OK to assume that the contribution will be isolated and follow its own practices?

The full RET is a bidirectional statement: (1) Rota's Uniqueness of Entropy (RUE) theorem proves any function H satisfying the axioms must be of the form C * H_shannon, and (2) any function of the form C * H_shannon (for C > 0) must satisfy the axioms.

The MVP provided here, RET_All_Enropy_Is_Scaled_Shannon_Entropy, recaps the second, "transport" direction and is chosen because of the striking generality of the result: all entropy, including the continuous physics entropies, is scaled Shannon entropy. Rota's original proof uses a truly beautiful logarithmic trapping argument to establish this, and the Lean code supporting it is somewhat extensive.
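For anyone who hasn't seen the argument, the trapping step in its classical form runs roughly as follows (my paraphrase, not a quote of Rota's notes): write f(n) = H(1/n, ..., 1/n) for the entropy of the uniform distribution on n outcomes. Conditional additivity gives f(n^k) = k * f(n), and maximality at the uniform distribution gives monotonicity, so for every k one can pick m with 2^m <= n^k < 2^(m+1); then both f(n)/f(2) and log_2(n) are trapped in the interval [m/k, (m+1)/k], hence |f(n)/f(2) - log_2(n)| <= 1/k for every k. Letting k grow forces f(n) = C * log(n), and zero invariance, conditional additivity, and continuity then extend the identity from uniform distributions to all finite distributions.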

My Lean skills being limited, I relied heavily on Mathlib 3 examples (often from LLMs) and frequently ended up re-implementing lemmas that I couldn't find in Mathlib 4. Some, if not most, probably have analogs or preferred best practices that I don't know about. How are these discrepancies normally handled in a PR?

NOTE ON sorry:
These are "stubs" or placeholders for complex definitions and theorems that exist in the full EGPT codebase. The names and types here are an exact match in the full code base.

import Mathlib.Analysis.SpecialFunctions.Log.NegMulLog
import Mathlib.Data.NNReal.Basic

/-- The standard Shannon Entropy (base e, natural log) of a probability distribution. -/
noncomputable def stdShannonEntropyLn {α : Type} [Fintype α] (p : α → NNReal) : Real := sorry
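-- Intended meaning (my summary; the precise statement in the full codebase may differ):
-- stdShannonEntropyLn p = ∑ i, Real.negMulLog (p i : ℝ), i.e. -∑ i, (p i : ℝ) * Real.log (p i).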

/-- The uniform probability distribution over a finite type `α`. -/
noncomputable def uniformDist {α : Type*} [Fintype α] (h_card_pos : 0 < Fintype.card α) : α → NNReal := sorry

/-- A joint probability distribution over a dependent pair of types. -/
noncomputable def DependentPairDistSigma {N : ℕ}
    (prior : Fin N → NNReal) (M_map : Fin N → ℕ)
    (P_cond : ∀ (i : Fin N), Fin (M_map i) → NNReal) :
    (Σ i : Fin N, Fin (M_map i)) → NNReal := sorry
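-- Intended meaning (my summary; the precise statement in the full codebase may differ):
-- the mass of a dependent pair ⟨i, j⟩ is the prior times the conditional, i.e.
-- DependentPairDistSigma prior M_map P_cond ⟨i, j⟩ = prior i * P_cond i j.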
-- Here we define the 7 axiomatic properties that an entropy function must have. Note Rota originally established 5,
-- but two more were needed to satisfy Lean's requirements on edge cases.

structure IsEntropyNormalized (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop where
  normalized : ∀ (p : Fin 1 → NNReal), (∑ i, p i = 1) → H_func p = 0

structure IsEntropySymmetric (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop where
  symmetry : ∀ {α β : Type} [Fintype α] [Fintype β] (p_target : β → NNReal) (_hp : ∑ y, p_target y = 1) (e : α ≃ β),
    H_func (fun x : α => p_target (e x)) = H_func p_target

structure IsEntropyContinuous (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop where
  continuity : ∀ {α : Type} [Fintype α] (p_center : α → NNReal) (_hp_sum_1 : ∑ i, p_center i = 1) (ε : ℝ), ε > 0 →
    ∃ δ > 0, ∀ (q : α → NNReal) (_hq_sum_1 : ∑ i, q i = 1), (∀ i, |(q i : ℝ) - (p_center i : ℝ)| < δ) → |((H_func q) : ℝ) - ((H_func p_center) : ℝ)| < ε

structure IsEntropyZeroInvariance (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop where
  zero_invariance : ∀ {n : ℕ} (p_orig : Fin n → NNReal) (_hp_sum_1 : ∑ i, p_orig i = 1),
    let p_ext := (fun (i : Fin (n + 1)) => if h_lt : i.val < n then p_orig (Fin.castLT i h_lt) else 0)
    H_func p_ext = H_func p_orig

structure IsEntropyZeroOnEmptyDomain (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop where
  apply_to_empty_domain : H_func Fin.elim0 = 0

structure IsEntropyMaxUniform (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop where
  max_uniform : ∀ {α : Type} [Fintype α] (h_card_pos : 0 < Fintype.card α) (p : α → NNReal) (_hp_sum_1 : (∑ x, p x) = 1),
    H_func p ≤ H_func (uniformDist h_card_pos)

structure IsEntropyCondAddSigma (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop where
  cond_add_sigma : ∀ {N : ℕ} [NeZero N] (prior : Fin N → NNReal) (M_map : Fin N → ℕ)
    (P_cond : ∀ (i : Fin N), Fin (M_map i) → NNReal) (_hprior_sum_1 : ∑ i, prior i = 1)
    (_hP_cond_props : ∀ i : Fin N, prior i > 0 → (M_map i > 0 ∧ ∑ j, P_cond i j = 1))
    (_hH_P_cond_M_map_zero_is_zero : ∀ i : Fin N, M_map i = 0 → @H_func (Fin (M_map i)) (Fin.fintype (M_map i)) (P_cond i) = 0),
    H_func (DependentPairDistSigma prior M_map P_cond) = H_func prior + ∑ i, prior i * H_func (P_cond i)

/--
A function `H_func` has Rota's entropy properties if it satisfies all 7 axioms.
-/
structure HasRotaEntropyProperties (H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal) : Prop
  extends IsEntropyZeroInvariance H_func, IsEntropyNormalized H_func, IsEntropySymmetric H_func,
          IsEntropyContinuous H_func, IsEntropyCondAddSigma H_func, IsEntropyMaxUniform H_func,
          IsEntropyZeroOnEmptyDomain H_func
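-- For context, the first ("uniqueness", RUE) direction mentioned above would be stated
-- against these axioms roughly as follows. This is an illustrative sketch only: the
-- name `RUE_sketch`, the sign condition on C, and the exact hypotheses are mine and
-- are NOT part of the MVP or the full codebase.
theorem RUE_sketch (H : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal)
    (hH : HasRotaEntropyProperties H) :
    ∃ C : ℝ, 0 ≤ C ∧ ∀ {α : Type} [Fintype α] (p : α → NNReal),
      (∑ i, p i = 1) → (H p : ℝ) = C * stdShannonEntropyLn p := sorry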

/--
The `EntropyFunction` type bundles a function `H_func` with the proof that it
satisfies Rota's axioms. This is the object our main theorem will take as input.
-/
structure EntropyFunction where
  H_func : ∀ {α : Type} [Fintype α], (α → NNReal) → NNReal
  props : HasRotaEntropyProperties H_func

/--
**Theorem (Rota Properties Transport Over Scaling):** If a function `H` satisfies
the properties of an entropy function, then any positive constant multiple of `H`
also satisfies those properties.

This transport also allows us to change the base of the logarithm in the entropy function
without re-writing all the axiom proofs which were more easily
done in nats.
-/
theorem RET_All_Enropy_Is_Scaled_Shannon_Entropy (ef : EntropyFunction) (C : ℝ) (hC_pos : 0 < C) :
    HasRotaEntropyProperties (fun p => (C * (ef.H_func p : ℝ)).toNNReal) :=
  -- To prove a structure, we provide a proof for each of its fields.
  {
    -- We need to prove 7 properties for the new scaled function `C * H`.
    -- The logic for each proof is to use the existing property for `H` and show
    -- that it is preserved after being multiplied by the constant `C`.

    -- Axiom 1: Symmetry
    symmetry := by
      sorry -- The symmetry property is preserved under scaling, as the constant `C` does not affect the equality of distributions.
    ,
    -- Axiom 2: Zero Invariance
    zero_invariance := by
      intro n p_orig hp_sum_1
      -- Goal: scaled H of p_ext = scaled H of p_orig.
      -- The logic is identical to symmetry. We use the original property and scale it.
      sorry -- The zero invariance property is preserved under scaling, as the constant `C` does not affect the zero value.
    ,
    -- Axiom 3: Normalization
    normalized := by
      intro p hp_sum_1
      -- Goal: (C * ↑(ef.H_func p)).toNNReal = 0
      -- We know from the original function's properties that `ef.H_func p = 0`.
      -- So the expression becomes `(C * 0).toNNReal` which is `0`.
      simp only [ef.props.normalized p hp_sum_1, NNReal.coe_zero, mul_zero, Real.toNNReal_zero]
    ,
    -- Axiom 4: Maximum at Uniform
    max_uniform := by
      intro α _ h_card_pos p hp_sum_1
      -- Goal: scaled H(p) ≤ scaled H(uniform)
      -- We know `H(p) ≤ H(uniform)`. Since `C` is positive, `C*H(p) ≤ C*H(uniform)`.
      -- `toNNReal` preserves this inequality.
      apply Real.toNNReal_le_toNNReal
      apply mul_le_mul_of_nonneg_left
      · exact NNReal.coe_le_coe.mpr (ef.props.max_uniform h_card_pos p hp_sum_1)
      · exact le_of_lt hC_pos
    ,
    -- Axiom 5: Continuity
    continuity := by
      intro α _ p_center hp_sum_1 ε hε_pos
      -- This is the epsilon-delta proof.
      -- We want to show |C*H(q) - C*H(p)| < ε.
      -- This is |C| * |H(q) - H(p)| < ε.
      -- So we need |H(q) - H(p)| < ε / |C|.
      -- We use the original function's continuity axiom with a scaled epsilon.
      sorry -- The continuity property is preserved under scaling, as the constant `C` does not affect the delta-epsilon relationship.
    ,
    -- Axiom 6: Conditional Additivity
    cond_add_sigma := by
      intro N _inst prior M_map P_cond hprior_sum_1 hP_cond_props hH_P_cond_zero
      -- We need to prove H(joint) = H(prior) + Σ priorᵢ * H(condᵢ) for the scaled function.
      -- This follows from distributing `C` across the original identity.
      sorry -- The conditional additivity property is preserved under scaling, as the constant `C` can be factored out.
    ,
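    -- Axiom 7: Zero on Empty Domain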
    apply_to_empty_domain := by
      -- Goal: (C * ↑(ef.H_func (fun (_ : Empty) => 0))).toNNReal = 0
      simp only [ef.props.apply_to_empty_domain, NNReal.coe_zero, mul_zero, Real.toNNReal_zero]
  }
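As a usage illustration of the docstring's remark about changing the logarithm base, here is a minimal sketch (not part of the MVP; it relies only on the statement above plus Real.log_pos and div_pos from Mathlib) of how the transport theorem would convert an entropy function measured in nats into one measured in bits:

-- Scaling by 1 / Real.log 2 (which is positive) changes base e to base 2
-- while preserving all seven axioms, via the transport theorem above.
example (ef : EntropyFunction) :
    HasRotaEntropyProperties
      (fun p => ((1 / Real.log 2) * (ef.H_func p : ℝ)).toNNReal) :=
  RET_All_Enropy_Is_Scaled_Shannon_Entropy ef (1 / Real.log 2)
    (div_pos one_pos (Real.log_pos (by norm_num)))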

Siddhartha Gadgil (Aug 29 2025 at 04:46):

I believe you posted in the wrong place, as this is "program verification" and you seem to have theorems and proofs. Better to post in maths/mathlib4 channels (actually cross-posting is discouraged, so if some maintainer sees this it is best they move the message).

Notification Bot (Aug 29 2025 at 08:34):

This topic was moved here from #Program verification > Rota's Entropy Theorem And Feedback Before A PR by Kevin Buzzard.


Last updated: Dec 20 2025 at 21:32 UTC