Zulip Chat Archive
Stream: mathlib4
Topic: Kraft's Inequality and Converse
Elazar Gershuni (Jan 14 2026 at 17:30):
Hi everyone,
I have been working on a formalization of Kraft's Inequality and related results from information theory, and I wanted to gauge interest in contributing this to Mathlib.
What I have formalized:
- Kraft's Inequality: the bound ∑ᵢ D^(−ℓᵢ) ≤ 1 for prefix-free codes over a D-symbol alphabet.
- Kraft-McMillan Inequality: the same bound for uniquely decodable codes (Kraft's Inequality is proved as a special case).
- The Converse: given lengths ℓᵢ satisfying ∑ᵢ D^(−ℓᵢ) ≤ 1, there exists a prefix-free code with those lengths.
- Extensions:
  - The results are generalized to arbitrary finite alphabets (not just binary).
  - The proofs handle both finite and infinite codes.
- As an example, I used it to prove Shannon's Source Coding theorem.
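For readers skimming the thread, a statement along these lines is what is being discussed. This is only a sketch with illustrative names (not the repo's actual declarations), using the Set (List α) representation that comes up later in the thread:

```lean
import Mathlib

-- Sketch only: illustrative names, not the repo's actual declarations.
-- A code C over an alphabet α is prefix-free when no codeword is a
-- proper prefix of another.
def PrefixFree {α : Type*} (C : Set (List α)) : Prop :=
  ∀ u ∈ C, ∀ v ∈ C, u <+: v → u = v

-- Kraft's inequality for a finite prefix-free code over a D-symbol
-- alphabet: ∑ D^(−length) ≤ 1.
theorem kraft_sketch {α : Type*} [Fintype α] (C : Finset (List α))
    (h : PrefixFree (C : Set (List α))) :
    ∑ w ∈ C, ((Fintype.card α : ℝ))⁻¹ ^ w.length ≤ 1 := by
  sorry
```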
Implementation Details:
The project is currently structured as a standalone package.
- The forward direction (McMillan) uses an algebraic proof.
- The converse direction uses a "cumulative sum" / dyadic interval construction to assign codewords.
- I've tried to follow Mathlib naming conventions and style.
- I'm using v4.26
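The cumulative-sum construction mentioned above can be sketched as follows. These are my own illustrative names, not the repo's code, and the lengths are assumed sorted increasingly:

```lean
import Mathlib

-- Sketch (illustrative, not the repo's code).  Assume ℓ is monotone and
-- ∑ D^(−ℓ j) ≤ 1.  The i-th codeword is the ℓ i-digit base-D expansion
-- of the partial sum ∑_{j < i} D^(−ℓ j); scaling by D^(ℓ i) keeps
-- everything in ℕ.
def kraftNum (D : ℕ) (ℓ : ℕ → ℕ) (i : ℕ) : ℕ :=
  ∑ j ∈ Finset.range i, D ^ (ℓ i - ℓ j)

-- Base-D digits of kraftNum, most significant first, left-padded with
-- zeros to length ℓ i (Nat.digits is little-endian, hence the reverse).
def kraftWord (D : ℕ) (ℓ : ℕ → ℕ) (i : ℕ) : List ℕ :=
  let ds := (Nat.digits D (kraftNum D ℓ i)).reverse
  List.replicate (ℓ i - ds.length) 0 ++ ds

-- Example: D = 2, lengths 1, 2, 3, 3 (Kraft sum exactly 1).
-- Partial sums 0, 1/2, 3/4, 7/8 give codewords 0, 10, 110, 111,
-- a prefix-free code with the requested lengths.
```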
As noted in my repo, I used AI assistants extensively to help generate and iterate on the definitions and proofs. I have reviewed and structured the arguments, but I would appreciate feedback on whether the current style matches Mathlib's standards.
Code: https://github.com/elazarg/kraft
Is this something that would be a good fit for Mathlib? If so, I'd love advice on where in the hierarchy it belongs (perhaps InformationTheory or Combinatorics?) and how to break it down for a PR.
Thanks!
Wrenna Robson (Jan 15 2026 at 09:09):
I think this is InformationTheory and IMO would be a very welcome contribution.
Wrenna Robson (Jan 15 2026 at 09:09):
Interested in what definition you have chosen for codes (I have found it a little hard to decide in the past).
Bhavik Mehta (Jan 15 2026 at 12:07):
I haven't yet looked at the code but I agree this theorem belongs in mathlib
Elazar Gershuni (Jan 15 2026 at 14:38):
Wrenna Robson said:
Interested in what definition you have chosen for codes
I used simply Set (List α). What alternatives have you considered?
Elazar Gershuni (Jan 18 2026 at 15:14):
Opened the first PR: KraftMcMillan for uniquely-decodable codes:
https://github.com/leanprover-community/mathlib4/pull/34108
Elazar Gershuni (Jan 20 2026 at 01:28):
I have generalized the KraftMcMillan theorem to this:
variable {M : Type*} [Monoid M]
def lengthGrowth (ℓ : M → ℕ) (D_nat : ℕ) : Prop :=
  ∀ (T : Finset M) (s : ℕ), (T.filter (fun x => ℓ x = s)).card ≤ D_nat ^ s
def tupleProduct {S : Finset M} : ∀ {r : ℕ}, (Fin r → S) → M
| 0, _ => 1
| r + 1, w => (w 0).1 * tupleProduct (fun i : Fin r => w i.succ)
theorem kraft_inequality_of_injective (ℓ : M → ℕ)
{S : Finset M} {D_nat : ℕ}
(D_pos : 0 < D_nat)
(h_add : ∀ a b, ℓ (a * b) = ℓ a + ℓ b)
(h_pos : ∀ x ∈ S, 1 ≤ ℓ x)
(h_count : lengthGrowth ℓ D_nat)
(h_inj : ∀ r, Function.Injective (tupleProduct (S := S) (r := r))) :
∑ x ∈ S, (1 / (D_nat : ℝ)) ^ (ℓ x) ≤ 1
The proof is similar to the previous one, but it's no longer about uniquely decodable codes.
Would that be useful? The "standard" theorem is an instantiation.
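Presumably the instantiation would take words over a D-letter alphabet as the monoid, with ℓ the word length. A hedged sketch of the shape (the exact lemma names here are my assumptions, not the PR's code):

```lean
-- Hedged sketch: recovering the standard Kraft-McMillan statement.
--   M := FreeMonoid (Fin D)   -- definitionally List (Fin D)
--   ℓ := FreeMonoid.length    -- additive: ℓ (a * b) = ℓ a + ℓ b
-- Injectivity of tupleProduct for every r is exactly unique
-- decodability of the code S.
example {D : ℕ} (hD : 0 < D) (S : Finset (FreeMonoid (Fin D)))
    (h_pos : ∀ x ∈ S, 1 ≤ x.length)
    (h_inj : ∀ r, Function.Injective (tupleProduct (S := S) (r := r))) :
    ∑ x ∈ S, (1 / (D : ℝ)) ^ x.length ≤ 1 :=
  kraft_inequality_of_injective FreeMonoid.length hD
    (fun _ _ => FreeMonoid.length_mul ..)  -- length is additive
    h_pos
    (fun T s => sorry)  -- at most D ^ s words of length s
    h_inj
```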
Vlad Tsyrklevich (Jan 20 2026 at 08:28):
I think generalizing the statement is a good idea: it is more useful, and generalizing can often lead to simpler proofs and statements. Your tupleProduct seems like it could be defined as
def tupleProduct {S : Finset M} {r : ℕ} (w : Fin r → S) : M :=
Fin.foldl r (· * w ·) (1 : M)
The h_add hypothesis is very close to being a MulHom or AddHom but I'm not sure if there's a way to relate multiplication to addition without using Additive/Multiplicative or switching Monoid to AddMonoid.
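One way this might be packaged (a sketch, not tested against the PR): a monoid hom into Multiplicative ℕ carries exactly h_add, plus ℓ 1 = 0, which h_add already implies:

```lean
import Mathlib

-- Sketch: an M →* Multiplicative ℕ packages the additivity of ℓ
-- without switching Monoid to AddMonoid.
example {M : Type*} [Monoid M] (ℓ : M →* Multiplicative ℕ) (a b : M) :
    (ℓ (a * b)).toAdd = (ℓ a).toAdd + (ℓ b).toAdd := by
  simp
```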
Wrenna Robson (Jan 20 2026 at 14:41):
Elazar Gershuni said:
Wrenna Robson said:
Interested in what definition you have chosen for codes
I used simply Set (List α). What alternatives have you considered?
Well, one might consider codes of a fixed length, or linear codes: Fin n -> α or Vector n α might be appropriate for the former, and the latter, ah, probably wants LinearSpace or whatever it is.
Equally, one might instead consider a code as the function X -> FreeMonoid α (and bear in mind that FreeMonoid α is a synonym for List α): in this perspective, the code is the encoding map, rather than the image of the encoding map.
For the particular case of linear codes one might consider a code represented by its generating matrix (which is nearly but not quite the same as the above), or indeed one might consider a code defined by the matrix to which it is a kernel.
Also, one has the problem that a metric on (block) codes isn't necessarily singly defined - the Hamming metric is common but other options exist.
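For concreteness, the encoding-map viewpoint could look like this (illustrative names only, not a proposal for Mathlib's actual definitions):

```lean
import Mathlib

-- Illustrative sketch: a code as an encoding map rather than a set of
-- words.  FreeMonoid α is a synonym for List α.
structure Code (X α : Type*) where
  enc : X → FreeMonoid α

-- Non-singular = injective encoding; this property is invisible in the
-- Set (List α) representation, which only keeps the image of enc.
def Code.NonSingular {X α : Type*} (c : Code X α) : Prop :=
  Function.Injective c.enc

-- The set-of-words view is recovered as the image of the encoding map.
def Code.words {X α : Type*} (c : Code X α) : Set (FreeMonoid α) :=
  Set.range c.enc
```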
Wrenna Robson (Jan 20 2026 at 14:43):
oh: also Set (List α) allows for an infinite code, but one might reasonably impose the condition that it should be finite in which case you would probably use Finset, which corresponds to X above being a fintype.
The Set representation also means you can't talk about non-singular codes (those whose encoding map is injective), since only the image of the map is kept.
In addition you might well wish to attach a probability distribution to X which then induces a probability distribution on the codewords.
Yuval Filmus (Jan 20 2026 at 17:04):
Codes of fixed length aren't interesting in this case, since the statement becomes trivial (every code of fixed length is uniquely decodable).
The Kraft-McMillan inequality is about variable-length codes.
On the other hand, the Kraft–McMillan inequality is definitely interesting for (countably) infinite codes.
For example, the famous information theorist Peter Elias studied universal codes for the natural numbers, and every so often new papers are published on the topic.
Yuval Filmus (Jan 20 2026 at 17:06):
"Code" has several meanings — the codes referred to in Kraft–McMillan are different from the ones considered in the theory of error-correcting codes, despite being called the same.
In information theory, this is the difference between "source coding" and "noisy channel coding".
Matt Diamond (Feb 16 2026 at 22:04):
Could we define Chaitin's constant once this is merged in?
Matt Diamond (Feb 16 2026 at 22:05):
I suppose we could define it now, but this would allow us to show that the sum converges
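If the infinite version lands, a definition along these lines would presumably become possible. This is hypothetical: P stands for the prefix-free halting set of a universal prefix machine, which Mathlib does not have; only the shape of the definition is shown:

```lean
import Mathlib

-- Hypothetical sketch: P is the (prefix-free) halting set of a
-- universal prefix machine, not something Mathlib currently defines.
noncomputable def chaitinOmega (P : Set (List Bool)) : ℝ :=
  ∑' p : P, ((2 : ℝ)⁻¹) ^ (p : List Bool).length

-- An infinite Kraft inequality for prefix-free P would give
-- Summable (fun p : P => ((2 : ℝ)⁻¹) ^ (p : List Bool).length)
-- and chaitinOmega P ≤ 1, i.e. convergence of the sum.
```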
Wrenna Robson (Feb 17 2026 at 09:15):
Yuval Filmus said:
"Code" has several meanings — the codes referred to in Kraft–McMillan are different from the ones considered in the theory of error-correcting codes, despite being called the same.
In information theory, this is the difference between "source coding" and "noisy channel coding".
I think choosing good names for this stuff when it comes into Mathlib is going to be very important.
Yuval Filmus (Feb 17 2026 at 11:12):
Matt Diamond said:
I suppose we could define it now, but this would allow us to show that the sum converges
I think the current version is only for finite codes, though an infinite version should be easy to add.
Also, I’m not sure prefix codes are defined yet.
Elazar Gershuni (Feb 17 2026 at 17:06):
I would love to hear tips about the next slice to PR, after 34108 (Kraft-McMillan inequality) is merged.
Last updated: Feb 28 2026 at 14:05 UTC