Zulip Chat Archive

Stream: mathlib4

Topic: Cantorian Diagonal Inversion and Shannon Entropy Lemmas

Luke Miller (Sep 26 2025 at 19:25):

I’ve been working on a small project formalizing Cantor style diagonal inversion in Lean 4.

Here are the core defs/lemma:

def binmat (N L : ℕ) := Vector (Vector Bool L) N

def diagonal_inversion {N L : ℕ} (S : binmat N L) : Vector Bool (min N L) :=
  Vector.ofFn (λ i =>
    !(S.get (Fin.castLE (min_le_left N L) i)).get (Fin.castLE (min_le_right N L) i)
  )

def bernoulli {α : Type} [Ring α] (p : α) : Bool → α
  | true  => p
  | false => 1 - p

def bit_inversion (b : Bool) : Bool := !b

lemma bernoulli_inversion {α : Type} [Ring α] (p : α) :
  ∀ b, bernoulli p b = bernoulli (1 - p) (bit_inversion b) := by
  intro b
  cases b <;> simp [bernoulli, bit_inversion]
  all_goals ring

It compiles fine, but I’d really appreciate feedback:
Is this idiomatic Lean 4?
Are there mathlib lemmas I should be using instead?
Any advice on structuring this kind of project?

Aaron Liu (Sep 26 2025 at 19:27):

can you edit your message to use #backticks

Luke Miller (Sep 26 2025 at 19:28):

Done, thank you Aaron

Luke Miller (Sep 26 2025 at 19:33):

The full build:

EDA.lean:

import EDA.Basics
import EDA.Diagonal
import EDA.MainTheorem

EDA.Diagonal

import Mathlib.Data.Nat.Basic
import EDA.Basics

def diagonal_inversion {N L : ℕ} (S : binmat N L) : Vector Bool (min N L) :=
  Vector.ofFn (λ i =>
    !(S.get (Fin.castLE (min_le_left N L) i)).get (Fin.castLE (min_le_right N L) i)
  )

EDA.Basics

import Mathlib.Data.Vector.Basic

-- A binary dataset: N sequences, each of length L
def binmat (N L : ℕ) := Vector (Vector Bool L) N

EDA.MainTheorem

import Mathlib.Data.Bool.Basic
import Mathlib.Algebra.Ring.Basic
import Mathlib.Tactic.Ring

/-
Bernoulli distribution on a single bit:
returns the probability (as a ring element).
-/
def bernoulli {α : Type} [Ring α] (p : α) : Bool → α
  | true  => p
  | false => 1 - p

-- Bit inversion is just Boolean negation
def bit_inversion (b : Bool) : Bool := !b

/-
Lemma: flipping a Bernoulli(p) bit
is the same as sampling a Bernoulli(1 - p) bit.
-/
lemma bernoulli_inversion {α : Type} [Ring α] (p : α) :
  ∀ b, bernoulli p b = bernoulli (1 - p) (bit_inversion b) := by
  intro b
  cases b <;> simp [bernoulli, bit_inversion]
  all_goals ring

Luke Miller (Sep 26 2025 at 19:41):

This is what I’m trying to capture:

Cantorian Diagonal Inversion (informal):

D = { S₁(1), S₂(2), ..., Sₘ(m) }

Take the i-th bit from the i-th sequence, flip it, collect into D.

Has anyone formalized Shannon Entropy or Cantorian Diagonalization in lean/mathlib yet?

Notification Bot (Sep 26 2025 at 20:56):

This topic was moved here from #mathlib4 > Feedback on operator definitions (EDA project) by Luke Miller.

Alex Meiburg (Sep 28 2025 at 04:03):

Shannon entropy can be formalized a few different ways depending on what you want to do. One version is here, which extends the definition to an arbitrary measure. There's a lot of work on relative entropies here. I have my own very limited version here.

Mathlib has docs#Real.binEntropy, which is the binary entropy function. (The Shannon entropy of a Bernoulli variable.)

Alex Meiburg (Sep 28 2025 at 04:04):

Cantor's diagonal argument has been done in Lean, indeed it's often one of the first theorems to be done in a new theorem proving setting. :) The version in Mathlib may look a bit dense to newcomers, though: https://github.com/leanprover-community/mathlib4/blob/bf35bd39a069f06cd32685a267702d145efc544a/Mathlib/Logic/Function/Basic.lean#L242

Alex Meiburg (Sep 28 2025 at 04:09):

The code you wrote is pretty idiomatic Lean4, but it looks like "Lean4 for actual programming", as opposed to "Lean4 for theorem proving" -- the two often take different design choices for a number of reasons.

For instance, you use Bool, which is actual data you can store in a variable; Mathlib generally prefers Prop, which can be any mathematical claim, regardless of whether you can compute the answer. You use Vector T, which is a variable-length array; Mathlib would often prefer List T (for better inductive proofs) or maybe a function x -> T (where x is a Fintype).

Alex Meiburg (Sep 28 2025 at 04:10):

What is your main goal with this project? I see you've titled it "EDA", are you looking at Electronic Design Automation?

Luke Miller (Sep 28 2025 at 04:10):

I appreciate you getting back to me with that information that is really awesome, I needed them specifically because of a mathematical theorem that I have under review at the Journal of computational science testing the randomness of time indexed data sets using cantor's diagonal argument and Shannon entropy. it actually led to the development of the encryption scheme that I posted in another thread. I wasn't familiar with lean before I got started so everything that I've learned I've pretty much learned just from looking it up, I didn't really look for any sort of documentation or anything like that. if you have any further recommendations let me know, I'll take a look at those specific libraries and see if they work better in my lean than what I already have.

Luke Miller (Sep 28 2025 at 04:14):

and it's called entropy diagonalization analysis. we're looking for structure in time indexed data sets, we tested it on bitcoin and on seismographs and frb bursts. on bitcoin we found a .02 difference in randomness versus regular Sha hashes because of the proof of work scheme. in seismology there are additional signals that you can pick up whenever the scores drop, but that is under review and I'm not willing to share additional information because of potential impacts it may have later on if the research ends up being true. for frb, we looked for a ton of different fil files from seti and something like 100 candidates it may have been 97 and only two of them came back multi-band where we could process them so we processed them and correctly identified 8 chunks out of something like 16,000 in two different fil files that contained fast radio bursts with no error. chunks next to the bursts did not contain any information and background noise was the expected ratio in quiet time. once again all of this is under review so take it with a grain of salt

Alex Meiburg (Sep 28 2025 at 16:32):

I think it'll be pretty hard to help you more without more details. :thinking: But if you run into issues let us know. Are you using an LLM to help you write the code?

Luke Miller (Sep 28 2025 at 16:40):

to assist with code > lean, yes. cross referencing and checking work over and over, not letting a single instance give us a final answer. it helps iterate quickly without having to sift through mass amounts of data. basically checking that, even though we validated the work empirically, that it can be machine checked. but I know that it's not giving me the best I could do for this. I used it to learn how lean works and how to structure documents and builds essentially, and asked questions to fix compile errors whenever I ran into them. But that also leaves me in the position of not knowing Bitvec has xor operations, or that Shannon entropy and Cantor have already been encoded, or how to look for specific libraries and information. I would hope that doesn't put me in a bad position with you guys, I am genuinely trying to learn very complicated things and only recently learned about lean in the process of peer review. I want to take it seriously.

Alex Meiburg (Sep 28 2025 at 17:06):

What do you mean, that you learned about Lean in the process of peer review...?

Luke Miller (Sep 28 2025 at 17:07):

I learned in the process of trying to find a coding language that would help me absolutely check my logic that lean existed and I decided that I was going to start using it

Luke Miller (Sep 28 2025 at 17:07):

if you guys want to scrutinize anymore you can go right ahead I won't hide anything from you

Yan Yablonovskiy 🇺🇦 (Sep 29 2025 at 10:46):

Luke Miller said:

it actually led to the development of the encryption scheme that I posted in another thread.

Which thread? Are you ban evading?

Alex Meiburg (Sep 29 2025 at 14:44):

Luke's presumably referring to #general > encryption scheme , which is the only other topic they've posted in. (Why did you think ban evading...?)

Luke Miller (Sep 29 2025 at 14:46):

? I haven't done anything to be banned although I will admit I have no idea how this place works beyond what I've read and am just doing my best so if you have any advice just say so. I've been here for like 3 days

Luke Miller (Sep 29 2025 at 14:47):

It seems to be a bit more intense than I expected

Alex Meiburg (Sep 29 2025 at 19:37):

I think you will encounter some confusion/skepticism around your goals, until you can spell some things out more clearly.

Lately there's been a particular trend -- I will spell out what's been happening, but please understand that I'm not describing you. People will come in with some extraordinary "breakthrough" ideas, typically cutting-edge research for a field that they weren't an expert in to begin with. Often they have have come to their ideas after some very independent work, aided by ChatGPT or similar. (The ideas, it turn out, make very little sense -- I mean stuff like "Time is a spiral fractal and we can resonate with morality in the 13th dimension" kind of stuff) They encounter resistance disseminating their ideas, precisely because they don't make sense. They interpret this as "other people don't understand my idea because it's too new". After some arguing with others, someone tells them, "well why don't you go prove your big new ideas in some theorem prover like Lean, then show us that!", expecting this will get them to leave them alone for a while. The person, not knowing Lean and recognizing that it's a difficult language, gets an LLM to help them write Lean. They make some amount of progress, and then come here to the Zulip chat. If they had a lot of progress, they will come in saying "Look, I have these breakthrough ideas and they're all proved in Lean!". If they had little progress, they will come in saying, "Look, I have these breakthrough ideas, but I'm having trouble proving them in Lean, can someone help?".

Alex Meiburg (Sep 29 2025 at 19:38):

But simply because that has happened enough times, I think people are now sort of on edge about anything where (1) someone has not-very-tested-sounding research ideas, (2) sounds like they're new to Lean, but are more interested in proving something dramatic than learning the details of the language itself, and (3) it sounds like an AI was heavily involved.

Luke Miller (Sep 29 2025 at 19:44):

I understand, I don't intend on bringing anything crazy in here, you guys saw the encryption scheme and this conversation and thats all I was checking. They both compile, I wouldn't want to bother anyone with anything I hadn't already previously checked. Everything I shared is either under active peer review or has gone through review. This thread is at jocsci and has been for a few months and the encryption got refused at rebuttal at a symposium because it would have been too big of a rewrite, development took off after the paper was submitted. I won't do anything else. Im going to be careful either way, you guys mean it and the opportunity to have community is too good to pass up. I won't bother anyone with anything that hasn't been at least looked at by a professional editor to be taken up for review.

Alex Meiburg (Sep 29 2025 at 19:44):

Again -- not necessarily true of you! So, until there's some evidence to the contrary, I will assume that whatever you're doing is reasonable good science. :)

Luke Miller (Sep 29 2025 at 19:45):

I might edit that comment I messed up a sentence! I appreciate you guys no matter what happens, if I'm outta line just smack me

Alex Meiburg (Sep 29 2025 at 19:48):

In order to help you achieve your goals, some questions/tips:

"Entropy Diagonalization Analysis" is your own term. It might be a meaningful term to you, but it's not standard, and so people won't be able to go off that at all. Typically you should start communication in terms of established language, and then as needed introduce your own vocabulary. Can you describe this as "entropy-based data analysis", e.g. work like this one? Is that accurate? What does diagonalization have to do with it?
You've mentioned an encryption scheme, and also what sounds like a data analysis framework. Can we pick one and focus on that first? These two, on the face of it, have nothing at all to do with each other.
Formally proving things about encryption schemes is, generally speaking, quite hard. (Most results about encryption also have some "security assumptions" in place.) This would be a very high goal, and I would honestly avoid that unless you've got a very solid math+cryptography background.
As for something like data analysis, you'll have to be pretty clear what your goal with formal verification. Are you trying to program some data analysis pipeline in Lean and verify it? Are you trying to prove a theorem abstractly about how something performs? What kind of properties are you hoping to prove?
Are you set on just beelining for one thing, or are you generally looking to get in to Lean? If you want to learn about cryptography in Lean, for instance, there's projects like https://github.com/Verified-zkEVM/ArkLib . There's also not much "classical statistics" (like, Student's t-test) in Mathlib, so that kind of "data analysis"-y stuff would be very welcome, and people here would be happy to help you learn how to prove + contribute such stuff.

Luke Miller (Sep 29 2025 at 19:55):

Its a little different, its for time series datasets. We take slices, for instance with bitcoin it was the block chain, from 86,000 to I believe ~450,000 or so, it was 330,000 total blocks. You take a diagonal at a n+1 position across the data stream and calculate n1, n2, n3 to 256, and take the Shannon entropy of the diagonal. For bitcoin the cleanroom was pure SHA-256 hashes. we took the same diagonal from them as we did the block chain and found that the PoW scheme was imposing a .02 difference in the randomness of the hash consistently across those 330,000 or so blocks. Does that mean anything? Not really. But it was cool! So we tested in some other areas too and sent it in to comp sci after. Once that was done, we thought if it was a randomness test we can develop to beat it and specifically developed an encryption scheme to do so and that encryption scheme passed Big Crush, Practrand to a TB, Dieharder etc. So thats the sitch. The cipher was a secondary confirmation of the experiments results, but once again both are under review

Luke Miller (Sep 29 2025 at 20:02):

And to answer your other questions: Lean seems to be something I'm going to genuinely need to know in the future. I shared to I suppose introduce myself to the community, but you do not owe me your attention. If someone chooses to help great, if not I understand. My problems are solved and my lean compiles, everything from there is semantics

Last updated: Feb 28 2026 at 14:05 UTC