Kevin Buzzard (Apr 03 2020 at 12:08):
Let's say that someone ports a random number generator to Lean where you can seed it (default seed is 37) and then it just gives you a function from nat to real or nat to the nats which are at most N or whatever. Using this someone could formalise an algorithm like Miller Rabin and then this could be a pseudo primality checker for large numbers. But how could one formalise the assertion that "the miller rabin test tried with 100 randomly generated primes will give the right answer with probability 99.99999999%"?
Kevin Buzzard (Apr 03 2020 at 12:09):
I mean, formalising the things mathematicians say when they justify these sorts of claims
Gabriel Ebner (Apr 03 2020 at 12:11):
The assertion is just a statement about the probability of a certain set (where the seed is a (uniformly distributed) random variable).
Kevin Buzzard (Apr 03 2020 at 12:12):
Maybe there's some function which eats a proposition depending on a stream input and returns a number between zero and one which we never compute
Kevin Buzzard (Apr 03 2020 at 12:14):
I see, and I can then prove that the probability you pass for a given input is at least 3/4 or whatever
Kevin Buzzard (Apr 03 2020 at 12:15):
Without ever explicitly computing it
Gabriel Ebner (Apr 03 2020 at 12:23):
It's been some time since I've looked at Miler-Rabin. IIRC (and read wikipedia), then in order to check that is prime, you check for a random that and for all where . The number is prime iff no such exists. If is not prime, the probability of picking such an is at least .
Gabriel Ebner (Apr 03 2020 at 12:28):
So you'd get a function
test : fin n → Prop with
volume test ≥ 3/4. You can then also define a function
(fin m → fin n) → Prop that does the test multiple times. I'm not sure how exactly you would hook this up with a PRNG, you'd probably need some theorems about the distribution of the values of the PRNG.
Mario Carneiro (Apr 03 2020 at 12:35):
I don't think there is anything meaningful you can do to hook up the PRNG, at the end of the day you just get one number. You might be able to take the preimage of the probability distribution across the PRNG to give a bound on the volume of the set of "good" seed values, but then you just stick 37 in and hope it's one of the good ones
Mario Carneiro (Apr 03 2020 at 12:36):
You might also have trouble proving that iterating the test increases the volume past 3/4, since proving independence is somewhere between very difficult and false
Gabriel Ebner (Apr 03 2020 at 12:43):
The part about the independence of the values of the PRNG is a good point, though the probability of finding a witness should still converge to 1 since the PRNG hopefully enumerates all values.
There are two questions here though:
- How to verify the Miller-Rabin algorithm. Here the answer is probably to ignore the PRNG and just compute the probability of the test directly, which isn't too hard.
- How to run it as a tactic to disprove primality. I completely agree with you here. Just stick in the randomly chosen values 2, 3, 5, 7, and 11 and pray that you find a witness.
Mario Carneiro (Apr 03 2020 at 12:45):
Unless it's a ludicrously large number, an easier way to get a witness is to ask a CAS to find a factor
Gabriel Ebner (Apr 03 2020 at 12:46):
I'd even say "in particular if it's a ludicrously large number".
Mario Carneiro (Apr 03 2020 at 12:47):
But I agree with the general spirit of the question; I've been struggling with finding the "proof content" of probabilistic proofs for years
Mario Carneiro (Apr 03 2020 at 12:48):
it always boils down to relying on open questions like the algorithmic randomness of prime numbers after suitable preprocessing
Gabriel Ebner (Apr 03 2020 at 12:50):
They're not probabilistic proofs, they're probabilistic algorithms. And their correctness (= probability that they return the correct result) does not rely on any open questions AFAICT.
Mario Carneiro (Apr 03 2020 at 12:51):
I mean the correctness of the over all procedure of taking a test like this, running it 10 or 20 times, and arguing that the chance that the theorem is false is now less than the probability of hardware failure
Gabriel Ebner (Apr 03 2020 at 12:51):
I also feel a bit uneasy that these algorithms return an imprecise answer to a binary question, but that doesn't change the fact that they are correct in a very precise sense.
Gabriel Ebner (Apr 03 2020 at 12:52):
Yes, that's precisely the part I don't like either.
Kevin Buzzard (Apr 03 2020 at 13:17):
You're pure computer scientists.
Bryan Gin-ge Chen (Apr 03 2020 at 13:32):
This discussion reminds me a bit of this MO question which I remember reading and being confused about right before I learned about Lean.
What notable algorithms do we currently have in non
meta mathlib? There's the extended GCD stuff by Mario in
data.int.gcd, some sort algorithms in
data.list.sort and some other stuff on data structures like
data.hash_maps, anything else that I'm missing? (I guess any
def is in some sense an algorithm, so really I'm asking about what mathematicians might label algorithms.)
Mario Carneiro (Apr 03 2020 at 13:42):
there are association lists in
data.list.alist, R-B maps in core, and
ordmap is an implementation of 2/3 size balanced binary trees a la Haskell's Data.Map on a long outdated branch of mathlib. I once had a plan to formalize most of Okasaki's book on functional data structures, but it was dead on arrival because lean 3 doesn't have thunks so half of the algorithms don't work (at least not at the stated performance bounds)
Reid Barton (Apr 03 2020 at 13:43):
For these probabilistic algorithms I don't think you can avoid modeling the PRNG output as a random variable (in the sense of probability theory).
Mario Carneiro (Apr 03 2020 at 13:44):
but that's... at best unjustified
Reid Barton (Apr 03 2020 at 13:44):
Right, which is precisely why you can't do better in a theorem prover :slight_smile:
Reid Barton (Apr 03 2020 at 13:45):
The most you could even hope to be true is that your PRNG looks random to someone with bounded computational power
Reid Barton (Apr 03 2020 at 13:45):
but I don't think anyone knows how to prove any statement of that form
Mario Carneiro (Apr 03 2020 at 13:45):
for all you know there is some deep number theory reason that the method your PRNG uses is disproportionally likely to pick Miller-Rabin false negatives
Reid Barton (Apr 03 2020 at 13:50):
Well, you might believe that this is actually false (because false negatives are rare, and you could detect them in polynomial time by doing a polynomial time deterministic primality test) but how would you ever prove it?
Reid Barton (Apr 03 2020 at 13:54):
and for your application you would even need to prove it with effective bounds
Ryan Lahfa (Apr 03 2020 at 14:05):
That's a small contribution, but AFAIK, people have been successful at modelling some CSPRNG behaviors using Z3 solvers: https://alephsecurity.com/2019/09/02/Z3-for-webapp-security/ — I don't know how feasible it would be to reproduce this in Lean (and how useful?) — but I guess this is far from the ideal desired outcome of this discussion.
Mario Carneiro (Apr 03 2020 at 14:12):
Yeah, unfortunately the only thing that seems to be possible in this space is disproving that certain random number generators are good
Last updated: May 13 2021 at 17:42 UTC