Zulip Chat Archive
Stream: maths
Topic: How does norm_num check primality?
Kevin Buzzard (May 29 2020 at 12:08):
Asking for a friend: https://twitter.com/pruvisto/status/1266339118587555846?s=20 (in response to https://twitter.com/XenaProject/status/1266336662046597121?s=20)
@XenaProject @AngelikiKoutso1 That said, I'm impressed that your norm_num tactic is that fast. Do you know how it works exactly?
- Manuel Eberl (@pruvisto)@AngelikiKoutso1 Lean: ``` import tactic open nat set_option profiler true -- timing example : prime 73727 := by norm_num -- quick example : prime 100000007 := by norm_num -- about 5 seconds ``` Mario Carneiro got norm_num doing this after https://xenaproject.wordpress.com/2018/07/26/617-is-prime/
- The Xena Project (@XenaProject)Johan Commelin (May 29 2020 at 14:31):
Kudos to Mario :octopus: :tada:
Johan Commelin (May 29 2020 at 14:33):
Note that
Mario Carneiro got norm_num doing this after https://xenaproject.wordpress.com/2018/07/26/617-is-prime/
is not completely accurate. I think Mario gave norm_num
another speedup in 2020 (a couple of weeks ago).
Kevin Buzzard (May 29 2020 at 14:44):
I don't know if that affects the primality testing. Is it just checking divisibility up to sqrt(n) though?
Johan Commelin (May 29 2020 at 14:55):
norm_num
got faster in general
Kevin Buzzard (May 29 2020 at 14:56):
My claim is accurate in the sense that norm_num
could not prove nat.prime 617
before I wrote that blog post, but it could a few days after it
Johan Commelin (May 29 2020 at 16:44):
Right, but the time measurements probably got better since then.
Johan Commelin (May 29 2020 at 16:44):
Of course the step from infty
to 10s
is a lot more important then from 10s
to 5s
.
Mario Carneiro (May 29 2020 at 21:41):
@Manuel Eberl norm_num
actually isn't doing anything very intelligent here. I agree that the fastest option is to do as much unverified work as possible, but the proof method here is still trial division, which I believe is the best performer for small primes. (I have implemented Pocklington certificates in metamath before but they were way overkill for the numbers you usually see in theorem provers.) The main use of the unverified computation here is in proving non-primality: if the input is not prime we find a witness to it externally and then the proof itself is short. But if the number is in fact prime we just demonstrate that for each d = 3,5,7,9,... while d*d <= n, n can be expressed as a + d * b where 0 < a < d (and unverified computation is used to find a and b here).
So while we can leverage unverified computation to some extent, the proof is still quite long -- for 100000007 that's 5000 applications of the inductive step. Unlike the Coq approach, we are making essentially no use of "reflection" -- there are no nontrivial definitional equality problems produced, because in benchmarks this turns out to be slightly faster, at least with lean's current implementation. So it is really quite similar to what you would do in Isabelle.
I'm not sure how it compare's with isabelle's simp
. Lean doesn't have simpprocs, but norm_num
is perhaps the nearest approximation to one, since it uses simp for the expression traversal, which calls the core norm_num tactic to simplify subterms. It is possible to use pure simp
to prove theorems about +
or <=
on numerals as well, with the right simp lemmas, but this seems to be about 100 times slower, and perhaps Isabelle's simp
lies somewhere in the middle. The problem is that simp
does so many things that it can't focus on the one goal, and instead runs through its many simplifications on every subterm, while norm_num
is applying exactly the theorems that are needed and produces a proof of prime n
all in one go.
Yury G. Kudryashov (May 29 2020 at 21:46):
Do I understand correctly that one of the reasons non-trivial rfl
s are slow is that from the kernel point of view nat
are Peano naturals, not binary numbers?
Mario Carneiro (May 29 2020 at 21:51):
That depends on what is being proved. You can often get lucky if the kernel unfolds things in the right order and reduces things to an exact match before it gets that bad, but the worst case with proving facts about nat is that it reduces the term all the way, to the unary representation
Chris Hughes (May 29 2020 at 21:51):
But it's slower even with the binary num
right?
Mario Carneiro (May 29 2020 at 21:52):
Yes, that's the real comparison I'm talking about
Mario Carneiro (May 29 2020 at 21:52):
If you did this Coq style you would use an inductive type whose structure matches the proof method
Kevin Buzzard (May 29 2020 at 21:56):
@Yury G. Kudryashov
import tactic
example : 1000000000000 + 1000000000000 = 2000000000000 := rfl -- immediate
example : 1000000000 + 2000000000 = 3000000000 := rfl -- times out
Mario Carneiro (May 29 2020 at 21:58):
I should replace "often" with "occasionally". This is an unlikely event in the space of all random questions you can ask
Mario Carneiro (May 29 2020 at 22:46):
Here's another funny variation:
example : 10000000000000000000 + 1 = 10000000000000000001 := rfl -- immediate
example : 10000000000000000001 + 1 = 10000000000000000002 := rfl -- times out
Manuel Eberl (May 30 2020 at 09:04):
@Mario Carneiro
I see. Perhaps we should try to implement something like that in Isabelle as well.
There is also the HOL Light approach of not using binary numbers but some larger base b
(e.g. 16) with b
×b
tables for addition and multiplication. That cuts down the constant factor for arithmetic operations quite a bit.
We even had a student who started implementing this for a Bachelor's thesis, but they never got very far.
Mario Carneiro (May 30 2020 at 09:23):
Metamath actually uses base 10 arithmetic :shock:
Mario Carneiro (May 30 2020 at 09:25):
To support this comfortably in lean would be complicated because the representation of numerals is innately base 2, so you would have to deal with all the possible half-digits at the end of a number if you chopped things up into 4 bit chunks
Manuel Eberl (May 30 2020 at 09:27):
Is this because of definitional equality vs other kinds of equality? In Isabelle/HOL, the representation of numbers is whatever we want it to be at this particular moment.
Mario Carneiro (May 30 2020 at 09:28):
It's because of the parser
Mario Carneiro (May 30 2020 at 09:28):
when you type 12345
in lean it gets turned into bit1 (bit0 ... (bit1 1))
Mario Carneiro (May 30 2020 at 09:29):
If you represent numbers in a different way the pretty printer will not show your numbers nicely
Mario Carneiro (May 30 2020 at 09:29):
and you can reasonably expect that human inputs will also be in this form
Manuel Eberl (May 30 2020 at 09:31):
Same in Isabelle. But why shouldn't norm_num be able to just rewrite that into something else as preprocessing, and back into binary in postprocessing?
Mario Carneiro (May 30 2020 at 09:31):
It could
Manuel Eberl (May 30 2020 at 09:31):
I think that's how it works in HOL Light
Mario Carneiro (May 30 2020 at 09:31):
It's nice to be able to read intermediate statements though
Mario Carneiro (May 30 2020 at 09:32):
In the HOLs that's mostly inaccessible anyway
Mario Carneiro (May 30 2020 at 09:32):
I've actually been working with HOL light recently and I think it's binary like lean, at least in the standard library
Mario Carneiro (May 30 2020 at 09:33):
but I also recall hearing that there was a large-base implementation of arithmetic for flyspeck
Mario Carneiro (May 30 2020 at 09:34):
For MM0 I am implementing base 16 arithmetic. I don't think that really large bases are necessarily better if the digit table overflows the cache
Manuel Eberl (May 30 2020 at 09:50):
Yes, more than 16 is probably a bit excessive.
Last updated: Dec 20 2023 at 11:08 UTC