Zulip Chat Archive

Stream: Is there code for X?

Topic: Compute Nat.primeFactors


Antoine Chambert-Loir (Jun 05 2025 at 18:34):

I'm struggling to make Lean compute docs#Nat.primeFactors , even for small integers.

example : Nat.primeFactors 6 = {2, 3} := sorry

I can rewrite to docs#Nat.primeFactorsList, and have it apply the induction hypothesis step by step, but this is a bit painful, even for 6.

Antoine Chambert-Loir (Jun 05 2025 at 18:43):

Thinking about repeat suggested this:

import Mathlib.Data.Nat.Factors
example : Nat.primeFactorsList 6 = [2, 3] := by
    repeat
      change Nat.primeFactorsList (_ + 2) = _
      simp [Nat.primeFactorsList]
      norm_num

Is it the right way to do?

Antoine Chambert-Loir (Jun 05 2025 at 18:43):

Probably not — What if I don't want to compute myself the right hand side ?

Aaron Liu (Jun 05 2025 at 18:50):

I think this is an instance of #lean4 > opaque recursion definitions break mergeSort decidability

Eric Wieser (Jun 05 2025 at 19:02):

#15420

Antoine Chambert-Loir (Jun 05 2025 at 21:02):

Cool! (I had started writing a fueled version of Nat.primeFactors…)

Eric Wieser (Jun 05 2025 at 21:04):

That is probably still a good idea

Eric Wieser (Jun 05 2025 at 21:04):

If you do that the simproc can become a dsimproc

Antoine Chambert-Loir (Jun 05 2025 at 22:30):

Is it important that the fuel variable is initialized to something reasonable?
Finding all prime factors of n requires Nat.clog 2 n steps; I I have two versions, one that proves that it goes in that much steps, and another one where I initialize fuel as n.
But the first version doesn't work as rfl, while the second does…

Antoine Chambert-Loir (Jun 05 2025 at 22:31):

import Mathlib

theorem primeFactorsList2_aux (fuel : ) (n : ) (hn : n  fuel + 1) (m : ) (hm : 2  m) :
    n / m  fuel := by
  apply Nat.le_of_lt_succ
  rw [Nat.div_lt_iff_lt_mul]
  apply lt_of_le_of_lt hn
  change fuel.succ < fuel.succ * m
  rw [Nat.lt_mul_iff_one_lt_right]
  exact hm
  exact Nat.zero_lt_succ fuel
  exact Nat.zero_lt_of_lt hm

def primeFactorsList3 :   List  :=
  let rec go (fuel : ) (n : ) (hfuel : n  fuel) : List  :=
    match fuel with
    | 0 => []
    | fuel + 1 =>
      if hk : fuel = 0  n < 2 then []
      else
        let m := Nat.minFac (n)
        have hm : 2  m := by
            apply Nat.Prime.two_le
            simp [m]
            refine Nat.minFac_prime ?_
            simp only [not_or] at hk
            intro hn
            apply hk.2
            simp [hn]
        m :: (go fuel (n / m) (primeFactorsList2_aux _ _ hfuel _ hm))
  fun n  go n n (le_refl _)

theorem primeFactorsList4_aux (fuel : ) (n : ) (h : ¬ (fuel = 0  n < 2)) (hn : n  2 ^ (fuel + 1)) (m : ) (hm : 2  m) :
    n / m  2 ^ fuel := by
  apply Nat.le_of_lt_succ
  rw [Nat.div_lt_iff_lt_mul (Nat.zero_lt_of_lt hm)]
  apply lt_of_le_of_lt hn
  apply lt_of_lt_of_le (b := (2 ^ fuel).succ * 2)
  · simp only [Nat.succ_eq_add_one, add_mul, one_mul,
     pow_succ, Nat.lt_succ_iff, Nat.le_add_right]
  · exact Nat.mul_le_mul_left (2 ^ fuel).succ hm

def primeFactorsList4 :   List  :=
  let rec go (fuel : ) (n : ) (hfuel : n  2 ^ fuel) : List  :=
    match fuel with
    | 0 => []
    | fuel + 1 =>
      if hk : fuel = 0  n < 2 then []
      else
        let m := Nat.minFac (n)
        have hm : 2  m := by
          apply Nat.Prime.two_le
          simp [m]
          refine Nat.minFac_prime ?_
          simp only [not_or] at hk
          intro hn
          apply hk.2
          simp [hn]
        m :: (go fuel (n / m) (primeFactorsList4_aux fuel n hk hfuel _ hm))
  fun n  go (Nat.clog 2 n) n (Nat.le_pow_clog Nat.one_lt_two n)


example : Nat.primeFactorsList3 105 = [3, 5, 7] := rfl
example : Nat.primeFactorsList4 105 = [3, 5, 7] := rfl -- doesn't work

Eric Wieser (Jun 05 2025 at 22:32):

I think the problem is that clog 2 n is itself missing a fuel argument?

Antoine Chambert-Loir (Jun 05 2025 at 22:34):

(Note that I do not understand the what in this way of writing a proof mkes this work.)

Eric Wieser (Jun 05 2025 at 22:34):

Indeed, example : Nat.clog 2 37 = (eval% Nat.clog 2 37) := rfl fails

Antoine Chambert-Loir (Jun 05 2025 at 22:35):

OK, so I have to fuel docs#Nat.clog as well. This makes sense.

Eric Wieser (Jun 05 2025 at 22:35):

I think you want your fuel argument to be as simple as possible, because how this method works is by reducing fuel to applications of succ

Eric Wieser (Jun 05 2025 at 22:35):

So we should fuel clog, but I doubt you want to use it in prime factors

Antoine Chambert-Loir (Jun 05 2025 at 22:36):

Hence the first version will be the right one.

Eric Wieser (Jun 05 2025 at 22:36):

I expect so, but perhaps the right approach is:

  • Fuel clog
  • Benchmark both versions, to see which is faster

Antoine Chambert-Loir (Jun 05 2025 at 22:37):

OK, I'll do that. (This is fun.)

Antoine Chambert-Loir (Jun 05 2025 at 22:37):

How do you benchmark something?
I tested both versions with big numbers, and crashed Lean.

Eric Wieser (Jun 05 2025 at 22:38):

set_option trace.profiler true in is a reasonable way

Eric Wieser (Jun 05 2025 at 22:38):

Pick numbers small enough not to crash, but big enough to take more than a second

Antoine Chambert-Loir (Jun 05 2025 at 23:03):

I wonder whether docs#Nat.minFac shouldn't be fueled as well. (I get fast results with #eval but Lean pretends the recursion is too high — there are less than 5 prime number.)

Bhavik Mehta (Jun 05 2025 at 23:15):

The change here would be to docs#Nat.minFacAux, which is the one containing the recursion. The intention is, as the docstring there describes, for norm_num or related to be the one computing this during proves. But (as I guess you're encountering just like I have), that's not always enough when the rest of a computation works by decide

Bhavik Mehta (Jun 05 2025 at 23:15):

As an aside, Nat.sqrt is also something I'd marked semireducible for the same reason, and that one should also be fueled

Eric Wieser (Jun 05 2025 at 23:15):

One thing to check here is that the weight of the extra fuel doesn't slow down #eval

Eric Wieser (Jun 05 2025 at 23:16):

If it does, then there is some extra work of writing a csimp lemma and keeping both versions

Antoine Chambert-Loir (Jun 07 2025 at 06:51):

I have a question about the docstring for docs#Nat.logC which says

An alternate definition for Nat.log which computes more efficiently. For mathematical purposes, use Nat.log instead, and see Nat.log_eq_logC.

Given function extensionality, I don't understand what is really meant here. Maybe, that it is easier to prove a mathematical lemma using the inductive definition of docs#Nat.log than that of docs#Nat.logC.

Johan Commelin (Jun 07 2025 at 06:52):

I think that's indeed the purpose of that comment. Some definitions are good for proofs, others are good for computations.

Antoine Chambert-Loir (Jun 07 2025 at 07:18):

i've made a preliminary fueled definition to replace docs#Nat.clog, which takes roughly the same time as the computation using docs#Nat.logC.
However, for large numbers, one needs to increase maxRecDepth to allow rfl to prove the desired equality:

set_option maxRecDepth 900 in
example : clog_fueled 2 123456789012345667890123456789012345667890123456789012345667890123456789012345667890 = 277 := rfl

This is roughly three times the size of the output, as if the tests that the iteration must perform were only done at the end. I had tried to make my recursion as much terminal as possible, but that is apparently not enough. Any idea?

Johan Commelin (Jun 07 2025 at 07:22):

@Antoine Chambert-Loir Is your fueled version on a branch somewhere, so that others can play with it?

Antoine Chambert-Loir (Jun 07 2025 at 07:23):

For the moment, it's just a plain file with iterative attempts. I'll make that at once. Should I put this in Mathlib.Data.Nat.Log or open a new file?

Johan Commelin (Jun 07 2025 at 07:24):

New file seems fine for experiments. We can merge it into the existing files later.

Antoine Chambert-Loir (Jun 07 2025 at 07:35):

#25557
branch#ACL/fuel_clog

Johan Commelin (Jun 07 2025 at 08:06):

@Antoine Chambert-Loir What do you think of

def clog' (b n : ) :  :=
  let rec go (fuel : ) (b n : ) :  :=
  match fuel with
  | 0 => 0
  | fuel + 1 => if 1 < b  1 < n then go fuel b ((n + b - 1) / b) + 1 else 0
  go n b n

#reduce clog' 2 37 -- 6

Antoine Chambert-Loir (Jun 07 2025 at 08:10):

This was basically my first attempt. For large numbers, rfl doesn't work, it requires maxRecDepth to be roughly 10 times the size of the output.

Antoine Chambert-Loir (Jun 07 2025 at 08:11):

This is why I rewrote is using an accumulator (to avoid the final + 1).

Johan Commelin (Jun 07 2025 at 08:26):

I guess in the meta world it will be easy to make this much faster, because then we have access to the binary representation of n.

Antoine Chambert-Loir (Jun 07 2025 at 08:30):

I just noticed something bizarre: with my version, #reduce requires the same recursion depth as yours, but proving equality (example : ... = ... := rfl) requires less.

Mario Carneiro (Jun 07 2025 at 13:00):

This is the best version I could come up with in terms of keeping the recursion depth to clog n + O(1):

def clog_fueled (b n : ) :  :=
  if b  1 then 0 else
    n.rec (fun a _ => a) (fun _ ih a n =>
      (Nat.ble n 1).rec (ih (Nat.succ a) ((n + b - 1) / b)) a) 0 n

set_option maxRecDepth 286 in
#reduce clog_fueled 2
  123456789012345667890123456789012345667890123456789012345667890123456789012345667890

Mario Carneiro (Jun 07 2025 at 13:01):

note that this is not the kernel checker doing the reduction though

Eric Wieser (Jun 07 2025 at 13:05):

Antoine Chambert-Loir said:

I just noticed something bizarre: with my version, #reduce requires the same recursion depth as yours, but proving equality (example : ... = ... := rfl) requires less.

I would guess that reduce always expends all the fuel, whereas rfl can stop sooner?

Mario Carneiro (Jun 07 2025 at 13:05):

no, the recursion depth is absurdly low, I think it's either somehow tail recursive or the recursion isn't being tracked

Mario Carneiro (Jun 07 2025 at 13:06):

kernel evaluation here is weird, I don't think there is a way to avoid proper recursion

Eric Wieser (Jun 07 2025 at 13:07):

Does swapping the rec for a match cause trouble?

Mario Carneiro (Jun 07 2025 at 13:07):

for the bool or the nat?

Eric Wieser (Jun 07 2025 at 13:07):

The nat

Mario Carneiro (Jun 07 2025 at 13:07):

for the nat it's a disaster, it triggers equation compilation and you get a brecOn and a bunch of cruft

Eric Wieser (Jun 07 2025 at 13:08):

even with termination_by structural?

Mario Carneiro (Jun 07 2025 at 13:08):

that is structural recursion as lean understands it

Eric Wieser (Jun 07 2025 at 13:09):

oh

Mario Carneiro (Jun 07 2025 at 13:11):

Eric Wieser said:

Antoine Chambert-Loir said:

I just noticed something bizarre: with my version, #reduce requires the same recursion depth as yours, but proving equality (example : ... = ... := rfl) requires less.

I would guess that reduce always expends all the fuel, whereas rfl can stop sooner?

note that if you actually expended all the fuel it would not terminate in your lifetime

Mario Carneiro (Jun 07 2025 at 13:11):

since the fuel is ~2^277

Antoine Chambert-Loir (Jun 07 2025 at 15:39):

True, but you know a priori that you don't need that much fuel, just the log of it.

Mario Carneiro (Jun 07 2025 at 17:33):

I was thinking about that, but unfortunately it's not easy to exploit that, since you... need to compute the log in order to use that as the fuel instead

Mario Carneiro (Jun 07 2025 at 17:36):

it is a bit of a shame that the computation involves constructing the numbers 123...7890, 123...7889, 123...7888 and so on down to n-277 since these numbers are pointlessly big. If it was a megabytes large number this would be quite expensive when actually computing the base 2 log of the number would be trivial on a normal computational system

Mario Carneiro (Jun 07 2025 at 17:38):

and even computing the general base log of a huge number can slice the number down very quickly, resulting in a constant factor overhead over the input storage

Johan Commelin (Jun 07 2025 at 17:43):

But don't we have the binary representation of the number at hand, whenever we actually want to do this computation?

Mario Carneiro (Jun 07 2025 at 18:15):

yes? I'm assuming the number is being represented in binary here

Mario Carneiro (Jun 07 2025 at 18:16):

that's why log2 in particular is completely trivial even if the number is gigabytes large

Antoine Chambert-Loir (Jun 07 2025 at 20:21):

Indeed, that's almost a shame to have to make these computations to get the number of binary digits. Couldn't it be more directly accessible ?


Last updated: Dec 20 2025 at 21:32 UTC