Stream: Equational

Topic: Which implications are harder for ATPs

Bruno Le Floch (Oct 25 2025 at 23:42):

All (positive) implications between laws of order up to 4 are quite easy for the Prover9 ATP. I've explored a bit which ones are a bit harder by some measures. The short answer is that implications from 450 and from 650 are both slower to obtain (more than one second with default Prover9 parameters), and require objectively more complicated steps (larger weight and depth) than others. Some other implications are outliers only in some of these ways.

First criterion: timeout of 1 second

Almost all implications can be obtained by Prover9 with default parameters (sos limit 20000, max weight 100) and with a 1 second timeout. Actually for most equations E it can show all consequences E⊨(E1∧E2∧E2...) of the equation in a single 1 second Prover9 run. The two exceptions are laws 450 and 650 (and their duals).

Second criterion: max_weight (longest expression in the proof)

Almost all implications can be obtained by Prover9 with parameter assign(max_weight, 25)., with the following exceptions (and their duals).

463⊨39 (and consequences) need max weight 27
837⊨107 needs max weight 27
1119⊨(8, 99, 151, 359, 411, 1223, 1629, 1832, 2035, 3253, …) needs max weight 27
1289⊨(411, 614, 1020, 2238, 2441, 2847, 3050, 4380) needs max weight 27
1312⊨(8, 411, 1020, 1629, 1832, 2035, 3253, 3319, 3862, 3915, …) needs max weight 27
848⊨835 needs max weight 31
650⊨many eqs (1208 of them) need max weight 33. The resulting proof of 650⊨4 has 42 steps (but see below for a 25-step proof), including several lines with the maximal weight 33, such as (x * ((y * ((z * u) * y)) * x)) * (w * ((v * (u * v)) * w)) = x * ((y * ((z * u) * y)) * x). We discussed human/LLM/Lean proofs of 650⊨4 (and the mostly identical 650⊨448) in #Machine Learning for Theorem Proving > A (semi)-autoformalization challenge: 650=>448.
510⊨(439,3862) need max weight 33. The proofs are much shorter, with 15 and 20 steps respectively. They pass through a single longest equation which states S(S(S(S(x)))) = x with S the squaring map (expanded into explicit uses of the basic operation).
450⊨(413,426,432) need max weight 35. Proofs with 16 steps with no clear structure (detailed proofs at 450-proofs.txt) with the longest step being the complicated (x * (y * x)) * (z * (((x * (y * x)) * (z * (x * (y * x)))) * (u * (w * (u * x))))) = x * (y * x).
703⊨(1426,1434,4275,4307) need max weight 55. Very short proofs (9 steps), with the step of weight 55 being the statement that C(C(C(x))) = x for C(x)=(x◇x)◇x the cubing map: written in full it is quite long.

Third criterion: max_depth (deepest nesting of the magma operation)

Almost all implications can be obtained by Prover9 with parameter assign(max_depth, 6)., with the following exceptions (and their duals). The equations 442, 450, 463, 474, 493, 645, 650, 703, 713, 910, 1119, 1289, 1312, 1313, 1316 have some implications that are not found with max depth 6. All of them are found with max_depth 7 and proofs are rather short and unremarkable.

Fourth criterion: length of proof

I don't know how to explore that systematically yet, so let me just comment on 650⊨4. With default parameters, Prover9 finds a proof with (length 113, weight 97) in a few seconds. With set(lightest_first) get proof with (length 131, weight 95). With set(breadth_first) get no proof in a minute. Setting sos_limit (number of equations processed at a given time) to tiny values has chaotic effects: assign(sos_limit, 8) gives a proof (length 45, weight 99); sos limit 9 or 13 gives proof with (length 29, weight 33); sos limit 10 gives no proof (like <8); sos limit 11 gives (length 76, weight 69); sos limit 12 gives (length 28, weight 37); sos limit 14 or 16 or 17 gives (length 31, weight 37); sos limit 15 gives (length 28, weight 33). Some more explorations with weight limit etc end up giving me the following interesting contenders for "simplest proof": (length 25, weight 37) or (length 28, weight 33). See the two proofs: 650-proofs.txt (from two different runs, so both are called Proof 1).

Terence Tao (Oct 26 2025 at 14:51):

Perhaps an abridged version of this report can be worked into the ATP section of the paper?

Jose Brox (Oct 27 2025 at 11:53):

Terence Tao ha dicho:

Perhaps an abridged version of this report

I think so, the max_weight, etc. info about positive implications fits well in the parameters tinkering paragraph I wanted to add, and I had some similar observations of changes in proof length etc. in function of the parameters. For me one problem is that the parameters are not independent, in the sense that the effect of a combined change of parameters on a proof (found or exhausted search, proof length, time needed...) is not the addition of the separated effects when we make ceteris paribus changes.

Last updated: Feb 28 2026 at 14:05 UTC

leanprover-community / mathlib

Zulip Chat Archive