Zulip Chat Archive
Stream: Equational
Topic: Austin pairs
Vlad Tsyrklevich (Oct 22 2024 at 18:19):
I'm able to generate ~75 finite implications out of 718 _finite_ unknowns (modulo equivalence, but not duality) on a local branch. I'm using a very simple method to bruteforce generalize the 206=>1648 and 3994=>3588 theorems so I can capture both of those results, and I'm sure that I'm missing some cases.
One interesting observation that I didn't realize until looking at the equation explorer just now is that 3588 and 3994 are duals, and therefore unsurprisingly 3588 also implies 3994 in the finite domain, so they are actually a (finite) equivalence class. 1113/2534 is another dual of this type. Here's a graph (zoom on the right).
Daniel Weber (Oct 23 2024 at 10:31):
A potential way to automatically prove these implications is to use Vampire, and for a bunch of a functions add the assumption , I can try to do that
Daniel Weber (Oct 23 2024 at 10:33):
Is there some kind of completeness/incompleteness for finite implications? Wikipedia's article on finite model theory says that Godel's completeness theorem fails here, but I don't know if that's already the case for the kinds of implications we're considering
Zoltan A. Kocsis (Z.A.K.) (Oct 23 2024 at 11:15):
@Daniel Weber Trakhtenbrot's theorem is a rather strong obstacle: you won't find a complete logic for finite structures, but even finding a reasonably good logic for finite-ish structures is a difficult open problem in proof theory. E.g. if you formulated the algebraic theory of lattices in such a logic, the logic should provide a proof of existence of a maximal element. If you formulated the theory of integral domains, it should provide a proof of existence of multiplicative inverses. We have no good way of getting something like that.
Michael Bucko (Oct 23 2024 at 11:45):
Daniel Weber schrieb:
A potential way to automatically prove these implications is to use Vampire, and for a bunch of a functions add the assumption , I can try to do that
Daniel, could you please somehow share those snippets / screenshots, or document how you're working with Vampire? That'd be very helpful.
Daniel Weber (Oct 23 2024 at 11:48):
Michael Bucko said:
Daniel Weber schrieb:
A potential way to automatically prove these implications is to use Vampire, and for a bunch of a functions add the assumption , I can try to do that
Daniel, could you please somehow share those snippets / screenshots, or document how you're working with Vampire? That'd be very helpful.
I haven't done this yet, as there are many such functions so I'm trying to think of a better approach
Vlad Tsyrklevich (Oct 23 2024 at 12:25):
Daniel Weber said:
A potential way to automatically prove these implications is to use Vampire, and for a bunch of a functions add the assumption , I can try to do that
I was trying to use Vampire for this, in an even simpler fashion, but I don't understand Vampire well enough and need time to read more
Terence Tao (Oct 23 2024 at 14:09):
Daniel Weber said:
Michael Bucko said:
Daniel Weber schrieb:
A potential way to automatically prove these implications is to use Vampire, and for a bunch of a functions add the assumption , I can try to do that
Daniel, could you please somehow share those snippets / screenshots, or document how you're working with Vampire? That'd be very helpful.
I haven't done this yet, as there are many such functions so I'm trying to think of a better approach
I think just asserting injective <-> surjective for left multiplication, right multiplication, and squaring should already handle most applications that we know of. One could update this list manually as new challenging implications come in that require anything beyond this.
Will Sawin (Oct 24 2024 at 20:23):
If it becomes necessary, you could also take functions defined using all subexpressions on either side of the equation, like if the equation involves the expression x * (x * y) you could add these axioms for x * (x * y) viewed as a function of x for fixed y and viewed as a function of y for fixed x.
Vlad Tsyrklevich (Oct 24 2024 at 22:01):
The current status is I'm able to resolve about 40% of open finite implications, and I'll give Vampire a go next and use that 40% as a benchmark to verify against.
Vlad Tsyrklevich (Oct 28 2024 at 17:25):
Quick update: To get something working I started generating proofs simply using Vampire with additional laws of the form for any inverses I could find of as I already had a lot of this implemented from the previous scripts. This solves 200/320 open finite implications (and also works for the implications of the three order 5/6 Austin laws.) This is implemented and verified in Lean.
I've run some tests locally adding bi-implications for left/right mul and squaring and the numbers are better, but as it's quite likely that's buggy since I don't have the Lean proofs to verify, I won't report them just yet. I need to investigate how painful adding parsing/conversion for the resulting Vampire proofs will be. In case anyone wants to double check me or is curious, the TPTP I'm using for this local test is (note: ![...]
is and ?[...]
is ):
fof(left_mul_inj_surj, axiom, ![A,M] : ?[B] : A = mul(M, B) <=> ![X, Y, M] : (mul(M, X) = mul(M, Y) => X=Y) ).
fof(right_mul_inj_surj, axiom, ![A,M] : ?[B] : A = mul(B, M) <=> ![X, Y, M] : (mul(X, M) = mul(Y, M) => X=Y) ).
fof(square_inj_surj, axiom, ![A] : ?[B] : A = mul(B, B) <=> ![X, Y] : (mul(X, X) = mul(Y, Y) => X=Y) ).
Vlad Tsyrklevich (Oct 28 2024 at 17:49):
One more note on an equation I ran across: Equation 1133 is , with and it could also be re-written . This is Equation 1979, which 1133 is equivalent to ({1133,1979} is the entire equivalence class.) I don't know if there's anything interesting about this phenomena, but I thought I would note it.
Vlad Tsyrklevich (Oct 30 2024 at 09:54):
The conversion is trickier than I expected, mostly because I'm just over my head trying to use @Daniel Weber's tooling. The conversion from Vampire to Lean with the new rules is not completely trivial as the introduction of implications/disjunctions complicates the conversion. Daniel was able to make it work in Greedy, but I'm not as skilled as him in Lean.
By chance looking around though I found Duper and using it locally for tests has been really simple, check out how trivial this is to prove a finite implication using the method above:
lemma left_mul_inj_surj (G: Type _) [Magma G] [Finite G] : (∀ X0 X1 : G, ∃ X2 : G, (X1 ◇ X2) = X0) -> (∀ X3 X4 X5 : G, ((X5 ◇ X3) = (X5 ◇ X4)) -> X3=X4) := by
intro ha _ _ x _
let f (s : G) := x ◇ s
have surj : Function.Surjective f := by
repeat intro
simp [f, ha]
have inj : Function.Injective f := by simp [Finite.injective_iff_surjective, surj]
apply inj (by assumption)
example (G: Type _) [Magma G] [Finite G] (h: Equation1076 G) : Equation151 G := by
duper [h, left_mul_inj_surj]
example (G: Type _) [Magma G] [Finite G] (h: Equation1110 G) : Equation8 G := by
duper [h, left_mul_inj_surj]
I took a couple random implications off of the equation explorer and it could solve them as well. I tried 1689=>2 and I believe it worked but proof reconstruction failed (that's my interpretation of duper#30).
I will take a stab at this approach and see how well it compares to what we get with Vampire and what the build times are like. However, more generally, this is kind of a nice tool, at least for working with implications. It's too bad we hadn't found this earlier, as it may have been useful in the early days when having a Lean-native ATP would have made filling out the implication graph very easy (though again, I'm unsure about build times.)
Josh Clune (Oct 30 2024 at 15:05):
Vlad Tsyrklevich said:
I took a couple random implications off of the equation explorer and it could solve them as well. I tried 1689=>2 and I believe it worked but proof reconstruction failed (that's my interpretation of duper#30).
Thanks for making duper#30, I'll work on figuring out what's going wrong there and push a fix as soon as I'm able. In the meantime, if you use set_option simultaneousSuperposition false
, Duper should be able to solve 1689=>2 without a proof reconstruction error (long term, I don't intend for that to be an option you should have to set, but hopefully it can serve as a workaround while I debug duper#30).
Vlad Tsyrklevich (Nov 01 2024 at 13:16):
So duper is able to solve many implications, but there also quite a few it times out on. I am able to coax it into finding them by giving it intermediate steps from Vampire, but now that I'm able to verify that this approach works, and check steps I'm not confident in, I've put down the Vampire->Lean conversion for now and gotten back to researching the fundamental approach.
I went back to looking at how 'complete' the results for left mul/right mul/inverses are, and there's a lot more we can gain than those alone. By bruteforcing additional axioms I was able to come up with a more complete set, but this approach is unideal as I'm only checking a single axiom at a time, and it's possible that some refutation requires a combination. I can't simply give every single possible rule to vampire as it fairly quickly is unable to solve implications it was previously able to.
There are a couple of approaches here: bruteforcing individual axioms and building up the minimal set we needed for previous implications and then trying on top of those, trying to use the implication graph somehow, inverses could still be useful. Perhaps finiteness gives us some other property that could be useful to give Vampire here? This is an interesting example where finding finite implications is slightly more difficult than finding them generally, as the number of additional axioms we give the ATP begins to overwhelm it.
Vlad Tsyrklevich (Nov 01 2024 at 13:17):
Another idea is to try using a higher-order ATP and feed it the general injectivity <=> surjectivity rule, though I suspect the state space explosion would render the approach useless.
Vlad Tsyrklevich (Nov 01 2024 at 13:23):
Also I'm not bruteforcing with the complete list of possible bijectivity rules right now, in the case the LHS is I am overly conservative in which rules I produce. This is related to what Will Sawin mentioned earlier with keeping individual variables constant. I need to go through and improve the choices I make there and ensure I'm doing so correctly. I'm also only currently generating bijectivity rules up to order 4 right now, and perhaps to solve some implication we actually need to go to higher orders?
Vlad Tsyrklevich (Nov 01 2024 at 13:31):
Anyways, there are 311 open finite implications on main, I have formalized solutions to 200, and another 85 I believe should be correct, leaving only 26 left, so perhaps instead of finding the correct solution I should just look at the individual cases instead. The list is:
List
Matthew Bolan (Nov 05 2024 at 04:55):
If you find a finite counterexample to 1486 -> 3877, it can be promoted via the twisting method to a finite counterexample to 1486 -> 3862. Given a magma satisfying 1486 and not 3877 , we can define a new 1486 magma by , and if I've done my algebra correctly this new magma will be a counterexample to 1486 -> 3862. There is no way around the need for a 3877 counterexample, the twisting monoid for 1486 is , while for 3877 it is , so we can't twist something satisfying both 1486 and 3877 and get something satisfying 1486 but not 3877.
Matthew Bolan (Nov 05 2024 at 05:03):
(This is the only implication on your list where twisting does anything)
Vlad Tsyrklevich (Nov 05 2024 at 06:04):
Currently, I believe I have (non-formalized) finite implications for everything but the 4 from 3308/3549.
Matthew Bolan (Nov 05 2024 at 06:06):
Oh, awesome!
Terence Tao (Nov 05 2024 at 06:16):
Huh, you may complete the finite implication graph before the original implication graph is complete!
Vlad Tsyrklevich (Nov 05 2024 at 06:18):
Well, implications are a lot easier, and filling out the finite graph has mostly been finding implications.
Matthew Bolan (Nov 05 2024 at 06:25):
So do you believe the remaining 4 implications are true or false?
Vlad Tsyrklevich (Nov 05 2024 at 06:37):
I'm not sure. The brute force approach I took may be incomplete, and I haven't finished figuring out if I could use more values. Let me explain. I was able to solve the remaining implications by using a pretty basic brute force algorithm, where I picked an LHS, call it , and then tried different inputs, e.g. , , etc. and used those as values to . However, this will never generate equations like 3308 () or 3549 () because the RHS is not of the form .
I tried to think about what statements I could make about such equations, and I never came to a conclusion. For example, for equation 343 () I can say that holding x and y constant, that surjectivity implies injectivity:
fof(eq343_bij_y, axiom, ![X,Y,Z] : ?[T] : ( mul(X, Y) = mul(Z, mul(X, T)) ) <=> ![T1,T2,Z,X] : ( (mul(Z, mul(X, T1)) = mul(Z, mul(X, T2)) ) => mul(X, T1) = mul(X, T2) )).
fof(eq343_bij_x, axiom, ![X,Y,Z] : ?[T] : ( mul(X, Y) = mul(Z, mul(T, Y)) ) <=> ![T1,T2,Z,Y] : ( (mul(Z, mul(T1, Y)) = mul(Z, mul(T2, Y)) ) => mul(T1, Y) = mul(T2, Y) )).
EDIT: This bijection is incorrect
Can I make any such statements about eq323, ?
Terence Tao (Nov 05 2024 at 06:58):
The remaining implications follow from
Lemma 1: Let be finite, and let be such that . Then .
Proof: Call a point periodic if for some . Because the forward orbit in a finite space must repeat, we see that for each there exists an such that is periodic. Let be the first for which this occurs. If , then cannot be periodic (as this would imply periodic), and then we can see that . Hence the maximal value of is at most , which implies that is periodic for every . Thus there is an such that . Since , we have , as desired.
3308 states that obeys the hypotheses of Lemma 1, which gives the desired implications. For 3549, one can similarly argue by applying to the following variant lemma:
Lemma 2: Let be finite, and let be such that . Then .
Proof: By hypothesis, uniquely determines . This prevents for any (because if is such that , then ), and hence also prevents for any . So again we have periodic for all , so for some . Then as required.
Vlad Tsyrklevich (Nov 05 2024 at 07:41):
Cool! This is a powerful method, I wonder if it's stronger than surjective↔injective
Vlad Tsyrklevich (Nov 05 2024 at 10:42):
Both of these lemmas are directly related to the original method I was using of finding inverses, in that case I used that , and the final statement of the lemmas above are derivable from the original assumption by applying . It's funny, I had assumed that the original method didn't have enough generality to cover more cases, but clearly I was not looking hard enough.
Vlad Tsyrklevich (Nov 05 2024 at 15:06):
Vlad Tsyrklevich said:
I wonder if it's stronger than surjective↔injective
Having given this idea a little more thought, I'm not sure that it offers any new axioms for equations with LHS of just 'x' so how could it be stronger.
Alex Meiburg (Nov 05 2024 at 17:00):
Just curious, so you have the finite implications complete, do you have witnesses to all of the finite non-implications? That would be surprisingly fast because I was of the impression that some of the non-implications were proved with infinite magmas, even though it didn't necessarily seem that there would be an implication in the finite case.
If you don't, how do you know which will be implications or non-implications?
Alex Meiburg (Nov 05 2024 at 17:00):
(I'm probably misunderstanding one thing or another)
Vlad Tsyrklevich (Nov 05 2024 at 17:43):
The original answer was "All remaining finite unknowns were implications", but writing this it actually seemed somewhat far-fetched, so I began to double check and I'm sorry to report that at least some of the implications I had conjecturally resolved relied on a faulty premise. I will need some time to go through and verify everything to understand how many finite unknowns remain open.
Vlad Tsyrklevich (Nov 08 2024 at 07:47):
Current status: equational#797 has the implications that I've already formalized in Lean, and I will shortly propose another change (equational#804) to add a number of finite conjectures: first the ones Terence solved above, and second, the injective<->bijective ones I have been able to verify using Duper. I don't think Duper is suitable to upstream due to versioning issues, but it should eliminate any risk that I've made any more mistakes. There was only one injective<->bijective implication I believe is correct that I did not succeed in making Duper recognize, 1516->255. I upload the Duper verified conjectures here so I can link them from Conjectures.lean. DuperConjectures1.lean
Vlad Tsyrklevich (Nov 08 2024 at 07:50):
The remaining balance is 30 unknowns modulo duality, equivalence class, and conjectures. I will take a look at how feasible it is to add a finite graph view to the equation explorer and graphiti next.
Matthew Bolan (Nov 08 2024 at 15:25):
Is the list of unknown finite implications somewhere?
Vlad Tsyrklevich (Nov 08 2024 at 15:27):
List after equational#797 and equational#804
Matthew Bolan (Nov 08 2024 at 16:32):
Where did the table here come from https://github.com/teorth/equational_theories/blob/main/equational_theories/Generated/All4x4Tables/Refutation924.lean#L20 ?
Alex Meiburg (Nov 08 2024 at 16:37):
Those are all generated (hence) the name by another program, in this case from https://github.com/teorth/equational_theories/tree/main/equational_theories/Generated/All4x4Tables/src
Matthew Bolan (Nov 08 2024 at 16:42):
I gathered that, but which other program? This particular table didn't look to me as if it could have come from any of those programs, but I gave it only a cursory glance.
Vlad Tsyrklevich (Nov 08 2024 at 16:44):
By searching for that table in https://github.com/teorth/equational_theories/blob/main/equational_theories/Generated/All4x4Tables/data/plan.txt, you can find it on line 3698, using blame on the top left in github, you can find that it was recently added by Daniel, importing vampire refutations to the All4x4Tables
Matthew Bolan (Nov 08 2024 at 16:44):
There are other tables in this folder which clearly came from ATP, for example I think I recognize the strange 12x12 854 magma and the 32x32 1485 magma.
Vlad Tsyrklevich (Nov 08 2024 at 16:45):
Agh, that's the plan though, not where it came from. Anyways it's also from vampire-generated.txt
but that explanation was the wrong way to find that info
Matthew Bolan (Nov 08 2024 at 16:46):
Thank you, I missed that file somehow.
Terence Tao (Nov 09 2024 at 16:22):
opened equational#809 to formalize the two lemmas about functions on finite sets used above to prove implications on finite magmas.
Matthew Bolan (Nov 09 2024 at 18:38):
Matthew Bolan said:
Where did the table here come from https://github.com/teorth/equational_theories/blob/main/equational_theories/Generated/All4x4Tables/Refutation924.lean#L20 ?
This table remains very impressive to me. Even imitating the vampire script which I assume found it (and allocating vampire more time), I am unable to coax vampire into outputting it. What did you do to find this thing @Daniel Weber ?
Vlad Tsyrklevich (Nov 09 2024 at 19:38):
Don't know if you've seen this already, but I presume it's from equational_theories/Generated/VampireProven/src/vampire_fin_counterexamples.py
which has the vampire command in there
Matthew Bolan (Nov 09 2024 at 21:08):
I can't quite coax that into giving me this table either. I'm guessing some small modification to the schedule there will get it though.
Daniel Weber (Nov 10 2024 at 04:31):
Searching the git history, it was generated as a counterexample to 1486 -> 2052 with vampire_fin_counterexamples.py
Matthew Bolan (Nov 10 2024 at 04:37):
Ah, that explains a lot. I tried generating it from the things it is currently listed as a counterexample to, (for some I even found smaller tables, like this one), but 2052 is not currently among them.
Terence Tao (Nov 15 2024 at 02:10):
For some of the outstanding implications on the finite magma graph, one could try filling in a translation invariant model of some fixed size, e.g., 10, using an ATP (possibly inserting explicitly the assertion that squaring is a bijection for such models, in case it speeds things up).
Terence Tao (Nov 15 2024 at 03:27):
Here is a (positive) resolution of the four outstanding implications for 3342. Setting , and , the law is , and we will claim that and for finite magmas, giving the four open implications.
Note that , hence is a homomorphism.
On a finite magma, we have some eventual period so that . From the law we have , and . Then
and a similar argument gives .
Matthew Bolan (Nov 15 2024 at 03:39):
I think the definition of should read
Terence Tao (Nov 15 2024 at 03:51):
The implication 1516->255 for finite magmas was already established (because is surjective, hence bijective, hence , so squaring is invertible, and 255 is known for squares); see the summary at https://leanprover.zulipchat.com/#narrow/channel/458659-Equational/topic/1516.20-.3E.20255/near/478524073
Ibrahim Tencer (Nov 15 2024 at 06:52):
That should say 3342 instead of 3844 right?
Vlad Tsyrklevich (Nov 17 2024 at 20:35):
Terence Tao said:
Here is a (positive) resolution of the four outstanding implications for 3342
I have this formalized, there's still some clean-up I need to do but figured I'd mention it here in order to avoid duplicating work with anyone else.
Terence Tao (Nov 17 2024 at 23:26):
I wrote up the proof of these implications in the blueprint at equational#844 so if you want to formalize them you can also add \leanok's to the blueprint.
Terence Tao (Nov 19 2024 at 01:33):
Attached at counterexample.txt
is a finite (order 84) counterexample to 503 -> 4065, 3862, refuting a net of 6 implications for finite magmas. (It also contains a link to Finite Magma Explorer, but the link is too large to put directly into Zulip.) Incidentally one should update FME to check against the finite implication table rather than the infinite implication table, as currently it does not register this as a new refutation. [EDIT: opened equational#855 for this.]
Some words on how I constructed this example. Equation 503 reads . In finite magmas, the are all invertible permutations, so this is equivalent to , which in turn is equivalent to . I interpret this as follows: if the permutation contains a cycle , then we must have for all (where we extend periodically by ). In particular each then must have a cycle that contains a portion . One nice thing about this law is that the conclusion does not actually involve directly, but only the cycles generated by .
So it becomes a question of seeing how cycles propagate to create other cycles, without causing collisions. From the rule we see that it becomes advantageous to reuse cycles as much as possible: if there was already a cycle that was of the form that does not intersect with any previously defined values of , then the cycle does not generate any new cycles to worry about at the index at least, if we simply add the old cycle to the definition of .
Note that the right absorption law obeys 503, which corresponds to all cycles being of length . Now this of course will not produce counterexamples, but it does hint that we should try to stay close to this law (cf. the "modified base magma strategy"), and this leads to the idea that the permutations should be close to the identity map in that most cycles are just 1-cycles. In order for to have a 1-cycle it is necessary that be idempotent. If every element is idempotent then we do not have any counterexamples, so I deliberately looked for models with exactly one non-idempotent at 0, and all other elements idempotent. Now it is perfectly safe for each shift to have a 1-cycle as long as . As a consequence, it now becomes perfectly safe for each shift to have a -cycle with , as this only generates safe singleton cycles.
So now I tried a greedy algorithm that I plan to terminate in finite time, iterating the cycle generation procedure until I am just left with fragments that I can close up and complete using 3-cycles and 1-cycles. All the fragments generated by the above cycle propagation rule for a given shift are going to be fine (if the magma is large enough) because we can close them up into a 3-cycle that involves an element that hasn't been used previously - unless they are adjacent to another fragment, in which case they need to be closed up into some longer cycle; I ended up mostly using 10-cycles when this occurred, because at that size the further cycle fragments that were generated no longer cause problems.
It took a lot of trial and error (and python code, which I attach if anyone is interested at
magma.py)
, but I was able to start with the cycle for (which is already enough to contradict all the outstanding implications, if it can be completed to full magma) and iterated, using new elements whenever needed to avoid cancellations. For instance, in my construction contained the cycle , contained the cycle , and contained the cycle . This mostly generates disconnected fragments that are easy to close up, but there are a couple bad cases remaining (primarily involving fragments that contain the non-idempotent 0) that required quite a bit of ad hoc fiddling to dispose of, but in the end I was able to close all the cycles and get the magma in the file above.
It may be possible to compress this example into a smaller form, perhaps using ATP assistance; I don't know if this order 84 example is too large for Lean to handle.
Terence Tao (Nov 19 2024 at 02:29):
For what it's worth, equation 476, which had been previously flagged as a "mysterious twin" to 503, has a similar interpretation , so the only difference is that the cycle fragment is assigned to rather than . So presumably a similar technique will handle the two implications coming out of 476; I might try this later if I have some free time. EDIT: actually, the two laws may be "definably equivalent" in the finite case at least, so that there is some way to transform the existing magma to refute the 476 implications also. Will look into this now.
Douglas McNeil (Nov 19 2024 at 02:33):
Wow, I thought the days of the finite magma contributions were over! I think you can get down to order 77 but still satisfy 503 and violate 4065, 3862 just by removing single elements which preserve the demonstration, but that's only a modest improvement.
Terence Tao (Nov 19 2024 at 02:48):
I finally have an explanation for the "twinning" of 476 and 503. If solves 503, with all the invertible, then we can create a 476 operation by requiring that the left multiplication operation for is the inverse of the left multiplication for :
It is then a routine matter to verify that obeys 476 if and only if obeys 503.
Applying this to the previous example, I now have a finite magma (also of order 84) that refutes 476 -> 359, 4065:
counterexample-dual.txt
Weirdly, the twinning is not perfect: for finite magmas, 476 implies 3862, but 503 does not.
Vlad Tsyrklevich (Nov 19 2024 at 07:26):
Douglas McNeil said:
Wow, I thought the days of the finite magma contributions were over!
I think the finite unknowns likely have both implications and refutations still remaining. I think some may not even be that difficult, but the infinite refutation was just easier than the finite one.
Vlad Tsyrklevich (Nov 19 2024 at 07:33):
Terence Tao said:
I don't know if this order 84 example is too large for Lean to handle.
I think this should be doable, 503 and 476 are two variable equations so this should only be ~ operations to check which is still very small. The destination equations are one variable so it's probably not even worth optimizing it by finding a specific counterexample for Lean to check.
Incidentally, @Douglas McNeil is still a ~15% improvement so if you have the magma still handy I can try to feed it to Lean and see how it responds.
EDIT: Turns out I was wrong, Lean hits a deterministic timeout, even if I quadruple the maxHeartbeats.
Douglas McNeil (Nov 19 2024 at 11:45):
^ here it is, FWIW.
Vlad Tsyrklevich (Nov 19 2024 at 14:16):
Even this requires ~90s and maxHeartbeats 1.75 million to finish, that seems somewhat out-of-proportion to the computational difficulty of this check, but I have no idea what's going on inside of Lean.
Bernhard Reinke (Nov 19 2024 at 14:17):
Here is a variant of injective <-> surjective that might be useful: If is a finite set and a map
such that is nonempty for all and for all , then for some bijection . This should be easily provable by counting, first establishing that has to be a singleton, then that is defined and injective, hence a bijection.
This can be useful to formalize 1701 => 4587 (it seems this was already proven by duper?): here we can take . 4587 is equivalent to being a singleton for all , and we get by the functional equation that they have to be disjoint (as for ).
Vlad Tsyrklevich (Nov 19 2024 at 14:35):
Bernhard Reinke said:
it seems this was already proven by duper?
Everything duper-verified was found using injective<->surjective, but the duper proofs aren't upstream-able.
Terence Tao (Nov 19 2024 at 16:11):
I've marked the 503 and 476 refutations as conjectures for now.
It may be possible to compress the example further by identifying elements. For any given element , the sets are quite small typically in this example. If there are with disjoint and with the property that for any , are not all distinct, one can try to combine them into a single element by setting equal to the choice of , not equal to if such a choice exists, or otherwise, and also setting equal to for , for , and otherwise. It's not completely guaranteed that this will produce a new counterexample, but for the rows and columns where are very small, this should happen fairly often, so one could iteratively search for valid compressions by computer perhaps.
Also, I think one can formalize in Lean the way to convert a 503 counterexample to a 476 counterexample, so one only has to do the big verification for one counterexample rather than two.
Douglas McNeil (Nov 19 2024 at 17:17):
Down to order 50, following that technique:
Vlad Tsyrklevich (Nov 19 2024 at 17:28):
We get a better-than-I-would-expect reduction here (), this builds in 13s, requires a maxHeartbeats of ~300k
Vlad Tsyrklevich (Nov 19 2024 at 17:32):
If we can get it down to 45 I think it would build without requiring maxHeartbeats changes
Terence Tao (Nov 19 2024 at 18:00):
The law 503 is automatically true when and is idempotent. This should cover a large fraction of the cases, as by design every element except is idempotent. It may be possible to then do a custom verification of 503 that is slightly faster than the brute force computation.
Vlad Tsyrklevich (Nov 20 2024 at 09:57):
I built a simple heuristic bruteforcer to find other proofs by periodicity as Terence did with 3342 above. I simply took all single-variable functions up to order 3, and then searched for implications with the additional axiom with a number of different values for i and j. For 3342, with the function that Terence used, , vampire found the implication with all values of i,j that I tested with, but for all other tested functions it found at most 1 positive hit. Even for expressions like that took ~50kb to represent as an axiom in TPTP, vampire still completed the proof in <0.1s
This is useful for finding other possible implications of this style, but obviously is incomplete and doesn't establish full evidence for the existence of such a proof. It would also be closer to the style of the original proof to try only with functions that are a shared sub-expression of the two ends of the implication, but I wanted to go fishing for other possibly interesting behavior we might find so I searched all functions up to order 3.
Looking specifically for functions that had implications for all tested values of (i,j), it found multiple hits in the current list of conjectures, but no hits among the unknowns. So this may still be useful for formalizing those unknowns, if converting the bijections proofs with vampire proves too painful, but found nothing new.
Vlad Tsyrklevich (Nov 22 2024 at 22:19):
Crazy, there are still finite refutations left that are captured by just an 8x8 magma https://teorth.github.io/equational_theories/fme/?magma=%5B%5B1%2C0%2C3%2C2%2C4%2C5%2C6%2C7%5D%2C%5B2%2C5%2C3%2C4%2C0%2C1%2C6%2C7%5D%2C%5B4%2C1%2C3%2C2%2C0%2C5%2C6%2C7%5D%2C%5B1%2C0%2C3%2C2%2C7%2C5%2C6%2C4%5D%2C%5B6%2C1%2C3%2C2%2C4%2C5%2C0%2C7%5D%2C%5B4%2C5%2C0%2C2%2C3%2C1%2C6%2C7%5D%2C%5B0%2C1%2C3%2C2%2C4%2C5%2C6%2C7%5D%2C%5B1%2C0%2C3%2C2%2C4%2C5%2C6%2C7%5D%5D
Vlad Tsyrklevich (Nov 22 2024 at 22:30):
To be fair, I guess order is perhaps a poor measure of complexity, since equations that are towards the top of the Hasse diagram will allow many models so it takes longer to search through a given order compared to others that don't. But this is just by simple exhaustive search, though I have now optimized the mace4 parameters I use for every given implication after learning about them from @Jose Brox
Vlad Tsyrklevich (Nov 22 2024 at 22:31):
I timed the amount of time it took to search through a given order with different parameters and I was surprised by how large a difference it can make.
Vlad Tsyrklevich (Nov 23 2024 at 07:32):
Another negative result: I had been meaning to try this for a while, currently I've checked for finite implications by looking at the inverses of the source and target of a given unknown implication, but what about inverses of other implications of the source? I wired up a script that consumes the output of an ATP printing equivalent laws, and fed inverses of any equivalent laws back into the TPTP on repeat. The method is kind of interesting because it's trying to move through the infinite graph of implications by essentially making a hacky duct-taped "finite ATP", but it didn't find anything new in order 4 or 5.
Terence Tao (Nov 24 2024 at 03:37):
Some cute observations about finite 1518 models (but no new implications, unfortunately).
The equation reads , so is surjective, hence bijective on finite magmas. 1518 implies 359, , hence ; so squaring is injective, hence bijective. So in fact all the are bijective, so finite 1518 magmas are left-cancellative.
This gives the following weird symmetry: define a new operation by , so . Then becomes ; squaring in this new magma inverts squaring in the old magma. Also, rewriting 1518 as and setting we obtain , so this reflected magma also obeys 1518.
I checked that the outstanding implications 47, 617, 817, 3862 are unchanged under this symmetry (and thanks to 359, the claims 617 and 47 are in fact equivalent). Not sure where to take this further though.
Douglas McNeil (Nov 25 2024 at 15:07):
@Vlad Tsyrklevich : I had a thought yesterday. To complete the finite side, does that mean someone's going to have to get 854=/>413 or prove there's no finite counterexample? :scared:
Terence Tao (Nov 25 2024 at 15:42):
Yeah, a large fraction of the remaining unresolved finite implications are coming from familiar nemeses like 854 where we were barely able to get a resolution in the infinite case by either greedy methods or confluence methods. There seem to be some genuinely difficult problems in there!
I played a bit with 1110=>1629 since it had the appealing form (where as usual and ). Immediately one gets that are bijections so that one can permute to get , , and . This gives some cute identities:
Lemma 1: (equation 8)
Proof: Set , then . Also . Cancelling, we obtain as required.
Lemma 2: .
Proof: On the one hand we have , so and hence . On the other hand . Cancelling, we obtain as required.
The latter is hinting at an abelian translation invariant model (which would automatically have commuting squaring and commuting maps) but I checked by brute force computer calculation that no model on any cyclic group of order up to 12 is available. (I should try other abelian groups like though.)
EDIT: Since , equation 1629 is equivalent to , so in translation invariant models (in which cubing is a bijection) it collapses to the idempotent law ; in general 1629 is equivalent to the claim that cubes are idempotent. The only translation invariant model I currently know of is the linear model on a ring where the coefficient obeys the equation , but such models are unfortunately idempotent. In fact it seems that all linear models (commutative or otherwise) obey 1629. One such model is where obeys the golden ratio equation ; now we don't have idempotency, but instead the cubing map is identically zero, which forces for a different reason.
Matthew Bolan (Nov 25 2024 at 15:47):
1486 isn't quite 1485, but it is still implied by 168 and so much of the directed graph machinery carries over. 1486 also has anti-implications falsified by some impressively large finite models (for example a 21x21 table!) found by vampire. As I observed above it also admits non-trivial twists, which has the consequence that it suffices to give a table refuting 1486 -> 3877.
Matthew Bolan (Nov 25 2024 at 15:51):
I also notice that there are still some open implications from 1076 and 879, which have not yet been formalized even in the infinite case. I've tried to search for finite models of these through various means with no luck.
Vlad Tsyrklevich (Nov 25 2024 at 16:17):
Douglas McNeil said:
someone's going to have to get 854=/>413 or prove there's no finite counterexample
Indeed, and one interesting thing I think about the finite graph is that unlike with the general graph, it's not clear to me which of the remaining unknowns may be implications, though others with a greater degree of mathematical sophistication may be able to derive educated opinions on the matter.
Matthew Bolan (Nov 25 2024 at 16:34):
To record the graph version of 1486 x = (y ◇ x) ◇ (x ◇ (z ◇ z)):
We ask for a directed graph with a notion of "good" paths of length 2 satisfying:
Claim 1 Any two points are connected by exactly one good path.
Claim 2 Any directed edge is the first edge of some good path.
Claim 3 if , , are good, then is good.
Given such a directed graph, we can place the structure of a 1486 magma on its vertices by defining to be the midpoint of the unique good path . Claim 3 is so that 1486 is satisfied. Conversely, given a 1486 magma, letting mean that for some , we have that trivially, and by 1486, so we can take our notion of good path to be a path of the form . Claim 1 obviously holds, Claim 2 holds as if then there exists with , and then is good. Finally, 1486 implies Claim 3 for this graph.
Unlike 1485, we know a model of 1486 which is not of size or , this order 21 example which falsified most of its outstanding implications.
Matthew Bolan (Nov 25 2024 at 16:50):
It seems the smallest 1486 magma which is not of size or is order 11, for example this one. I also found this degree 13 example. Actually, for , I only have an example of order 18
Vlad Tsyrklevich (Nov 25 2024 at 17:53):
Daniel Weber said:
Searching the git history, it was generated as a counterexample to 1486 -> 2052 with
vampire_fin_counterexamples.py
How did you generate your finsched.sch? My build of vampire doesn't seem to like it, and I don't see any of the lines in the git history of Vampire's CASC/Schedules.cpp
and couldn't find much info about schedules online.
Matthew Bolan (Nov 25 2024 at 17:53):
I think it is a modified version of the schedule for casc_sat, but I only glanced at it briefly.
Vlad Tsyrklevich (Nov 25 2024 at 17:55):
Ah, indeed. Now I do see some of these lines in CASC/Schedules.cpp
, vim was interpreting the + as a regexp instead of a char while I was searching.
Vlad Tsyrklevich (Nov 25 2024 at 18:07):
Using the Linux version downloaded from their website instead of my locally-built macOS one also resolves the remaining issues I had.
Daniel Weber (Nov 25 2024 at 19:27):
Matthew Bolan said:
I think it is a modified version of the schedule for casc_sat, but I only glanced at it briefly.
Indeed, it's simply just the fmb
parts from the casc_sat
schedule
Vlad Tsyrklevich (Nov 25 2024 at 20:07):
1728 =/> 99. That was easy, just using vampire.
Matthew Bolan (Nov 25 2024 at 20:14):
Ack, I messed up here. Claim 3 does not hold for all graphs produced this way, in fact my order 11 one gives rise to a counterexample.
Matthew Bolan said:
To record the graph version of 1486 x = (y ◇ x) ◇ (x ◇ (z ◇ z)):
We ask for a directed graph with a notion of "good" paths of length 2 satisfying:
Claim 1 Any two points are connected by exactly one good path.
Claim 2 Any directed edge is the first edge of some good path.
Claim 3 Any directed edge is the last edge of some good path.
Claim 4 if , , are good, then is good.Given such a directed graph, we can place the structure of a 1486 magma on its vertices by defining to be the midpoint of the unique good path . Claim 4 is so that 1486 is satisfied. Conversely, given a 1486 magma, letting mean that for some , we have that trivially, and by 1486, so we can take our notion of good path to be a path of the form . 1486 implies Claim 4 for this graph.
Unlike 1485, we know a model of 1486 which is not of size or , this order 21 example which falsified most of its outstanding implications.
Matthew Bolan (Nov 25 2024 at 20:24):
I think I can just drop claim 3 and then everything will work out. I only needed claims 2 and 3 to prevent any spurious edges in the graph, but claim 2 already guarantees that the edge in the graph becomes in the magma.
Terence Tao (Nov 25 2024 at 20:47):
Small remark: the dual 2126 to 1486 has previously been discussed; it has a complete rewriting system, which is why we could resolve things in the infinite case, but not the finite case.
Just for the record, here is the current state of play with the finite graph (after 1728 is resolved) with the outstanding equations (and their duals), their implications (counting duality, which multiplies by a factor of 2) and how they were resolved for the infinite graph:
- 677 (dual 2910). 2 implications, previously established by the greedy method.
- 854 (dual 2712). 10 implications, previously established by a complicated greedy method or the unique factorization method.
- 879 (dual 2650). 2 implications, established by a (not yet formalized) greedy method.
- 906 (dual 2647), 2 implications, previously established by the greedy method.
- 1076 (dual 2531), 4 implications, established by a (not yet formalized) greedy method.
- 1110 (dual 2497), 2 implications, previously established by the confluent method.
- 1486 (dual 2126), 4 implications, previously established by the complete rewriting method.
- 1516 (dual 2091), 2 implications, previously established by a greedy method.
- 1518 (dual 2054), 8 implications, previously established by the greedy method.
Matthew Bolan (Nov 25 2024 at 20:49):
Woah 1516 has outstanding finite implications? Those would be really nice to find a finite counterexample to...
Matthew Bolan (Nov 25 2024 at 20:50):
No, you already refuted the finite implication 1516 -> 255.
Matthew Bolan (Nov 25 2024 at 20:50):
Vlad Tsyrklevich (Nov 25 2024 at 20:51):
1516 -> 1489 is still open.
Matthew Bolan (Nov 25 2024 at 20:53):
Ah, my bad. But that one has been formalized in the infinite case.
Terence Tao (Nov 25 2024 at 20:54):
You're right, I've adjusted the text accordingly.
Matthew Bolan (Nov 26 2024 at 04:14):
I notice that Vampire has this flag --fmb_detect_sort_bounds
which has the description: "Use a saturation loop to detect sort bounds introduced by (for example) injective functions". Sounds useful!
I also noticed that the various sporadic magmas for law 1486 seem related, for example I can get a large portion of the table to agree between the 11x11 and 13x13 examples by doing some relabeling, and the 21 by 21 table contains many rows and columns which look like the ones coming from the smaller examples. I should get back to my "day job" though, so no more time to explore this tonight.
Vlad Tsyrklevich (Nov 26 2024 at 20:45):
Terence Tao said:
and how they were resolved for the infinite graph
I did not follow along closely enough to have a good understanding of it, but is confluence actually applicable in a finite context? Or any of the other general implication refutations for that matter.
Terence Tao (Nov 26 2024 at 22:33):
The way I understand it, confluence provides a way to determine whether two words are equal in the free magma for a law, namely if their unique reductions agree. If their reductions do not agree, then this free magma then provides a counterexample to the implication that the law implies . But the free magma can be infinite, so this does not necessarily give a finite example. If we knew that the free magma had enough finite quotient that one could separate points, then we could also get a finite counterexample, but this doesn't have to be the case (in group theory, a classic example of this is the Higman group). That said, I don't know of a confluent law whose free magma didn't have enough finite quotients to separate points.
Terence Tao (Nov 26 2024 at 22:39):
A negative result: I checked the linear models for 1110 and 906 (hoping that we had simply overlooked this possibility in previous sweeps), but it looks like these implications are immune to both commutative and noncommutative models. 906 has a twisted idempotent model where but I was not able to make much use of this. (I played around with piecewise linear models for and in some ring, as this gave six equations in six unknowns so was not overdetermined, but sadly the 906 implication was also immune to this more general model.)
Terence Tao (Nov 27 2024 at 00:48):
I have a partial explanation now of why the implication 906 -> 3862 is hard. One can write 906 as and 3682 as . From 906 we have is surjective, hence invertible on finite magmas. Specializing 906 we have , hence on canceling we have , or equivalently . On the other hand we also have , hence on canceling we have .
Thus we can prove a weaker version of in which we apply to both sides. If right multiplication by squares is invertible then we are done, but there are models for which this is not the case, such as the model mentioned previously. But in any case this helped explain why the models I was trying always seemed to obey 3862.
Terence Tao (Nov 27 2024 at 05:31):
Another immunity that 906 -> 3862 has is to models where squaring is bijective (in particular, this gives immunity to translation invariant models). From we have . Next, if we set , then . From we have , hence . If squaring is bijective, we conclude . Since also , we conclude . Now , which on replacing by and using gives . This gives , which is 3862.
Terence Tao (Nov 27 2024 at 07:11):
In fact the implication 906 -> 3862 is true for finite magmas! From and the invertibility of we see that . Iterating this we get for any . By finiteness, have a joint period, so we also have in particular . In other words, if we have , then also .
From previous discussion we have , so by the above we have , which is 3862.
Douglas McNeil (Nov 27 2024 at 18:52):
One thing which would be good to make clear in the paper is where the finite/infinite differences in behaviour come from, even in the absence of a slick metatheorem that makes everything obvious in retrospect. Simply an overview of the recipes used to demonstrate that there are no finite counterexamples, especially when there are known infinite ones, would be useful exposition.
I mean, we've had late-entry "true for finite magmas" arguments as well as meaningful order 84 counterexamples (!), so the boundary isn't an easy one..
Terence Tao (Nov 27 2024 at 19:58):
I've added this as a comment to equational#924 which is a task to write the (currently only just a stub) section of the paper here.
Matthew Bolan (Nov 28 2024 at 00:25):
1486 does not imply 3862 or 3877!
Douglas McNeil (Nov 28 2024 at 00:30):
@Matthew Bolan : any particular guided magic there, or just patience and good fortune?
Matthew Bolan (Nov 28 2024 at 00:31):
The realization that the sporadic magmas can have many of their entries perturbed without violating 1486 made me try a very long vampire run searching for examples specifically of size 21, in the hope that the already powerful example of this order could be perturbed to something stronger.
Matthew Bolan (Nov 28 2024 at 00:33):
Other than that nothing fancy at all. I'm somewhat sad it violates 3862 as well as 3877, if it had only violated 3877 we would have gotten to use the twisting method to produce a degree counterexample to 3862.
Matthew Bolan (Nov 28 2024 at 01:14):
I think the instructions on finite magma explorer point one to the wrong generate_lean.py, my guess is that I'm supposed to run the one in All4x4Tables and not FinSearch. I also ran into encoding issues and had to temporarily add
encoding='utf-8'
as an argument to all the read/writes in generate_lean.py to get it to run on my machine with my version of python.
Matthew Bolan (Nov 28 2024 at 01:46):
Terence Tao (Nov 28 2024 at 04:27):
Terence Tao (Nov 28 2024 at 04:31):
The order 12 model here comes from two basic models on 879, on and on . The carrier here can be viewed as with an operation of the form for some function . The equation 879 reduces to the functional equation , and the equation 4065 reduces to the functional equation . A little bit of Sudoku let me find an that obeyed 879 but not 4065, which after relabeling led to the above example.
Terence Tao (Nov 28 2024 at 04:34):
Oops, I just realized that this superseded @Amir Livne Bar-on 's ongoing formalization of the greedy argument for 879. Sorry...
Matthew Bolan (Nov 28 2024 at 04:45):
So the idea is a finite translation invariant construction? I'm amazed it's so small! I had run this particular implication through ATP for a while and got nowhere, I wonder how long it will take to find the solution if I have it search order 12.
Terence Tao (Nov 28 2024 at 04:45):
I'm running into the exact same encoding problem as @Matthew Bolan . A Windows thing perhaps? Where do I add the encoding='utf-8'
line to the python code precisely?
Matthew Bolan (Nov 28 2024 at 04:46):
I did it in 4 places, everywhere it calls open()
Terence Tao (Nov 28 2024 at 04:48):
It's partially translation-invariant but not fully (a fully translation-invariant model would have rather than ). This model was arrived at by a very circuitious path from adapting the 1323 type models (initially without the third component with the shift, but I eventually worked out that the model would not work without such a shift).
Matthew Bolan (Nov 28 2024 at 04:52):
Alright, I remain largely unfamiliar with the translation-invariant models, I think they stopped being used around the time I showed up.
I'm very surprised that this thing exists. I can't wait to be finally done grading so that I can take a serious crack at some of these other ones.
Terence Tao (Nov 28 2024 at 04:54):
OK, that seemed to work. Hopefully equational#941 works
Matthew Bolan (Nov 28 2024 at 04:55):
Don't you need to add a line to extra.txt as well?
Matthew Bolan (Nov 28 2024 at 04:57):
Also since you have just resolved a conjecture from the main project I think you should modify Conjectures.lean
Terence Tao (Nov 28 2024 at 04:57):
Ah, forgot to hit "save" :man_facepalming:
Terence Tao (Nov 28 2024 at 05:02):
These sorts of models by the way were also considered by @Ibrahim Tencer (and maybe by others also), though we ended up abandoning those sorts of constructions (perhaps prematurely). In general when one has a base magma one can consider "semi-direct products" (or "extensions by cocycles") with a carrier of the form and with law for some function , where is an abelian group or ring and is some linear magma operation that obeys the required equational law. In order for this "semi-direct product" to obey the law we then have to solve a (linear!) functional equation on (these equations look a little bit like cocycle equations). In retrospect this could be a more broadly applicable technique, it may be worth experimenting with it more.
Matthew Bolan (Nov 28 2024 at 05:48):
I made equational#942 to try and correct the issues we ran into importing these counterexamples.
Amir Livne Bar-on (Nov 28 2024 at 05:49):
Terence Tao said:
Oops, I just realized that this superseded Amir Livne Bar-on 's ongoing formalization of the greedy argument for 879. Sorry...
A finite counter-example (with reasoning behind it!) is much better than an infinite one.
I don't remember if we have finite immunities for all the other un-formalized constructions... I'll run some searches for the older proofs
Matthew Bolan (Nov 28 2024 at 05:49):
We do
Matthew Bolan (Nov 28 2024 at 05:51):
Copying from Terry's message above, I think the remaining open finite implications are
- 677 (dual 2910). 2 implications, or 1 up to equivalence, previously established by the greedy method.
854 (dual 2712). 10 implications, previously established by a complicated greedy method or the unique factorization method.879 (dual 2650). 2 implications, established by a (not yet formalized) greedy method.906 (dual 2647), 2 implications, previously established by the greedy method.1076 (dual 2531), 4 implications, established by a greedy method.1110 (dual 2497), 2 implications, previously established by the confluent method.1486 (dual 2126), 4 implications, previously established by the complete rewriting method.1516 (dual 2091), 2 implications, previously established by a greedy method.1518 (dual 2054), 8 implications, or 1 up to equivalence, previously established by the greedy method.
Matthew Bolan (Nov 28 2024 at 05:53):
The open 1516 implication in the finite case is different from the unformalized implication it still has, but understanding 1516's finite models may well lead to a better understanding of the law.
Amir Livne Bar-on (Nov 28 2024 at 05:56):
I was thinking about the main graph, that is equations 63, 713, 1289, 1323, 1516, 1692, 1729. Some of the greedy constructions were found quite early, before we started collecting "immunities" for the non-implications, so might have a counter-example.
Matthew Bolan (Nov 28 2024 at 06:00):
Right, but in the process of working out the finite graph we have essentially collected all of the immunities.
Amir Livne Bar-on (Nov 28 2024 at 06:01):
Got it, thanks
Ibrahim Tencer (Nov 28 2024 at 08:26):
Some notes on 1518:
I claim that in a finite 1518 model, 817 is equivalent to where is squaring. Proof: 817 says . We already know squaring is a bijection, so if we can substitute for and 817 becomes which is just 359, which we know holds.
For the other direction, 359 says , so also .
So if 817 holds, we get , but then by left-cancellation , so .
Secondly, I've been investigating the finite models of 1518 with some help from Vampire.
There are a few classes of models. One is the linear models , which were used as counterexamples to some implications. There are two subclasses of linear models. In the first, we have (plus some other condition on ), in which case and all of the consequents hold.
In the second subclass, and so if the rightmost variable in an expression is , its value is where is the level of nesting of that variable, i.e. how many times it is multiplied with something. The RHS of 47, 614, and 817 all have a nesting level of 3 and the RHS of 3862 has 2, so they all hold.
We can also generalize the second class to what we can call a piecewise model: for example we take functions and with and , and a subset and define
if
if and
if and
This model will satisfy all four consequents too. (I found this one with Vampire by excluding both and .)
We can generalize this example further: take a partition , bijections , and define
if and
The exact assumptions we need on the 's are a bit complicated, but we can assume that to simplify things. Then we get
for all , and this condition is sufficient to show 1518.
But this also implies that squaring has order 3, as well as , and . 47 will be true because , and generally an expression in will be equal to where is the nesting level, so all the consequents hold.
But it turns out there actually are models where preservation doesn't hold, e.g.
[[2, 1, 4, 5, 0, 3], [2, 3, 4, 5, 0, 1], [2, 3, 4, 1, 0, 5], [2, 3, 4, 5, 0, 1], [2, 5, 4, 3, 0, 1], [2, 3, 4, 5, 0, 1]]
if , if , otherwise.
Here the partition blocks are technically 0, 2, 4, and 135, but 024 acts as a single block when it's on the right.
So far it's unclear to me whether all the consequents will hold in the more general setting.
Ibrahim Tencer (Nov 28 2024 at 08:38):
Come to think of it, this "model" is basically just equivalent to the original problem itself, with when . But maybe it can clarify some things.
Terence Tao (Nov 28 2024 at 15:04):
I think in general the approach of considering "piecewise" models built out of simpler base magmas is a promising one, possibly also for simplifying the three most difficult refutations 1323, 1516, 1729 in the infinite magma case as well.
For 1110, I'm thinking that a piecewise linear model such as
may be promising, where live in some field , live in another field , and is a function satisfying a suitable functional equation. Here the coefficients should be chosen such that and already solve 1110, for instance one choice is to take and where solves the golden ratio equation , e.g., and , or . When verifying 1110 for and the partial translation invariance allows one to reduce to the case and one ends up with a functional equation for that in principle can be solved by linear algebra over the field . In the case this is a linear system in 25 variables (which one can cut down to 6 or so if one assumes dilation symmetry), and for it is 16 variables, but I haven't had time to work out the details (which would lead to counterexamples of order 25 or 16 if all goes well).
One may also look for translation invariant models in groups such as and - any such model which isn't idempotent would resolve 1110->1629. These are a bit too large for brute force search but maybe an ATP could handle it.
Terence Tao (Nov 28 2024 at 15:35):
For 1518, I suspect in analogy with 879 that a model of the form could be successful, where live in some small finite field which has a linear model and is a function obeying a certain linear functional equation, but I won't have time today to try to work out the linear algebra for this.
Matthew Bolan (Nov 28 2024 at 16:05):
Matthew Bolan (Nov 28 2024 at 16:07):
I have enough to automate the process I think, can spin that up later tonight if I can finish other duties.
Terence Tao (Nov 28 2024 at 16:30):
Wow, that was extremely quick! What values for the parameters did you use?
Matthew Bolan (Nov 28 2024 at 16:35):
, , and then I found with some lazy sage code, I think it wound up being
This
but I had to do a little parsing to print that, which maybe I did incorrectly.
Matthew Bolan (Nov 28 2024 at 16:42):
It found a 6 dimensional space of functions satisfying the needed equation. There are some obvious linear dependencies in the system (for example the functional equation is trivial at )
Terence Tao (Nov 28 2024 at 16:48):
That's a reassuringly large space of examples. (Though there are some trivial "coboundary" solutions that may be occupying much of this space.) It gives hope that the method should be useful for many of the other outstanding equations (though I expect it won't work for 854, as the functional equation now has three variables and the linear system is likely to be extremely overdetermined).
Terence Tao (Nov 28 2024 at 16:58):
A side remark: it seems we have stumbled upon some sort of "magma cohomology" that is a variant of group cohomology. I don't think we have enough people with the required background to explore this further for this project, but it might be something to note in the paper for future research. [ADDED: and it would finally provide a use case in our project for Magma, which has been a surprisingly absent software package so far!]
Matthew Bolan (Nov 28 2024 at 16:59):
Yes, I am reminded of the old joke that elementary school arithmetic is arithmetic in twisted by cocycles
Matthew Bolan (Nov 28 2024 at 17:18):
I'd like to say that this construction gives us an extension , but to define the first map I want to have a pointed idempotent with the identity in .
Terence Tao (Nov 28 2024 at 17:22):
It may be better to view the extension as a principal -bundle over , so that one has a translation action of rather than an embedding.
Matthew Bolan (Nov 29 2024 at 04:30):
Ok I think I have it at least somewhat automated. It found an order counterexample to the remaining 1076 implications.
Matthew Bolan (Nov 29 2024 at 04:31):
Not yet run all the others, I made some poor design decisions. With the very large tables we probably need to do better than just giving lean the table.
Matthew Bolan (Nov 29 2024 at 04:32):
Matthew Bolan (Nov 29 2024 at 04:33):
Let me modify it to print , and then I'll try at least the other two variable equations.
Douglas McNeil (Nov 29 2024 at 04:35):
854 does not imply 413.
:smiling_face_with_tear:
Matthew Bolan (Nov 29 2024 at 04:36):
How was that one found?
Matthew Bolan (Nov 29 2024 at 04:39):
Matthew Bolan said:
This is built up from two copies of and has f
equal to this
Matthew Bolan (Nov 29 2024 at 04:41):
1516 does not imply 1489, with a degree counterexample.
Matthew Bolan (Nov 29 2024 at 04:42):
Matthew Bolan (Nov 29 2024 at 04:46):
677 and 1518 seem more resilient than the others, and the fact 854 is three variables means I'll have to refactor the code to try it, but it also means the method is less likely to work.
Douglas McNeil (Nov 29 2024 at 04:47):
I started with one of the 854 with 4 10 violations that I grew back when we thought they'd be easy to find and that having a bunch of 10 violations would be helpful. Then I threw away different fractions of the cells, and grew it up to 19, and lucked out.
This brings me up to 3 useful finite magma in total over the course of the project, or about as many as you've found in the last few minutes. :joy_cat: But I'll take what I can get! :-)
Douglas McNeil (Nov 29 2024 at 05:14):
And I think you're right about large tables: we still have Terence's order 84 counterexample we squashed down to 50. More cleverness might equal more compression, but ideally Lean shouldn't be taking 10+ seconds.
Matthew Bolan (Nov 29 2024 at 05:18):
you can definitely PR yours though.
Douglas McNeil (Nov 29 2024 at 05:31):
854 does not imply 3316 or 3925.
Matthew Bolan (Nov 29 2024 at 06:03):
Quitting for the night. I could not get Terry's idea of taking to be the direct sum of a linear magma and the order 3 magma he mentions to work for 1518, but bugs are always a possibility. I also tried a few small magmas for which have been used in the past to falsify implications from 677 or 1518, with no luck. Most of my search was in the case where is a linear magma of prime order < 50, and I haven't tried fields which aren't of prime order yet.
Jose Brox (Nov 29 2024 at 06:46):
Which implications remain open now?
Terence Tao (Nov 29 2024 at 06:48):
@Matthew Bolan is keeping score at https://leanprover.zulipchat.com/#narrow/channel/458659-Equational/topic/Austin.20pairs/near/484845205 . Two implications associated to 677, eight associated to 1518, and now just four for 854 (previously ten). For some reason 854 has a tradition of being taken out in chunks, while all the other equations seem to be all-or-nothing affairs...
Matthew Bolan (Nov 29 2024 at 06:49):
Well, 1055 implies 1045, so what remains of 854 is an all or nothing affair
Vlad Tsyrklevich (Nov 29 2024 at 06:51):
Wow! A flood of progress in the past few days.
Matthew Bolan (Nov 29 2024 at 06:53):
Is the order 169 example a record? I seem to remember some large linear magmas in the Higman-Neumann project.
Vlad Tsyrklevich (Nov 29 2024 at 06:54):
I don't think there has been anything over order 100 there or anywhere else.I stand corrected.
Matthew Bolan (Nov 29 2024 at 06:54):
Jose Brox (Nov 29 2024 at 06:57):
Terence Tao ha dicho:
@Matthew Bolan is keeping score at https://leanprover.zulipchat.com/#narrow/channel/458659-Equational/topic/Austin.20pairs/near/484845205 . Two implications associated to 677,
In the Equation Explorer (finite graph option), for 677 I only see 1 implication remaining, to 255. Similarly, for 854, once we remove 3316 and 3925, I only see 3 implications remaining, to 413, 1045 and 1055.
Vlad Tsyrklevich (Nov 29 2024 at 06:58):
These are all dual reduced. e.g. 677 and the implication by its dual
Matthew Bolan (Nov 29 2024 at 06:58):
Also, 854 -> 413 has been resolved (By Douglas, just above).
Matthew Bolan (Nov 29 2024 at 07:02):
The new magmas still need to be put into lean, but I refrained from just making the usual PR because of the size of the order 169 and 49 examples (though Douglas's examples are small enough they definitely can just be added). Luckily the magmas still have a simple description for now, but probably this is what should be formalized.
In general, what is our threshhold for adding a finite magma directly?
Vlad Tsyrklevich (Nov 29 2024 at 07:06):
I'm not sure, I think if it compiles then it's fine to upstream. If it would require special handling (e.g. increasing the maxHeartbeats required for Lean to finish), then perhaps it's worth looking for smaller alternatives, though that seems fine to leave as a conjecture.
Vlad Tsyrklevich (Nov 29 2024 at 07:30):
Douglas McNeil said:
I started with one of the 854 with 4 10 violations that I grew
I wasn't following along back then, what method did you use to do this? I guess to be specific, what software and how did you use it?
Jose Brox (Nov 29 2024 at 07:38):
Terence Tao ha dicho:
and it would finally provide a use case in our project for Magma
(Just for the record, I have been using Sage for computing linear and *-linear models).
Douglas McNeil (Nov 29 2024 at 12:15):
I wasn't following along back then, what method did you use to do this? I guess to be specific, what software and how did you use it?
So IIRC we had the order 12 magma that was 854+1 10 viol (sorry, can't remember who to credit!) with a reduced order 11 version. I wrote manual Python code to perturb the 854 magma, changing and deleting values, and sometimes imposing the necessary fixes myself, but the main driver was feeding it to kissat as an SAT problem where only the new values were free and I asked it to extend it in an 854-compatible way. This eventually got me the second 10 violation, and ultimately an indefinite number of them, but didn't seem to be getting me any closer to a 413 violation.
What I did yesterday was start with my 854 with 4 10 violations magma, generate vampire code fixing some fraction of the values taken from that magma, and use the same vampire -sa fmb -fmbss {size}
command that MB had found so fruitful.
Side note: I guess I've had my own version of the bitter lesson. At the time, when I was hoping to rule out 413 violations, I also wrote some unsuccessful C++ code to start with every possible primitive 413 violation and grow every possible 854 magma, hoping they'd all fail. It managed to eliminate a lot of them, but there were 25% that were sticky. I also had my own equivalent of discovering how important the mace4 parameters were, because the code's performance changed by orders of magnitude depending on the strategy I used to pick the next empty cell to try to force.
But once the infinite counterexample was found it showed my approach wouldn't work like I wanted. The bitter lesson is this: doing the same "rule out every 413-violating seed as compatible with 854" but doing it using vampire instead of custom code was not only much easier, it managed to rule out all but 4 or 5 in no time. :man_shrugging:
Vlad Tsyrklevich (Nov 29 2024 at 12:32):
And since I haven't tried to do something like this myself and I'm curious, what does feeding a partially completed magma to vampire look like? Like, what is the TPTP?
Douglas McNeil (Nov 29 2024 at 12:37):
Pretty much what you'd guess, TBH. Here's what I used:
TPTP code
I use the same toolset to generate vampire & mace4 code, and I can't remember which of them needs the distinctness condition and which one doesn't, so I always toss it in. (Got my hopes up once when searching for a version of something, only to discover that the matrix was large but still trivial..)
Matthew Bolan (Nov 29 2024 at 13:30):
Douglas, do you mind if I make the PR to add your 854 magmas?
Douglas McNeil (Nov 29 2024 at 13:32):
No, please go ahead-- the last time I did it I got the order of ops wrong, and if memory serves you just had to sort out the process the other day anyhow so if you're up for it I'm sure it'll be more efficient. Thanks!
Matthew Bolan (Nov 29 2024 at 13:48):
Ok, its at equational#951
Terence Tao (Nov 29 2024 at 15:54):
I agree with Vlad's criterion (compiles in Lean without extra heartbeats or other parameter changes); there are edge cases where one could argue that the criterion is too restrictive or too permissions, but it is at least a clear rule.
For the order N^2 examples with N=7,13, the structure of the cohomological construction is such that the law can be verified with O(N^2) calculations rather than O(N^4) (basically, one can set the first coordinates of x,y equal to zero for free, due to the partial translation symmetry). So it should be possible to verify them in Lean within the heartbeat limit, after one sets up some API for magma extensions and a criterion for verification, which can then be applied to the three examples we have so far (plus the small order 12 example, though that one already made it into the Lean codebase). [EDIT: I wrote a blueprint chapter to try to formalize the cohomology construction in more abstract language, which could potentially be helpful for such an API.]
@Matthew Bolan : it's a shame that none of the outstanding implications for 1518 fell to the product choice for G. Just as a sanity check, does your code recover the order 12 example for 879, which uses this construction?
Matthew Bolan (Nov 29 2024 at 16:00):
Yes.
Matthew Bolan (Nov 29 2024 at 16:00):
At least, it finds this one, unsure if this is exactly the same as yours
Terence Tao (Nov 29 2024 at 16:48):
Well, there's a whole linear space of examples (minus a subspace that accidentally obeys 4065), so probably not identical, but still, that is confirmation that 1518 has some real resistance and this isn't just a coding artefact. It surprises me slightly - I view 1518 and 879 as being quite similar in structure - but perhaps sometimes the linear algebra just doesn't work out. (Cohomology is a famously subtle subject.)
Terence Tao (Nov 29 2024 at 17:18):
For 1518, one can also take to be a right-absorptive model , so the model is now of the form . Then I think the cocycle equation is and one could take to be the free abelian group generated by the subject to these relations. One could then try setting to the various models described by @Ibrahim Tencer and hope for a hit.
Matthew Bolan (Nov 29 2024 at 17:23):
I think you may have swapped the roles of compared to the blueprint, is that right? I think my code already checks right-absorptive models for as they are linear, and I've already tried a few of Ibrahim's models for .
Matthew Bolan (Nov 29 2024 at 17:24):
I intend to eventually support all finitely generated abelian groups for , but currently I cannot as sage does not seem to have good support for linear algebra in these groups, so I'll have to roll my own.
Terence Tao (Nov 29 2024 at 17:31):
Sorry for the confusion. Yes, in the blueprint I swapped the notation to conform to more standard group cohomology language; I'll try to use those conventions more consistently going forward.
Matthew Bolan (Nov 29 2024 at 17:55):
By the way, I don't see an obstacle to taking the multiplication on to be , but I'm not at home to test it anymore
Terence Tao (Nov 29 2024 at 18:37):
Yes, one could consider this more general model, though I think it doesn't add much generality (one can largely absorb the into the cocycle .
I now understand why 1518 is harder than 879: as @Ibrahim Tencer already observed the non-shift linear models for 1518 are idempotent, whereas for 879 they are non-idempotent. This is relevant because all the outstanding implications are 1-variable equations so without loss of generality we can restrict ourselves to magmas generated by one generator, which means that the base magma can also be taken to be generated by one generator. But idempotent magmas then become useless because every element only generates itself. So we really only have one useful base magma for 1518 right now, namely the shift on ; multiplying this base with other linear models doesn't accomplish anything. This indicates that we should somehow look for more candidate base magmas that are non-idempotent and generated by a single element than just the shift. So presumably Ibrahim's piecewise models are a place to start...
Pace Nielsen (Nov 29 2024 at 18:56):
I don't know if this is helpful or not, but working with Equation 677, the solutions over a generic field to a linear model come in two flavors. Type 1: is a primitive 5th root of unity (i.e. a root of ) and . Type 2: is a root of and .
Terence Tao (Nov 29 2024 at 18:59):
Hmm, for 1518, if one assumes all of the unproved consequents 47, 614, 817, 3862, I think the only nontrivial model generated by a single generator is the cyclic shift model on (up to isomorphism), which means that we're stuck with that as our only option for base model unfortunately.
Terence Tao (Nov 29 2024 at 19:11):
@Pace Nielsen Now that I see the importance of non-idempotence for the base models, I would say that the Type 2 models you listed would be more useful for the base than the Type 1 models. As for the range... I think there is some advantage to having the range be similar to the base in some ways. Otherwise one would expect to encounter "Goursat lemma" or "Schur-Zassenhaus" type obstructions to having interesting extensions.
Matthew Bolan (Nov 29 2024 at 19:54):
Small observation, but due to the fact 1518 -> 359, we have 1518 -> 47 if and only if 1518 -> 614. Prover 9 also tells me if 1518 -> 3862, then all 4 implications are true.
Pace Nielsen (Nov 29 2024 at 20:04):
@Terence Tao Although you already worked out that the cyclic shift model is the only nontrivial base model of interest for 1518, I figured it wouldn't hurt to work it out a slightly different way.
The linear models where (with in some ring) have coefficients satisfying
In an integral domain, either and is cube root of unity (giving the cyclic shift model [multiplicatively], as well as the trivial "choose the right entry" model), or and is a root of (giving an idempotent model). By the Chinese remainder theorem, we know that the generic case is essentially the direct product of these two cases.
Changing to the slightly more complicated model leads to the additional relations
So, in an integral domain, either (and nothing changes) or we are back in the shift model, but with an extra arbitrary additive shift. But that is just isomorphic (as you mentioned) to the original shift model.
Jose Brox (Nov 29 2024 at 21:09):
Terence Tao ha dicho:
3682
(It actually is 3862, I was going mad because 3682 is just and is not implied by , while 1518 is :big_smile:)
Douglas McNeil (Nov 30 2024 at 12:16):
854 does not imply 1045 or 1055: I think my work here is done. :joy_cat:
Ibrahim Tencer (Nov 30 2024 at 12:39):
As for why 1518 tends to imply that (equivalent to 817): The main obstacle is that for many elements we tend to have . So let's say that if this occurs. Then, if , , and , from 1518 we get .
Since it's enough to show that () and . Notice that 359 says , 3862 says , and 4065 (another known consequence of 1518) says . So somehow we are trying to flip 359 and 4065 around, but the relation is not symmetric in general (as in the above piecewise magma).
The equation (call it the "elevation law" for short -- not sure how to check what the number of this one is) holds in all the model classes above. And it's extremely fast for Vampire to check, I verified that it holds up to order 37 before apparently running out of memory. Notice that it has 5 operations so it's not one of the original list. What's more, if two laws out of {47, 817, } hold (in the presence of 359), then the third does too.
It also suffices that and that S preserves <. And S preserves < if S is a homomorphism. S is a homomorphism in the linear models but it may or may not be in the piecewise models.
We can also use elevation to show that if 47 and 817 hold in a 1518 model then so does 3862:
Suppose , then .
So if 1518 implies 47 and 817 it also implies 3862. So 47 and 817 are sufficient to show that the magma generated by a single element is just with the shift law.
Matthew Bolan said:
Small observation, but due to the fact 1518 -> 359, we have 1518 -> 47 if and only if 1518 -> 614. Prover 9 also tells me if 1518 -> 3862, then all 4 implications are true.
Interesting, did it produce a human-readable proof? This would mean showing 47+817 and showing 3862 are equivalent.
Ibrahim Tencer (Nov 30 2024 at 13:25):
In a 1518 magma for any we have . If we have the elevation law () then . So by 359, , and canceling we get , but then by elevation. So actually the elevation law implies 817 in any *finite 1518 model, hence it also implies 47, which means it implies 3862.
Ibrahim Tencer (Nov 30 2024 at 13:43):
(before I had said any 1518 model but we do need left-cancellation, so it will hold in finite models in particular)
Vlad Tsyrklevich (Nov 30 2024 at 15:19):
Douglas McNeil said:
854 does not imply 1045 or 1055
I have created equational#957
Terence Tao (Nov 30 2024 at 15:35):
Douglas McNeil said:
854 does not imply 1045 or 1055: I think my work here is done. :joy_cat:
Congrats! Can you share a bit about how you found this example?
I think now one can safely comment out Equation854.lean
from ManuallyProven.lean
as the various infinitary arguments previously used to establish refutations from 854 are now superseded by the finite ones, which may shave a few seconds off the Lean compilation. I'm going to try this now to test the theory.
Terence Tao (Nov 30 2024 at 15:38):
@Ibrahim Tencer based on your analysis, I wonder if one could ask an ATP whether the refutations from 1518 remain open if one additionally assumes that (a) and are bijections (this is automatic from finiteness, but one may need to give it to the ATP explicitly), (b) is a homomorphism; (c) ; and (d) existence of a homomorphism to the cyclic shift. If we can assume all these things without accidentally getting the implications, this may help guide us to building a counterexample (maybe with more ATP calculations).
Mario Carneiro (Nov 30 2024 at 15:43):
Terence Tao said:
I think now one can safely comment out
Equation854.lean
fromManuallyProven.lean
as the various infinitary arguments previously used to establish refutations from 854 are now superseded by the finite ones, which may shave a few seconds off the Lean compilation. I'm going to try this now to test the theory.
my general intuition is the opposite, large finite calculations are harder for lean than small seeds plus abstract nonsense
Douglas McNeil (Nov 30 2024 at 15:48):
@Terence Tao : Same technique as worked for my other recent 854 refutations: start with a magma with a lot of 10 violations, and in this case I used the the 413 refutation as a seed (so the 1045/55 refutation actually handles the 413 as well, I didn't lose it). Perform a bunch of runs removing random fractions of that magma, in the hopes that we keep the refutation-promoting parts while losing the portions of the magma which might be constraining us too tightly and either force or strongly coerce us into respecting the law we're trying to violate. Probably this could be done more cleverly, but since I don't know exactly what's causing what, might as well do it randomly and let the automation handle it (a "sweet" approach as opposed to a "bitter" one, I guess. :stuck_out_tongue_wink:)
Also letting the magma grow a few orders is likely important, it feels like the additional degrees of freedom give the construction more room to breathe, without requiring you to have hit exactly the right arrangement to begin with.
Basically it all comes down to something you said (which I might even be misremembering!) about needing not one but two 10 violations to show something or other 854-related, which led me to want to have a lot of them around.
Also, I looked it up, and if I'm reading right it was @Daniel Weber who found the original order 12 854 magma with the 10 violation, and everything I've found (even this order 24 one!) is ultimately downstream of that magma, however disguised it might seem.
Terence Tao (Nov 30 2024 at 16:29):
Mario Carneiro said:
Terence Tao said:
I think now one can safely comment out
Equation854.lean
fromManuallyProven.lean
as the various infinitary arguments previously used to establish refutations from 854 are now superseded by the finite ones, which may shave a few seconds off the Lean compilation. I'm going to try this now to test the theory.my general intuition is the opposite, large finite calculations are harder for lean than small seeds plus abstract nonsense
For head-to-head comparisons, I would agree. Though currently, because the equational theories project is interested in completing both the infinite magma graph and the finite magma graph, we are going to need the finite counterexamples anyway, so commenting out the infinitary arguments seems like a no-brainer to me. (Also, it is possible with some further optimization of the finite counterexample planner that some of the new finite examples can be used to remove some of the older ones, so the net performance cost may eventually go down slightly.) Though I see that the CI for updating the Lean codebase is now pretty fast (for my most recent commit it only took 1m 13s for the Lean component of the CI, which is now only a very small fraction of the full CI) - it seems that there is a lot of useful caching going on...
Mario Carneiro (Nov 30 2024 at 16:30):
of course, in this situation where you want both anyway, it's better not to do the infinite one
Mario Carneiro (Nov 30 2024 at 16:30):
although that does make me sad since it's a nice human proof
Mario Carneiro (Nov 30 2024 at 16:30):
which we are replacing with something that seems not to have a great human explanation
Terence Tao (Nov 30 2024 at 16:31):
It's still in the blueprint and in the Lean directories, just no longer imported into the optimized final theorem.
Mario Carneiro (Nov 30 2024 at 16:31):
maybe there should be another entrypoint for the "bonus slides"
Mario Carneiro (Nov 30 2024 at 16:32):
you probably don't want to have lean code which is not typechecked, it can rot if you aren't validating it periodically
Terence Tao (Nov 30 2024 at 16:35):
Hmm. Fair point. Not sure what the best solution is here. We could indeed have a ExtraStuff.lean
type file that imports all the extra stuff (there are some other side results that could be moved into an extras
folder) and have a separate CI check for that (maybe run less often than the main CI).
Mario Carneiro (Nov 30 2024 at 16:36):
my suggestion is to just have ExtraStuff
and check both in CI. I doubt it will be a statistically significant fraction of total CI time
Mario Carneiro (Nov 30 2024 at 16:36):
you can revisit if something heavy ends up in ExtraStuff
Mario Carneiro (Nov 30 2024 at 16:36):
but it's probably not worth the CI complexity right now
Vlad Tsyrklevich (Nov 30 2024 at 16:36):
Based on setting a higher maxHeartbeats, the worst offenders are equational_theories/Generated/Greedy/*
, equational_theories/ManuallyProved/Equation1701.lean
and equational_theories/Obelix.lean
. The latter two are proofs without finite counterexamples, perhaps there are Greedy examples that are not? Those are auto-generated and there are multiple of them, so it seems less sad to remove one or a few of them.
Mario Carneiro (Nov 30 2024 at 16:38):
My guess is that the autogenerated stuff is actually a CI win to remove in favor of the finite counterexample
Mario Carneiro (Nov 30 2024 at 16:39):
those translated vampire proofs are surprisingly slow to check
Terence Tao (Nov 30 2024 at 16:39):
An alternative is to not move or comment out anything (in particular restoring the import Equation854.lean
) but once we complete the formalization of both graphs, work on creating a Final.lean
file that only imports what is minimally needed to prove the final theorem.
Vlad Tsyrklevich (Nov 30 2024 at 16:41):
Among the greedy proofs, the only ones I see that immediately stand out that I know have a finite counerexample is 1112=>1629
, and 677=>255 is still unknown. So not much to gain there.
Terence Tao (Nov 30 2024 at 16:43):
My most recent PR did show that commenting out Equation854.lean
did not affect the final implication graph. But I'm now leaning towards restoring it so that it stays checked by Lean, even if this adds a few seconds back to the CI.
Douglas McNeil (Nov 30 2024 at 16:44):
Looking in from the outside, commenting out parts of the code which validate important parts of the paper seems like a misstep unless it's a major win for some tactical or strategic reason. I mean, nobody's actually interested in a random collection of integers.
Terence Tao (Nov 30 2024 at 16:45):
To be fair, we have millions of (anti-)implications that are coming from such random sets of integers already, the manually proven ones are actually just a tiny fraction of the full graph.
Terence Tao (Nov 30 2024 at 16:49):
OK. I think what I'll do is restore the import, but add a comment to ManuallyProven.lean
that the import is technically redundant for the purposes of establishing the implication graphs.
Pace Nielsen (Nov 30 2024 at 17:00):
Running
fof(eq1518, axiom, X = mul(mul(Y, Y), mul(X, mul(Y, X)))).
fof(leftcanc, axiom, mul(X, Y) = mul(X, Z) => Y = Z).
fof(leftmultsurj, axiom, ( ! [Y, Z] : ( ? [X] : (mul(Y, X) = Z) ) )).
fof(squaringsurj, axiom, (! [X] : ( ? [Y] : (mul(Y, Y) = X)))).
fof(squaring, axiom, s(X) = mul(X, X)).
fof(sixthpower, axiom, X = s(s(s(s(s(s(X))))))).
fof(squaringhom, axiom, s(mul(X, Y)) = mul(s(X), s(Y))).
fof(extrahom, axiom, f(mul(X, Y)) = mul(f(X), f(Y))).
fof(submagma, axiom, (? [X] : ( X = mul(X, mul(X, mul(X, X))) & (! [Y] : (f(Y) = X | f(Y) = mul(X, X) | f(Y) = mul(X,mul(X, X)))) ))).
fof(eq47, conjecture, X = mul(X, mul(X, mul(X, X)))).
in Vampire for 999s timed out.
Douglas McNeil (Nov 30 2024 at 17:03):
I'd say in those cases we're only caring about the _results_, even if there might be interesting stuff lurking there in the same way that MB and BLF have been analyzing 1486 that we just didn't have time to look at more closely. The infinite 854 technology is much more interesting, so it should stay in the validation circle.
Reminds me of ZAK's argument about the risks of automating too early. It's basically an accident that we didn't hit more finite 854 refutations earlier, and then the only thing that we might have thought strange about 854 is that the smallest counterexamples we found were unusually large. Maybe that would've sparked the study of the free magma that actually happened, but maybe not..
Terence Tao (Nov 30 2024 at 17:07):
Good points. I'll make a note of mentioning this in the (currently unwritten) conclusions portion of the paper.
Terence Tao (Nov 30 2024 at 17:08):
@Ibrahim Tencer For future reference, here is how one can find the equation number for :
PS C:\Users\teort\Dropbox\Lean\equational_theories\scripts> py find_equation_id.py "x*(x*x)=(x*x)*(x*x)"
The equation 'x*(x*x)=(x*x)*(x*x)' is Equation 55561: x ◇ (x ◇ x) = (x ◇ x) ◇ (x ◇ x)
Pace Nielsen (Nov 30 2024 at 17:50):
In line with what @Ibrahim Tencer and @Matthew Bolan have shown, Vampire proves that all 4 open implications involving 1518 are individually equivalent to each other, as well as to getting , subject to left cancellation. So, in particular, they all equivalent to each other for finite models. Vampire also verifies two implicit facts that come from an earlier computation of @Terence Tao namely that surjectivity of the squaring map yields surjectivity of left multiplication, and that left cancellation implies injectivity of the squaring map.
Douglas McNeil (Nov 30 2024 at 18:07):
Am I reading right? Does that settle all the 1518s, meaning unless there's a bug in some unformalized argument, 677-255 is the final boss?
Pace Nielsen (Nov 30 2024 at 18:09):
No, what it does is say that that there are (up to equivalence) just two bosses left.
Douglas McNeil (Nov 30 2024 at 18:09):
Ah, got you.
Matthew Bolan (Nov 30 2024 at 18:23):
@Ibrahim Tencer Vampire being able to check the elevation law holds in models of size less than 37 is shocking to me, and I'm having trouble replicating it. Do you mean all models or just ones of the types you list? What settings/TPTP did you run with?
Terence Tao (Nov 30 2024 at 18:37):
Have we ruled out translation-invariant models for 1518 yet? I am getting hints that there may be a translation-invariant counterexample on for some finite abelian group (in which the squaring map takes the form , so but . Note that squaring is automatically a homomorphism in these models, and I also expecting a homomorphism to the cyclic shift with these models.
Pace Nielsen (Nov 30 2024 at 18:56):
Putting 1518 with left cancellativity, and with a homomorphism, Vampire times out when trying to refute a translation-invariant approach. So you are good to keep trying this avenue!
Pace Nielsen (Nov 30 2024 at 19:00):
If the cyclic shift example is a submagma, Vampire finds problems with a pure translation-invariant approach. Your proposed examples don't have that issue.
Pace Nielsen (Nov 30 2024 at 19:06):
(deleted)
Ibrahim Tencer (Nov 30 2024 at 20:10):
Matthew Bolan said:
Ibrahim Tencer Vampire being able to check the elevation law holds in models of size less than 37 is shocking to me, and I'm having trouble replicating it. Do you mean all models or just ones of the types you list? What settings/TPTP did you run with?
Hmm, I've been reusing the same TPTP file so I would have to go back and check. I probably excluded linear models, possibly the preservation-piecewise models too. It's possible that I included something that I shouldn't have.
@Terence Tao I'm not sure this can work. If squaring is a homomorphism and then we can quotient out by the equivalence generated by for some specific . So for each divisor d of n this will give a model where . But, it's not too hard to show that if then actually , so 6 couldn't have been the order of S in the original model. This suggests trying S with order 9 or at least some odd order.
Terence Tao (Nov 30 2024 at 21:23):
I dont see why would force for any x
Ibrahim Tencer (Nov 30 2024 at 22:23):
Hmm never mind, that way of quotienting doesn't really make sense.
Ben Gunby-Mann (Dec 01 2024 at 04:34):
Pace Nielsen said:
I don't know if this is helpful or not, but working with Equation 677, the solutions over a generic field to a linear model come in two flavors. Type 1: is a primitive 5th root of unity (i.e. a root of ) and . Type 2: is a root of and .
Possibly worth noting is that in the Type 2 solutions, is a primitive cube root of unity (as when , we have ). (Furthermore but less interestingly, if is a primitive cube root of unity, then satisfies the quadratic ; this gives the two solutions for each possible .)
Interestingly, any linear model with a primitive cube root of unity satisfies 255, regardless of . (The necessary equation factors as .)
Terence Tao (Dec 01 2024 at 15:59):
I'm now thinking that a variant of the 503 construction should be able to refute the 1518 implications, though the example may end up being rather large (larger than the 84-element 503 example).
The equation 1518 is . Replacing by and then cancelling , one can rewrite this equation as . If we use to denote the relation (think of this as "labeling" the directed edge by , we can interpret this law as follows: given a labeled 2-path of the form
we have , or equivalently . To put it another way, if we define a good cycle to be a cycle
in the magma with the property that for all (extending indices periodically by ), then (by the bijectivity of left multiplication and finiteness) every directed edge extends uniquely to a good cycle, and if the edge can be labeled by , then the edge must be labeled by (in fact this is an if and only if), and the rest of the cycle also can be similarly labeled as
If we do not have , then we can keep iterating and get more labels. For instance, if we had and , then the label
on the first edge of the good cycle iterates to
So one can hope to build a 1518 magma by first designating some cycles in the magma as "good cycles", and then trying to label the edges of these good cycles with elements of the graph consistent with the above law (as well as the fact that given a good cycle , one must have as part of another good cycle), in order to generate the full multiplication table.
At first glance this looks like a terrible strategy, because we need every vertex to have exactly one outgoing edge for a given , and this looks like a very severe constraint. However, if the magma has a lot of idempotents, then the task is much easier, due to the following observation. Suppose that one has a partially defined magma operation such that is always defined, and that is defined if or are not idempotent ( or ); so we permit to be undefined when , are both idempotent. We also impose left-injectivity, as well as the partial 1518 law that if is defined, then is defined and equal to (in terms of graphs, this means that if we have where is part of a good cycle in the sense that , then we can propagate the label to ). Then we can complete the partial 1518 operation to a complete 1518 operation by setting whenever was undefined (in @Ibrahim Tencer's notation, we force the elevation relation ).
The upshot of this is that if we find a partial 1518 operation that is total for non-idempotents, we can extend the table to the idempotents as well. So one just needs enough good cycles and labels on the good cycles to cover the non-idempotents while respecting axioms such as left-injectivity, and then we can solve the problem. With some pen and paper I think I found a seed involving twelve non-idempotents, generated by two generators of order in the sense that and ; writing and , the initial seed takes the form of good cycles
for and some idempotents . One has to keep extending this seed with additional cycles to maintain all the required laws, for instance needs to be extended to a good cycle, but I think I can do this by a sort of greedy algorithm (adding fresh idempotents whenever needed) that terminates in finite time. It got too hard to do this with pen and paper though. I may try to write this strategy more formally in a blueprint chapter later today.
Matthew Bolan (Dec 01 2024 at 16:14):
I think I can program this up for you if needed.
Terence Tao (Dec 01 2024 at 16:15):
OK, I will try to write up more detailed notes later today to share on this, and hopefully they will be enough to set up some variant of a seed-finding algorithm.
Matthew Bolan (Dec 01 2024 at 16:32):
Possible that I've typoed something, but I think my existing tools are already telling me that
============
1 2 3 4 5 1
1 1 0 2 _ _
3 4 2 _ 0 _
4 5 _ 3 _ 0
1 _ 0 2 4 _
1 _ 0 2 _ 5
============
is a partial 1518 operation that is total for non-idempotents and violates 47 (In this case the only non-idempotent is 0). I guess I'll try and implement the extension procedure and see if this breaks somehow.
Terence Tao (Dec 01 2024 at 16:33):
Hmm, this will have to break somehow because squaring is necessarily a bijection on 1518 magmas, but it will be instructive to see where the breakage is actually happening.
Matthew Bolan (Dec 01 2024 at 16:39):
I imposed that if and are defined, then , should I have imposed some stronger form of 1518? I also can impose auxiliary laws as for any other seed, perhaps squaring being a bijection is an obvious one to include.
Matthew Bolan (Dec 01 2024 at 16:41):
Actually, I think I may have typoed something, one moment.
Matthew Bolan (Dec 01 2024 at 16:43):
No, it seems fine. Some testing code didn't run, but the testing code still says the table is fine.
Terence Tao (Dec 01 2024 at 16:44):
I think the laws needed are (a) is always defined and is a bijection; (b) If is undefined, then are both idempotents; (c) If then ; (d) If is defined then is defined and equal to (this is a rearrangement of of 1518). I'm writing up notes now, but it may take an hour or so before I am done, so take the above rules with a grain of salt before then.
Terence Tao (Dec 01 2024 at 18:21):
Ugh, there are significant technical difficulties in my entire approach - making most of the magma idempotents actually seems to create a lot of inconsistencies that don't seem easy to fix. Best to just ignore my previous proposal for now.
Matthew Bolan (Dec 01 2024 at 21:43):
I've cleaned up my cohomology script and in particular can allow linear models on general finite fields for now. No real new results to report, except that the order of the 1110 counterexample can be reduced from 25 to 16 (as Terry already anticipated). I had a mistake in the code for specifically this reduction - this order 16 model is only a counterexample to 4435 and not 1629. For 1629 we still need to use the original order 25 example.
Matthew Bolan (Dec 01 2024 at 23:59):
1516 -> 1489 counterexample can be reduced from size 49 to size 35
Douglas McNeil (Dec 02 2024 at 00:01):
That's below the limit Vlad guessed would be fine without heartbeat changes!
Matthew Bolan (Dec 02 2024 at 00:02):
The counterexample to 1076 -> 2294, 4435 can be reduced from order 169 to 65, but that's not small enough to link in zulip.
Matthew Bolan (Dec 02 2024 at 00:06):
Matthew Bolan (Dec 02 2024 at 00:13):
I can get the counterexample to 503 -> 4065, 3862 down from 50 to 45
Matthew Bolan (Dec 02 2024 at 00:15):
Matthew Bolan (Dec 02 2024 at 00:15):
45 might still be too large, but if someone formalizes the cohomology method this provides an avenue to resolving https://github.com/teorth/equational_theories/issues/953
Terence Tao (Dec 02 2024 at 00:16):
What procedure did you use to shrink the 503 example?
Matthew Bolan (Dec 02 2024 at 00:17):
That's just the cohomology construction, unrelated to your original example.
Matthew Bolan (Dec 02 2024 at 00:17):
Code says
G is (0)x+(2)y in Finite Field of size 5, M is (z2 + 1)x+(2*z2)y in Finite Field in z2 of size 3^2.
Zoltan A. Kocsis (Z.A.K.) (Dec 02 2024 at 00:19):
Wow!
I must say, in hindsight I'm glad I spent a bit of extra time on optimizing the FME back in October, these are much larger than anything I expected to ever show up.
Matthew Bolan (Dec 02 2024 at 00:24):
Those were the result of trying linear magmas on finite fields. I might now try to choose some select non-linear models for 1076 (and maybe 503) to try and get that 65 down a little more.
Matthew Bolan (Dec 02 2024 at 00:26):
I'd quite like to allow to be any finite abelian group, but I feel like this won't gain too much and in order to do it I need to change how the linear algebra is currently done.
Terence Tao (Dec 02 2024 at 00:29):
I think if is the direct product of two factor magmas , then in order for the cohomological construction to produce a counterexample for , it must have already done so for one of the two components (otherwise the cocycle equation for the consequent would be obeyed for both components of the cocycle, and hence for the full cocycle). In practice I think this means the finite field choices for already suffice.
Matthew Bolan (Dec 02 2024 at 00:34):
I agree, but being the integers modulo a prime power could maybe offer something, and the fact is a direct product of abelian groups doesn't mean its a direct product of factor magmas (finite fields are already an example of this).
Terence Tao (Dec 02 2024 at 00:36):
Ah, right, we have some "non-semisimple" linear models as well. Yes, in principle these could give some new constructions.
Terence Tao (Dec 02 2024 at 00:40):
Though I think a model formed by a -extension, say, could also be viewed as a tower of two -extensions. The first extension might not necessarily produce a direct counterexample, but could make a sufficiently novel and interesting new magma that when plugged in to the cohomology machine as a new base model, it can create a counterexample upon applying a second extension. So if your software can detect the existence of non-trivial extensions by finite fields (ones where the extension isn't a coboundary), these might be useful base models for a second extension which may capture at least some of the cyclic group of prime power order examples.
Terence Tao (Dec 02 2024 at 00:42):
It's hard to imagine getting a magma of order <45 out of this though, at least if we don't have a useful linear model to play with
Matthew Bolan (Dec 02 2024 at 00:42):
That makes sense, after all is a central extension of . Currently my code just computes for both laws, but I can definitely compute the actual without much trouble.
Terence Tao (Dec 02 2024 at 00:46):
In analogy with the Schur-Zassenhaus theorem, I expect the cohomology to be trivial when and are "unrelated" (in the group theory setting, this means that and have coprime order, but I don't know if that's the right condition in the magma setting). So it's interesting that you got an order 65 example and an order 45 example, it suggests that a naive version of Schur-Zassenhaus is not true for magmas.
Terence Tao (Dec 02 2024 at 00:47):
Maybe that's not so surprising. Most of group theory is also not true for magmas.
Matthew Bolan (Dec 02 2024 at 00:48):
I also noticed that didn't change between my original models for 1516 and 1076 and these new ones.
Matthew Bolan (Dec 02 2024 at 00:57):
Hmm, actually that's a little wrong for the 1516 examples I shared. The 1516 example switched from to for (though had I copy pasted slightly different options would have stayed the same). There are also examples for 1516 without a magma modulo 7 for , such as one of order 45 where
G is (1)x+(2*z2)y in Finite Field in z2 of size 3^2, M is (3)x+(3)y in Finite Field of size 5.
Matthew Bolan (Dec 02 2024 at 01:23):
A bit of a silly question, but does 1076 even have any known small (order < 13) non-linear models?
Matthew Bolan (Dec 02 2024 at 02:28):
Hah, turns out that if you restrict to the submagma generated by the counterexample, the 503 counterexample becomes only order 36. I think this means that I could have taken to be an order submagma of the linear order magma my program found.
Matthew Bolan (Dec 02 2024 at 02:31):
This is for its twin 476 and not 503, but that shouldn't make a difference
Matthew Bolan (Dec 02 2024 at 02:42):
Matthew Bolan (Dec 02 2024 at 02:46):
For the others the counterexample (at least the one found by FME) generates the entire magma. I also discovered that my reduction of the original 1110 counterexample from order 25 to 16 was incorrect - somewhere along the way I accidentally switched the law it was supposed to be falsifying from 1629 to 4435.
Matthew Bolan (Dec 02 2024 at 02:50):
1076 just doesn't seem to have many small models (I have found no nonlinear ones so far, Mace 4 is currently checking order 9), so it is going to be very hard to improve on the order 65 counterexample to 1076 -> 2294, 4435 with current ideas.
Amir Livne Bar-on (Dec 02 2024 at 06:15):
Is there a listing of the full parameters of counter-examples found this way? Maybe there are additional conditions they happen to satisfy? (Especially if these conditions limit the space of endomorphisms to consider)
Matthew Bolan (Dec 02 2024 at 06:27):
For the most part, it's the ones I've shared here. I think there were a few others, but for the most part I've been stopping runs once the first example is found.
Amir Livne Bar-on (Dec 02 2024 at 06:42):
Oh, so the coefficients given for G are the M endomorphisms from the blueprint?
Vlad Tsyrklevich (Dec 02 2024 at 08:20):
@Matthew Bolan Have you tried seeing if Lean accepts these? I'm not sure if you have a working dev environment, I can test it for you if you like, but obviously feel free to test it yourself.
Matthew Bolan (Dec 02 2024 at 14:05):
I have not tried. While I do have a working dev environment, I have no experience with lean so it is probably better for someone else to do it. The order 65 example is still pretty large, and probably won't get much smaller, so if we wish to formalize it we're going to have to formalize enough of the cohomology stuff anyway. Despite the fancy language, I don't think that should be too difficult. There is also a possibility that future examples will be massive, or that the idea is applied to some of the unformalized infinite implications.
Matthew Bolan (Dec 02 2024 at 14:14):
Amir Livne Bar-on said:
Oh, so the coefficients given for G are the M endomorphisms from the blueprint?
The coefficients of are the endomorphisms from the blueprint. I currently restrict myself to those endomorphisms coming from multiplication in the underlying field.
Vlad Tsyrklevich (Dec 02 2024 at 16:08):
Matthew Bolan said:
While I do have a working dev environment, I have no experience with lean so it is probably better for someone else to do it.
I would just do to do the same thing as you would anyway: add them to All4x4Tables and run lake build
to see if it builds successfully or not.
Terence Tao (Dec 03 2024 at 01:37):
I found a massive (N = 522) counterexample to all the 1518 refutations. FME refuses to provide a URL but still (grudgingly) accepts the text input:
output.txt
And here is the python code needed to generate that file:
array.py
My previous approach of building good cycles and dissipating all the consequences of 1518 into idempotents did eventually work, but was significantly more complicated than expected, and due to the need to keep a lot of the cycles disjoint from each other, I needed a lot of "space", leading to the large value of N.
While this approach technically works, it may be too complex to verify in Lean. (The counterexample does have a symmetry of order 12, so one can reduce the verification time a little bit from O(522^2) to O(522^2/12) in principle, but this is still unappetizing.) It may be good to find an alternate approach.
Terence Tao (Dec 03 2024 at 01:46):
The basic strategy was to use the following interpretation of 1518, writing for : every "good cycle" of the form will generate further edges , , etc., which must themselves be placed in good cycles. This process could iterate indefinitely, but if one has some idempotents which are right-absorptive with respect to each other in the sense that and , then any good 2-cycle of the form where is also idempotent, will iterate to the 1-cycles , , which then iterate to the idempotent 1-cycles , which are self-consistent (no further good cycles generated). So it is "just" a matter of starting with a good cycle that captures a counterexample to the desired law and iterating it (placing new good cycles in locations that do not cause collisions, and re-using good cycles whenever possible to avoid a combinatorial explosion) until all the non-idempotents are gone and the cycle can be iterated to the 2-cycles of the above form. This however took longer than expected; the magma contains 12 non-idempotents (which basically hold the counterexample) but also 510 idempotents (to hold all the iterations arising from the counterexample).
Zoltan A. Kocsis (Z.A.K.) (Dec 03 2024 at 02:31):
Terence Tao said:
I found a massive (N = 522) counterexample to all the 1518 refutations. FME refuses to provide a URL but still (grudgingly) accepts the text input
Great work!
The maximum length of a valid URL is technically 2048 ascii characters, so there's no way to fit this example without implementing some kind of compression :sweat_smile:
If another approach turns out to be difficult, I think it'd be possible to verify this in-kernel using some technical tricks (e.g. divide-and-conquer, split into separate proofs which are smaller and together imply the result, should not be too hard) or by formalizing the construction and some kind of ceritifcate that the construction terminates successfully which is smaller than the entire optable (might be hard).
Alex Meiburg (Dec 03 2024 at 04:31):
In terms of evaluating that effectively: the table is still mostly the right-absorptive law. If the number of exceptions was O(1), then docs#Function.update - or equivalently, writing it like f x y : = if (x = 7 ∧ y = 13) then 7 else y
- could be a more efficient representation, and most of the verification could be done "analytically".
But in this case, ~17% of the grid is not right-absorptive, which is a pretty sizable fraction. But if there's a clean representation of most of the exceptions, that could definitely help.
Have you checked if the magma has any nontrivial quotients that still witness the desired nonimplication?
Terence Tao (Dec 03 2024 at 05:03):
I have not attempted any compression. It was possible to compress the (somewhat similar) 503 construction from size 84 to size 50, and I would imagine a similar compression would be possible here, though doing so destroys the symmetries and so may not actually be a net win.
There is a lot of flexibility in the construction and it may be possible to get the size down to something more like 100 with more clever choices, though that's still rather large.
One can phrase 1518 in the following form: if a triple is of the form , then . Ordinarily this would take verifications, but for this magma the number of triples is a lot smaller than because it is designed to have a lot of repetition. For instance, due to the mostly right-absorptive nature, most of the triples end up being just . A back of the envelope calculation suggests that the number of triples is of the order of 1000, and after reducing by symmetries there are actually about 100 or so. So there may be a way to implicitly describe the multiplication operation that makes it easier to certify. (The triples are the length two subpaths of the "good cycles", of which there are about 170 or so of length 6, another 100 or so of length 2, and the rest are loops of length 1 or of a very simple form.)
Vlad Tsyrklevich (Dec 03 2024 at 06:29):
Zoltan A. Kocsis (Z.A.K.) said:
The maximum length of a valid URL is technically 2048 ascii characters, so there's no way to fit this example without implementing some kind of compression :sweat_smile:
We should all be using the fragment identifier (#
) instead of the query string (?
), the query string is (needlessly) sent to the server and causes poor cache behavior, the fragment identifier is local to the browser and has a much larger size limit. (Graphiti also uses the query string and shouldn't to accept larger inputs)
Zoltan A. Kocsis (Z.A.K.) (Dec 03 2024 at 07:53):
@Vlad Tsyrklevich Agreed. I originally did it this way to take advantage of Elm's query string key-value parser, but the features this gave me for free ended up being unused, and the current version bypasses the key-value parsing anyway. I plan to update it to support #
eventually, but now it'll have to support both (not doing so would break existing URLs).
Vlad Tsyrklevich (Dec 03 2024 at 07:54):
Yeah, I've meant to do the same but it's just lost in the other list of TODOs
Kevin Buzzard (Dec 03 2024 at 08:32):
I have a completely dumb question. Everyone seems to be happy with the idea "obviously 522^2 is too big for Lean" but if you take a step back and ask the question "is 522^2 too big for a computer program" the answer is clearly "no, it's no longer the 1950s". I remember writing code in Lean 3 about checking some equality in 3 variables mod 3 and discovering that doing the brute force approach of checking all 27 cases was surprisingly slow; I learnt from that that 27 was a big number for Lean 3 to count up to, but everyone assured me that Lean 4 was on the way, and Lean 4 was a proper programming language, not just a research project, and that Lean 4 was as fast as Haskell in some test cases. So what is actually going on here?
Here's a concrete question: can one use this funny extra reduceBool axiom to actually do this 522 calculation? And another one: can one do the calculation without the extra axiom? Or do we really have to come to terms with the fact that 522^2 is not a number that Lean can count up to? There's a program in Coq which worked out and formally verified the first million digits of pi. I bet 522^2 isn't too big for Coq.
Vlad Tsyrklevich (Dec 03 2024 at 08:50):
I think someone that knows Lean internals better would have to say why Lean chokes on such small operation tables, it may be that we're just using the system poorly.
Bruno Le Floch (Dec 03 2024 at 11:45):
Even just def matrix : Array (Array Nat) := #[#[1,2],#[3,4]]
takes forever for a 522 times 522 matrix. I tried with UInt16 or UInt32 instead of Nat with no success. ChatGPT suggests using a ByteArray and reimplementing whatever operations we need, in particular indexing ourselves in the ByteArray. This is a bit annoying because the entries of the multiplication table are bigger than a Byte so we have to work with pairs of Bytes. It seems doable though, and (almost) certain to use a flat storage, rather than nested pointers all over the place.
Zoltan A. Kocsis (Z.A.K.) (Dec 03 2024 at 12:53):
Bruno Le Floch said:
Even just
def matrix : Array (Array Nat) := #[#[1,2],#[3,4]]
takes forever for a 522 times 522 matrix.
What happens if you create an Array (Array Nat)
of the same dimension programmatically by using a short program to fill in the entries, say (x,y) \mapsto x
, instead of giving an explicit definition in the source? Does it still choke? I'd test this myself but don't have a working Lean4 installation at the moment.
Bernhard Reinke (Dec 03 2024 at 13:01):
Kevin Buzzard said:
I have a completely dumb question. Everyone seems to be happy with the idea "obviously 522^2 is too big for Lean" but if you take a step back and ask the question "is 522^2 too big for a computer program" the answer is clearly "no, it's no longer the 1950s". I remember writing code in Lean 3 about checking some equality in 3 variables mod 3 and discovering that doing the brute force approach of checking all 27 cases was surprisingly slow; I learnt from that that 27 was a big number for Lean 3 to count up to, but everyone assured me that Lean 4 was on the way, and Lean 4 was a proper programming language, not just a research project, and that Lean 4 was as fast as Haskell in some test cases. So what is actually going on here?
Here's a concrete question: can one use this funny extra reduceBool axiom to actually do this 522 calculation? And another one: can one do the calculation without the extra axiom? Or do we really have to come to terms with the fact that 522^2 is not a number that Lean can count up to? There's a program in Coq which worked out and formally verified the first million digits of pi. I bet 522^2 isn't too big for Coq.
I glanced at the conclusion of the linked paper; it seems to me that they need to trust native_compute
, so also a compiler.
Zoltan A. Kocsis (Z.A.K.) (Dec 03 2024 at 13:10):
Kevin Buzzard said:
I remember writing code in Lean 3 about checking some equality in 3 variables mod 3 and discovering that doing the brute force approach of checking all 27 cases was surprisingly slow
My infinite model showing Equation1701_not_implies_Equation4587
requires a similar (but 120-fold) case distinction, and it's really quite slow. I asked if people knew any techniques to improve it, but none of the suggestions resulted in major performance gains.
I bet 522^2 isn't too big for Coq.
From experience, as someone who worked on some very big proofs, 522^2
can be too big for most proof assistants in certain similar circumstances, including Coq, and even proof assistants based on very different formalisms such as Isabelle/HOL. And while 522^2
is trivial in terms of pure computational verification (the Finite Magma Explorer completes the check above in a fraction of a second; a C program would be much faster still), it's just about at the threshold where I'd not be surprised if it started being substantial work in terms of "creating and storing a more complicated proof object".
Bruno Le Floch (Dec 03 2024 at 13:48):
Zoltan A. Kocsis (Z.A.K.) said:
What happens if you create an
Array (Array Nat)
of the same dimension programmatically by using a short program to fill in the entries, say(x,y) \mapsto x
, instead of giving an explicit definition in the source? Does it still choke? I'd test this myself but don't have a working Lean4 installation at the moment.
Sorry, I'm struggling with the Lean syntax so I wasn't able to test that easily. I will try tonight.
Amir Livne Bar-on (Dec 03 2024 at 13:54):
Just checked, it's immediate to create an array of any size. But if you try to index a large array like arr[4][2]
you get "maximum recursion depth has been reached" when calculating or verifying a theorem that depends on the value. This error also comes very quickly.
Shreyas Srinivas (Dec 03 2024 at 14:09):
that's easy
Shreyas Srinivas (Dec 03 2024 at 14:09):
When you do indexing like that the proof of correctness is constructed by omega
Shreyas Srinivas (Dec 03 2024 at 14:10):
If instead you use unsafe indexing it works fine.
Shreyas Srinivas (Dec 03 2024 at 14:10):
See below:
def ex : Array <| Array Nat := mkArray 1122 (mkArray 512 12)
#eval ex[134]![123]! -- works. no proofs. panic if index out of bounds
#eval ex[134][123] -- max recursion
Shreyas Srinivas (Dec 03 2024 at 14:10):
For proofs, I have to guess lean is using the unary Nat representation
Shreyas Srinivas (Dec 03 2024 at 14:13):
The following also works:
def ex : Array <| Array Nat := mkArray 1122 (mkArray 512 12)
#eval ex[134]![123]! -- works
set_option maxRecDepth 10000
#eval ex[134][123] -- no more max recursion error. Now it works
Shreyas Srinivas (Dec 03 2024 at 14:20):
And for what it is worth, 522^2
is definitely not too big for lean:
#eval 522^2
#reduce 522^2
example : 522^2 = 272484 := by rfl
Shreyas Srinivas (Dec 03 2024 at 14:21):
The comparison with haskell is not fair because haskell does no static checking of index bounds whatsoever
Mario Carneiro (Dec 03 2024 at 14:30):
Shreyas Srinivas said:
For proofs, I have to guess lean is using the unary Nat representation
no
Mario Carneiro (Dec 03 2024 at 14:31):
Kevin Buzzard said:
I have a completely dumb question. Everyone seems to be happy with the idea "obviously 522^2 is too big for Lean" but if you take a step back and ask the question "is 522^2 too big for a computer program" the answer is clearly "no, it's no longer the 1950s".
I encourage you to keep this attitude. Our proof assistants are too slow by orders of magnitude and we need people to push for better
Mario Carneiro (Dec 03 2024 at 14:34):
In any case, I think this is an interesting challenge to do via the kernel and straightforwardly doable using native_decide
Shreyas Srinivas (Dec 03 2024 at 14:44):
Btw, here's a more fancy example which easily does the first eval but gets stuck on the second eval:
import Mathlib
def ex : Array <| Array Nat := @Array.ofFn (Array Nat) 512 (fun i => @Array.ofFn Nat 512 (fun j => i + j))
set_option maxHeartbeats 1000000
set_option maxRecDepth 10000
#eval ex[134]! -- works. no proofs. panic if index out of bounds
#eval ex[134] -- stuck for a long time before giving answer
Daniel Weber (Dec 03 2024 at 14:44):
The main problem with using the kernel is the inefficiency of random access, right? I wonder how efficiently we can do the verification of the table as a circuit, that might be a good model
Mario Carneiro (Dec 03 2024 at 14:46):
The fact that this is compressible to a small python file tells me that the proof is for sure smaller than 512^2 verifications anyway
Mario Carneiro (Dec 03 2024 at 14:47):
It would be interesting to verify the computation directly but IMO that would destroy a lot of structure
Bruno Le Floch (Dec 03 2024 at 17:57):
I'm having a bit of success manipulating very large ByteArray quickly, indexing them, etc. The main difficulty is to create them since the typical techniques go through an Array or List, which is too slow. Instead, I load a binary file that stores the entries of the multiplication table as pairs of bytes. Do we want to pursue that direction? It seems quite ugly to have the proof depend on a binary data file, and it will necessitate a bit of work to prove that the calculations on packed ByteArray really check the magma equations.
Terence Tao (Dec 03 2024 at 18:01):
OK, I think I am beginning to understand the issue with verifying the order 522 example efficiently in Lean. As pointed out previously, it mostly has to do with the lack of support in Lean for efficient random access memory.
Roughly speaking, the operation on the order 522 example is defined by parametric rules for , with each rule taking the form
- Rule : For all , we have .
Here is some explicit set and the functions , , are all very simple and decidable. Here is a simplified example rule:
- Sample rule: For all , , , we have .
(Actually, to get to rules instead of like , one has to add extra parameters here, but this should convey the general idea.)
Furthermore - and this is the tricky thing - there is a default rule:
- Default rule: If is not defined by any of the previous rules, then .
In order to verify that this creates a counterexample, one first has to verify that the rules don't collide with each other, so for two different rules, Rule and Rule , one has to verify that the are distinct from the . This is tedious but with or so rules I can imagine these case checks are within the capability of Lean.
The harder thing is that there are going to be about claims of the sort "For all in some set , the operation is covered by the default rule and is thus equal to ". To verify this, we need to show that is not covered by each of the other rules, and so this leads to case checks, which makes the complexity on par with the brute force calculation using a random access memory array to store the multiplication table.
In any event, even if we find some ad hoc way to make this particular calculation work, I agree with Kevin that it would be a much more useful contribution in the long term if we figure out how to give Lean the ability to efficiently work with random access arrays.
Amir Livne Bar-on (Dec 03 2024 at 18:03):
Running the script with J = 5 gives a magma of order 438 that's also a counter-example to the 4 equations
Amir Livne Bar-on (Dec 03 2024 at 18:05):
How hard would it be to prove that some arbitrary high J would work? It would be nice to formalize this, whether or not we manage to convince Lean to brute-force check this magma
Amir Livne Bar-on (Dec 03 2024 at 18:07):
(quantitative arguments are always a bit complicated, e.g. so far all the greedy constructions so far use the free group rather than familiar groups)
Terence Tao (Dec 03 2024 at 18:12):
Hmm, this could be a promising direction. A lot of my design choices for the magma were focused on keeping the magma small, so I kept reusing elements whenever I could, but this led to a lot of potential collisions to avoid. If I stop caring completely about making the magma small, then I can spread out the elements I need much more widely (in a similar spirit to the greedy algorithms) and there should be a lot fewer collisions that need checking. I could potentially run this as basically a greedy algorithm that happens to terminate in finite time. Let me think about this.
Kyle Miller (Dec 03 2024 at 18:31):
Terence Tao said:
it mostly has to do with the lack of support in Lean for efficient random access memory.
Something that's a bit interesting about kernel typechecking is that there's a lot of caching — so it might actually be that random access of a List can be faster than you think. It's sort of inefficient (keys would look something like List.get! lst 37
), but it's random access nonetheless.
Is there a summary somewhere of what it would mean to verify output.txt
? If I have time today, I might try taking up the challenge.
Amir Livne Bar-on (Dec 03 2024 at 18:40):
Equation 1518 is M[M[y][y]][M[x][M[y][x]]] == x
(for all x
and y
).
All counter-examples are at 0, so
M[0][M[0][M[0][0]]] != 0 eq 47
M[0][M[0][M[M[0][0]][0]]] != 0 eq 614
M[0][M[M[0][0]][M[0][0]]] != 0 eq 817
M[M[0][M[0][0]]][0] != M[0][0] eq 3862
Daniel Weber (Dec 03 2024 at 18:42):
Here's a potential approach for kernel verification: for every x, y
we precompute 3 values — . This gives four values of the operation, so we just need to verify that there's at most one value for any input. If we have all of these values in one list, then we can sort it, and then it can be verified in one scan of the array. Can the kernel do (linked-list-friendly) merge sort efficiently enough?
Alex Meiburg (Dec 03 2024 at 19:00):
Mario Carneiro said:
The fact that this is compressible to a small python file tells me that the proof is for sure smaller than 512^2 verifications anyway
That was my first thought as well, but the Python file doesn't describe a construction that "obviously" (as in, for a reason transparent to a human) works, I think. My understanding is more that it takes an operation table and gradually modifies it to try to make more and more properties desirable. It turns out that after ~40k such modifications (which do mostly have a nice structure), the construction closes.
It's possible there's a clever way to prove (in the normal human sense) that the construction closes, which could then certainly be translated to Lean, but I don't think there was such a proof in mind when the code was written. @Terence Tao can correct me there if I'm wrong. :)
Terence Tao (Dec 03 2024 at 19:08):
Yeah, I don't yet have a formal proof that this method had to work eventually, but instead it was an iterative process as you describe where the 1518 law was holding on most of the magma but had some places where the law was failing. I could fix that by tweaking the magma operation (and sometimes adding additional elements), but (a) sometimes this caused unwanted collisions with previous rules, requiring me to backtrack and try a different tweak, and (b) the tweak would generally introduce new violations. For (b) there was a strong sense that I could make the new violations be "simpler" than the previous violations, ultimately to the point where I could resolve the violations by enforcing the right absorptive law on the offending elements, but dealing with (a) was an ad hoc process where I would repeatedly implement a proposed solution in python and it would report a collision that I did not expect, forcing me to make the solution more complicated and larger each time. But this was in large part because I was recycling elements quite often to save space, which created a much higher density of potential collision scenarios. Following Amir's suggestion, I am now trying to see if one takes a much more "greedy" approach in which one just keeps introducing new idempotent elements to fix previous violations without any effort to reuse them, then one can close the argument in finite time. (I'm pretty certain that one can create an infinite example this way, and indeed we essentially did this for our main project; the trick is to somehow capture the property (b) I was empirically observing of the violations getting "simpler" as the iteration progressed, even if the number of such violations starts to increase exponentially.)
Mario Carneiro (Dec 03 2024 at 19:19):
I still think that this process yields a much smaller list of things to check than 512^2. It's structured as a small number of modifications on a large table, and because of that the list of things to check is only approximately the square of the number of modifications rather than the whole table
Mario Carneiro (Dec 03 2024 at 19:21):
I got distracted earlier but I'm working on a formalization of this (human) definition, but I encourage those playing with the "data dump" approach to continue
Mario Carneiro (Dec 03 2024 at 19:22):
Amir Livne Bar-on said:
How hard would it be to prove that some arbitrary high J would work? It would be nice to formalize this, whether or not we manage to convince Lean to brute-force check this magma
I expect this version of the proof to be even easier than the J = 5 version, if you can figure out how to structure it
Kyle Miller (Dec 03 2024 at 19:47):
I've got a proof of M[M[y][y]][M[x][M[y][x]]] == x
running, let's see if it completes. I got it to verify for all 0 <= x, y < 50
in 7.7 seconds, so maybe it'll be done in about 15 minutes.
To use this, you take the array itself and put it into a file named data_1518.txt
in the same directory.
code
Kyle Miller (Dec 03 2024 at 19:48):
Oh, hm, I just ran out of RAM and had to kill it.
Mario Carneiro (Dec 03 2024 at 19:51):
does it work in native_decide
?
Kyle Miller (Dec 03 2024 at 19:58):
Haven't tested, because I'm skipping the compiler when I add the matrix to the environment
Kyle Miller (Dec 03 2024 at 20:02):
(I haven't had the patience to wait for the compiler to compile this large literal... Is there any way to say "this is already in memory, please use it!"?)
Kyle Miller (Dec 03 2024 at 20:04):
At least I could try doing what native_decide
would do manually. The M[M[y][y]][M[x][M[y][x]]] == x
rule holds, according to verify_rule
.
def test_verify (filename : String) : TermElabM Bool := do
let srcPath := System.FilePath.mk (← MonadLog.getFileName)
let some srcDir := srcPath.parent | throwError "{srcPath} not in a valid directory"
let path := srcDir / filename
let data ← IO.FS.readFile path
let matrix ← parseData data
return verify_rule matrix
#eval test_verify "data_1518.txt"
-- true
Amir Livne Bar-on (Dec 03 2024 at 20:42):
How comes compiling a large literal takes more time than hundreds of small ones? Is it due to optimizations that are allowed within values but not between values in the same module? I can't think of anything else that would make the runtime super-linear...
Kyle Miller (Dec 03 2024 at 20:44):
My guess is that it's spending time trying to optimize it, and it's not happy with the quarter of a million List.cons
cells. It's going to compile to some code that calls the List.cons
a whole lot.
Mario Carneiro (Dec 03 2024 at 20:58):
The compiler also isn't good at avoiding deep terms, in particular because it automatically puts expressions in A-normal form
Mario Carneiro (Dec 03 2024 at 20:59):
combine this with the fact that handling deep nested binders is quadratic and you are in for a bad time
Kyle Miller (Dec 03 2024 at 21:09):
Another approach is to break up the computation across multiple definitions. Here's the beginning of that, it defines rule_verified_0
through rule_verified_521
, and it'll take 3-4 hours to complete... But it shouldn't run out of memory!
code
Mario Carneiro (Dec 04 2024 at 01:43):
Out of time for today, but you can see my progress on equational#976. Turns out this has all the usual structure of a greedy construction, but I don't yet understand all the invariants. (@Terence Tao ?) One thing in particular which I'm not quite following is that the code seems to add things in "cycles", which implies something about an auxiliary property like "every partial assignment induces a directed graph where every vertex in the graph has positive in and out degree". (Or maybe in/out degree 1?) It would be great if someone could figure out the greedy invariants in the style of the 854 proof.
Mario Carneiro (Dec 04 2024 at 01:53):
A concrete theorem which I don't know how to prove without some additional invariant is that the construction proceeds by adding a cycle of 6 points whenever (where is one of the non-idempotents), but why is it the case that (for some ) isn't already in the graph?
Terence Tao (Dec 04 2024 at 03:38):
This is indeed the biggest obstacle to making this construction fully automated. One needs "enough" cycles passing through that, for any for which one wishes to define , that at least one of them has the property for all , so that one can safely define this operation. Sometimes no such cycle is currently available and one has to first insert that cycle into the partial operation first before being able to define . In practice, if the magma is large enough and the cycles are sufficiently "disjoint" then one can do this, possibly after adding in a few more elements to make room, but it's hard to do it systematically. I have experimented with trying to formally create a greedy algorithm for this, but the amount of new elements and operations one has to add to "fix" any undefined operation is already about half of the size of the finite model, so verifying that the greedy construction iterates properly may well be of comparable difficulty to directly verifying the finite model.
Terence Tao (Dec 04 2024 at 04:27):
A key problem is that every cycle through a non-idempotent generates a further cycle that passes through the same non-idempotent . It turns out that this process can be made to loop back on itself in steps for some , though to avoid unwanted collisions, and to have enough cycles through available that one can always allocate one to a given to define , one needs to be moderately large (about 5 or so). Furthermore, these cycles will also generate a few other 2-cycles that also need to be added. The process does terminate and lead to the ability to define for any given , but it takes a remarkably large amount of effort to do so.
Mario Carneiro (Dec 04 2024 at 08:34):
So the way I would like to structure this is that we have some base behavior of the operation (the rule, and most of the seed properties, including all the 2 cycles), which completely determines the value of all of the elements not involved in the greedy rule. The greedy rule itself should ideally be over the smallest possible "action space" because that reduces the number of cases to check (the proof that the initial seed is functional is already quite scary looking in my formalization, so if I can restructure it as a function instead of a relation that will make things much easier).
Mario Carneiro (Dec 04 2024 at 08:42):
If there is a deterministic process of cycle adding that loops back on itself, I think that should all be done in the same extension step. We can at least predict what happens in that case and prove that it loops back properly. The hard part is the greedy step where we take a "random" element and add it to the graph, since then we lose information about the structure we've built. The number of new elements in each step is not the issue (in fact adding more new elements in each step is a good thing because it reduces the number of steps), instead we need a description of what an obstruction looks like. Your code just solves all the greedy steps in order; is it just a coincidence that this works and a different order may get stuck?
Mario Carneiro (Dec 04 2024 at 08:54):
We could define a valid greedy step as one where for all , but in that case the limit object will be one which is "stuck" such that all cycles are blocked. I did not understand your point about "one has to insert that cycle into the partial operation first", it sounds like you are modifying the seed in this step? So maybe this is the mechanism you are using to reorder things instead.
Terence Tao (Dec 04 2024 at 15:56):
Oh, this is a step that is not reflected in the final python code I uploaded, but rather in the trial-and-error process (across many versions of the python code, as well as many scratch paper and blackboard diagrams) in which I kept trying a seed, finding a collision, and then modifying the seed in an ad hoc manner to fix the problem.
In any event I think I now have a more high-level abstraction of the argument. Given a carrier , define a suitable digraph to be a directed graph of edges on with the following axioms:
- There is a bijection on the set of edges that maps each edge to a unique "successor" that starts at the final vertex of the original edge. (But there may also be non-sucessor edges emenating from .) Hence every edge also has a unique "predecessor" (but there may be other edges entering ). We can then define a "good path" in the graph in which each edge is the successor of the previous one, and similarly a "good cycle" .
- If is a good path, then is also an edge in the digraph.
In the case of finite 1518 magmas, there is an associated suitable digraph in which if for some , and the successor of is (and the predecessor is ).
Because of finiteness, the suitable graph can be thought of as an edge-disjoint collection of good cycles, as each edge will belong to exactly one cycle. Note that we allow cycles of length 1 (these will correspond to idempotents), as well as cycles of length 2. In practice, I was able to generate candidate suitable graphs by starting with a cycle that I wanted to have in the graph (to hold a desired counterexample), creating more edges by Axiom 2, completing those into cycles in some ad hoc process (guided by avoiding previous failures), and iterating until the cycles produced became 1-cycles (which iterate to themselves).
Now suppose we have selected a suitable digraph and also a bijection with the property that is in the diagraph for all . Define a partial solution to be a partially defined operation on obeying the following axioms:
- If $$y \op x = z$$, then is an edge in the digraph. In such cases we write .
- If is a good path and , then . (By periodicity, we also get the converse implication for free: if , then ).
- If is a good path, then .
- For any , we have .
One can check that if a partial solution is total, then it obeys 1518, and that is indeed the squaring operation .
One can attempt to generate a seed partial solution by imposing Axioms 3,4 to create some initial operations, and then propagating them by Axiom 2. This construction is guaranteed to obey Axiom 1, but is not guaranteed to have be a partial function: there can well be collisions in which we attempt to enforce and for distinct . I do not have a general theorem that gives clean conditions for when such collisions can be avoided: I have relied on trial and error and python code (and some intuition about how "sparse" suitable digraphs are less likely to cause collisions) to do this.
Once one has a seed, one has to complete it to a total operation, by assigning values to that are currently undefined. The easiest case is when and are both idempotent. Then we simply assign the right absorptive law , the addition of which does not break any axioms. However, one should not perform this step initially, but only at the end, because there are other completion steps that may require to be assigned something other than the right absorptive law.
The hardest case is when are both non-idempotent. Here the only solution I had was to ensure that the initial seed already defines all these operations. This is one reason why I wanted to keep the set of non-idempotents as small as possible (there are 12 of them in the specific example, but I think I have a way to cut it down to 6).
A relatively easy case is when is non-idempotent and is idempotent. Here one can proceed if (a) there is a shift isomorphism that preserves the partial solution and the suitable digraph; (b) One has ; and (c) one has as a good cycle in the digraph. In this case one can set for all integers . The axioms (a), (b), (c) will ensure that this does not break any axioms or create any collisions. So I designed the digraph and the squaring map to have these symmetry assumptions.
Finally, there is the case where is idempotent and is non-idempotent. Here, the task is to find a good cycle passing through , such that is not defined for any . Then one can set for all . This can be done if there are "enough" good cycles that are "spread out" in some way. It is in order to achieve this that the parameter in the python code had to become large, and this is major contributor to the large size of the magma.
I believe I have a slightly simpler construction of a digraph and squaring function that both leads to a smaller magma (it should be of size 144, if my calculations work out), but I will check it in python first (and also refactor the python code to reflect the above arrangemnt of the argument) before reporting it.
Vlad Tsyrklevich (Dec 04 2024 at 16:47):
Matthew Bolan said:
This is for its twin 476 and not 503, but that shouldn't make a difference
Finally got around to looking at and importing these, they didn't cause any issues.
Matthew Bolan (Dec 04 2024 at 16:48):
Do you plan to formalize the cohomology construction in general? I assume the order 65 example is a problem right?
Matthew Bolan (Dec 04 2024 at 16:48):
I can provide the needed tables to compress all these examples further if the construction itself gets formalized.
Vlad Tsyrklevich (Dec 04 2024 at 16:49):
I've got a host of others things in front of me right now, so I'm not planning to do it in the short-term. If you have the data though and it's fresh in your mind, it may be worth adding to that GitHub ticket.
Vlad Tsyrklevich (Dec 04 2024 at 16:54):
Matthew Bolan said:
I assume the order 65 example is a problem right?
Yes
Matthew Bolan (Dec 04 2024 at 17:06):
That one in particular has carrier , and operation , where is given by
this table
Matthew Bolan (Dec 04 2024 at 18:22):
Matthew Bolan said:
1516 -> 1489 counterexample can be reduced from size 49 to size 35
If we are adding those other two we might as well also add this one.
Vlad Tsyrklevich (Dec 04 2024 at 18:24):
Oh, I had missed this in the backscroll.
Terence Tao (Dec 05 2024 at 03:10):
I reduced the size of the 1518 example to 232, down from the previous record of 438, by using a different type of starting cycle and making some different choices in iterating.
output2.txt
array2.py
The python code is now refactored to first construct the list of cycles and show it obeys the required axioms 1-4, and then start building the multiplication table from that list and checking that no collisions occur. This may be somewhat easier to formalize.
Kyle Miller (Dec 05 2024 at 04:58):
@Terence Tao This time I tried a base-256 encoding of each row of the matrix, since the Lean kernel has special support for Nat, so it's a way to do random-access arrays. This made it possible to verify a form of M[M[y][y]][M[x][M[y][x]]] == x
in 22s, but it will still take a little work to prove that the calculation I did is what I'm claiming it's doing. (It's just to translate the verify_rule
function into a proposition.) The data format is the same — it's assuming the file it's reading is just the array from your output2.txt file.
code
Mario Carneiro (Dec 05 2024 at 09:19):
you also need to check that all the entries are less than 232 with this formulation
Joachim Breitner (Dec 05 2024 at 09:34):
Kyle Miller said:
This time I tried a base-256 encoding of each row of the matrix
Isn’t that pretty much what https://github.com/teorth/equational_theories/blob/6c65d0a0a0193ea88383f0234e5511e19b2f7142/equational_theories/MemoFinOp.lean does, or am I missing something here?
Mario Carneiro (Dec 05 2024 at 10:20):
the encoding is slightly different, since it's using powers of 2 and also using a list of nats instead of a single one
Mario Carneiro (Dec 05 2024 at 10:21):
it might be interesting to see whether memoFinOp can be sped up by using powers of 2 so that you can use shiftl instead of mul
Joachim Breitner (Dec 05 2024 at 11:04):
This was discussed before somewhere here. I believe it won't save much. My assumption is that the cost is kernel time, not the actual bignum operation, and if you use powers you then have to an extra mod
or lt
to clamp to the expected Fin
range, so there is an extra step needed.
But I could be wrong, predicting performance is always dangerous. Shouldn’t be hard to adapt the existing code if someone is curious.
Joachim Breitner (Dec 05 2024 at 11:05):
Is list of nat an improvement over a single nat? Maybe to work around the issue with how lean represents huge nat literals in the C code (lean4#5771)? If so, should it be an RArray
of Nat
?
Mario Carneiro (Dec 05 2024 at 11:23):
there is already a mod
involved in opOfTable
Mario Carneiro (Dec 05 2024 at 11:23):
and the t / n^i
division in particular is one of the more expensive bignum operations you can do
Mario Carneiro (Dec 05 2024 at 11:24):
we're looking at numbers with a few thousand bytes at least, it should be measurable at that size
Joachim Breitner (Dec 05 2024 at 11:26):
Mario Carneiro said:
there is already a
mod
involved inopOfTable
Right – a div and a mod. With power of two you’d need a shift and an and and a mod or lt-check.
But you are right, it’s worth a try!
Mario Carneiro (Dec 05 2024 at 11:27):
the difference is that the mod will be of numbers around 5-30, rather than 2^25-2^900
Mario Carneiro (Dec 05 2024 at 11:28):
actually, the % n
and Nat.mod_lt
proof is already there in opOfTable
Mario Carneiro (Dec 05 2024 at 11:28):
that's the exact same check either way
Joachim Breitner (Dec 05 2024 at 11:32):
It’s not quite the same (or I’m confused) – the % n
isn’t just a check, it also removes the “upper bits”, whereas in the power-of-two variant you (presumably) remove the upper bits after shifting using && (2^k-1)
or % 2%k
, and then still have to do one more operation to make lean accept it as a Fin n
.
But I agree that it’s worth a try given the smaller numbers.
Also calculating the shift as i * k
is likely cheaper than calculating the divisor n^i
.
Mario Carneiro (Dec 05 2024 at 12:48):
You can do all the bit selection using bit operations: (t >>> a*w) &&& !(1 <<< w)
and then % n
or if x < n then x else 0
at the end
Mario Carneiro (Dec 05 2024 at 12:50):
ideally you would calculate w
and !(1 <<< w)
beforehand since they are constants
Joachim Breitner (Dec 05 2024 at 13:09):
Ok, no confusion then :-)
Terence Tao (Dec 05 2024 at 15:38):
Here is a modification to the cohomology method that may help with the Lean verification of examples such as the order 169 refutation of 1076 -> 2294. Here we work with operations of the form where all take values in the same finite field of prime order (e.g., in the case of the order 169 example), the base operation is also linear, and is a function. Previously we were viewing this as a linear algebra problem over unknowns, which appears to be challenging for Lean (though no problem for e.g., SAGE). But notice that can also be interpreted as a bivariate polynomial of degree at most (with coefficients in ; actually the only monomials that show up are of the form with , thanks to Fermat's little theorem). Also, the derivative operator takes homogeneous polynomials of degree to homogeneous polynomials of degree . So we can split up into "isotypic subspaces" and assume without loss of generality that is a homogeneous polynomial of degree for some . Now there are only unknown coefficients; the previous 169-dimensional linear algebra problem has been split up into 25 linear algebra problems of dimensions between 1 and 13. I'm suspecting that one should already see counterexamples at relatively low degrees, e.g., d=2,3,4, in which case the entire magma can be defined by a low degree polynomial and verification within Lean should actually be quite easy (maybe even doable by ring
).
For fixed one could now test all primes (and all coefficients ) at once by viewing the problem as an abstract linear algebra problem with coefficients polynomial in some indeterminates obeying some polynomial identities, and apply some Grobner basis algorithm.
Matthew Bolan (Dec 05 2024 at 15:52):
Alright, that seems easy enough to do, I'll get on it.
Terence Tao (Dec 05 2024 at 16:00):
A cheap option would be to take the existing counterexample , express it as a bivariate polynomial, split it into homogeneous components, and extract the lowest degree component that isn't identically zero. This should also be a counterexample.
Terence Tao (Dec 05 2024 at 16:03):
Actually, it might not be the lowest degree component that holds the counterexample, because the 2294 derivative operator could vanish on that component. One wants the lowest degree component for which the 2294 derivative operator doesn't vanish.
Matthew Bolan (Dec 05 2024 at 16:11):
Right now I'm going to just intersect the I currently calculate with the space of degree polynomial functions. I can also do the Groebner basis computation pretty easily, but I think I want to start from scratch for that one.
Kyle Miller (Dec 05 2024 at 17:10):
Mario Carneiro said:
you also need to check that all the entries are less than 232 with this formulation
Ok, I added the couple lines of code to check this, which is the trivial part of this exercise.
On top of this, one should add
def M : Fin 232 → Fin 232 → Fin 232
| ⟨x, hx⟩, ⟨y, hy⟩ => ⟨extract data_1518 x y, data_1518_is_table x hx y hy⟩
and then restate the verification theorems in terms of this M
, but I think the underlying kernel computation theorems should still be in terms of data_1518
to avoid unnecessary Fin juggling. Anyway, the question of whether this could be checked computationally in Lean is unblocked now, right? Am I missing anything?
Kyle Miller (Dec 05 2024 at 17:24):
Joachim Breitner said:
Isn’t that pretty much what https://github.com/teorth/equational_theories/blob/6c65d0a0a0193ea88383f0234e5511e19b2f7142/equational_theories/MemoFinOp.lean does, or am I missing something here?
Sorry, I'm not very familiar with what's been done in this project. I can say that there are three performance considerations that led to my design:
- I use Nat rather than Fin for the kernel verification, leaving the Fin interface for main final theorem statements. There's no need for the kernel to spend time manipulating the Fin constructor, or to do unnecessary mod operations to make sure values are in range. These aren't necessary for the computation, so let's prove them outside the computation.
- I used a
List Nat
encoding rather than a singleNat
because the numbers were big enough that, in my experience, bignum arithmetic really does start to become noticeable. I didn't do any performance comparisons here, since my goal was creating a proof of concept. If it were a singleNat
, we're working with 55k base-256 digit numbers, and individual binary shifts are, sure, O(1), but it will take copying up to 55kB of data. That's a lot of memory overhead just for accessing the matrix, and a quick back-of-the-envelope calculation suggests that this would be at least 11GB of memory traffic just to do all the matrix accesses. Not terrible, but at least by having a list of Nats this goes down to about 50 MB of copying. - I used powers of two because division and modulo are relatively expensive bignum operations, and the numbers are big enough where I'm starting to worry. Shifting is more expensive than I'd like (it takes scanning over the entire number) but at least
255 &&& .
should only look at limb zero, vs. % 256
, which is another linear scan.
Matthew Bolan (Dec 05 2024 at 17:29):
Ok for 1076 we can take to be degree
Terence Tao (Dec 05 2024 at 17:30):
So this should lead to an extremely compact description of the magma operation on as a cubic polynomial for which 1076 should be easy to verify (and 2294 easy to refute) in Lean.
Matthew Bolan (Dec 05 2024 at 17:33):
On the carrier the law is
Terence Tao (Dec 05 2024 at 17:37):
Does it violate both 2294 and 4435?
Matthew Bolan (Dec 05 2024 at 17:37):
Yes
Terence Tao (Dec 05 2024 at 17:38):
OK. This should be straightforward to formalize, I opened equational#981 for this.
Terence Tao (Dec 05 2024 at 17:39):
If you already have the values of for the counterexamples for 2294 and 4435 that may help the formalization process a little.
Terence Tao (Dec 05 2024 at 17:41):
I think I'll also update the relevant section of the paper, this is an extremely concrete example to illustrate the cohomology method
Douglas McNeil (Dec 05 2024 at 17:43):
Lunchtime in my time zone, so I can catch up. How widely applicable are these approaches? Could there be improved and/or simplified refutations for already handled pairs? Can't remember if it's in the current plans or not, but I know collecting statistics on how different techniques performed was mooted at one point.
Matthew Bolan (Dec 05 2024 at 17:46):
Terence Tao said:
If you already have the values of for the counterexamples for 2294 and 4435 that may help the formalization process a little.
I'm having to undo some coordinate transforms I did to feed it into FME, but I believe both violations are at .
Terence Tao (Dec 05 2024 at 17:48):
It feels like this technique could be automated. Though even the preliminary task of identifying the linear magma solutions to each equation hasn't yet been completed (related to equational#364), so constructing linear extensions of linear magmas by polynomial cocycles for all 4694 equations would be a certain amount of work. But it seems doable to me, given enough effort.
Matthew Bolan (Dec 05 2024 at 17:48):
I actually wasted quite a bit of time these past few days partially automating the process of identifying the linear magma solutions.
Terence Tao (Dec 05 2024 at 17:50):
One nice thing about low degree polynomials is that if an identity fails, it tends to fail nearly everywhere (Schwartz-Zippel lemma, applied to polynomial identity testing), so actually in retrospect one could have picked at random and already have a good chance of landing on a counterexample. But certainly having an explicit pair can't hurt.
Terence Tao (Dec 05 2024 at 17:58):
If you can get to the point where you can automatically identify the smallest non-trivial linear magma over a finite field for each of the 4694 equations, that would be a nice addition to Equation Explorer. I would be curious to know what proportion of the refutations in the finite implication graph can be resolved either by linear models, or by linear extensions of linear models, I am guessing the percentage to be extremely high (say 98% or so).
Matthew Bolan (Dec 05 2024 at 17:58):
I ran into the annoying issue that in order to compute the non-commutative linear models, what I want to do amounts to computing a Groebner basis in the free (non-commutative) algebra over . I found this website which lists computer algebra packages and which non-commutative rings they support, but the ones I have access to don't work out. The maple package JanetOre sounds promising though, I just don't have easy access to maple at the moment.
Without the computation over , I still think we can get away with only computing Groebner bases in free algebras over fields, provided the package we are using is able to express the basis in terms of the original generators. This way we can see which primes need to be invertible to guarantee the polynomials we care about lie in the ideal. Unfortunately, I again could not figure out how to do this (but I'm pretty sure this approach should be possible with one of the systems I have access to, I did not try very hard).
Terence Tao (Dec 05 2024 at 17:59):
I suspect the commutative models over a finite field should already suffice for a very large fraction of refutations.
Matthew Bolan (Dec 05 2024 at 17:59):
In contrast, the commutative situation is very simple since sage can do groebner bases over , my cohomology program already has this as a subroutine.
Matthew Bolan (Dec 05 2024 at 18:03):
Terence Tao said:
I suspect the commutative models over a finite field should already suffice for a very large fraction of refutations.
Yes, but as far as I can tell many implications have not been fully checked for (non-commutative) linear immunity, and I'd like to do this. It's annoying because it should be possible for most implications (though there's a good chance the Groebner basis computation does not terminate for some, I think we lose this guarantee in the non-commutative setting), there just doesn't seem to be any one black box I can use for the computation.
Terence Tao (Dec 05 2024 at 18:10):
Actually, for the purposes of the paper, could you run your algorithm for the 1110->1629 refutation with p=5, and ? This should be an even simpler example than the p=13 1076 example.
Matthew Bolan (Dec 05 2024 at 18:19):
That one needs degree .
Terence Tao (Dec 05 2024 at 18:21):
Would you be able to give explicit coefficients?
Terence Tao (Dec 05 2024 at 18:22):
That already is slightly lowering my expectations for the method for the broader implication graph though. I was vaguely hoping that the quadratic and cubic examples would already be quite successful in clearing out implications. Though perhaps it is premature to extrapolate from N=2 data points...
Matthew Bolan (Dec 05 2024 at 18:26):
Ah, I was confused by something and now I am less confused. I think we cannot actually exclude monomials where the exponent on or is greater than , since replacing them using Fermat's little theorem would violate homogeneity.
Vlad Tsyrklevich (Dec 05 2024 at 18:28):
Just using Kyle's method above for faster indexing, the 1076 counterexample finishes in <5s, no maxHeartbeats changes.
Matthew Bolan (Dec 05 2024 at 18:29):
The law is
Terence Tao (Dec 05 2024 at 18:30):
@Matthew Bolan Ah, I missed that, thanks for the correction.
Vlad Tsyrklevich (Dec 05 2024 at 18:30):
Right now I just shimmed it into the current architecture that resolves the refutation by decide!
which seems to be ~2x slower for 1518 than Kyle's Nat.fold check and I haven't had a chance to look at that yet.
Terence Tao (Dec 05 2024 at 18:30):
I assume you mean instead of ?
Matthew Bolan (Dec 05 2024 at 18:30):
yes,
Matthew Bolan (Dec 05 2024 at 18:31):
and there is an obvious transcription error, one second
Terence Tao (Dec 05 2024 at 18:32):
I'll double check it in python
Matthew Bolan (Dec 05 2024 at 18:33):
Yeah, I've actually already done that, so any errors remaining are just transcription ones. I believe it is now corrected though.
Mario Carneiro (Dec 05 2024 at 18:34):
Kyle Miller said:
Anyway, the question of whether this could be checked computationally in Lean is unblocked now, right? Am I missing anything?
No I think you are right, this should solve the issue
Kyle Miller (Dec 05 2024 at 18:38):
@Vlad Tsyrklevich In case it helps, the reason for Nat.fold
was that I saw that it had good properties for kernel reduction, and I worried that the Decidable instance might use something like docs#Nat.all, which seemed like it might be slower because it would have to defer reducing the match
in &&
(but I didn't measure anything!)
Terence Tao (Dec 05 2024 at 18:41):
OK, the quintic counterexample for 1110->1629 works, and x = (1,0) for instance serves as a specific counterexample. FME
Mario Carneiro (Dec 05 2024 at 18:43):
PS: I feel like this thread should have been split ages ago, half of the project has now happened on it
Terence Tao (Dec 05 2024 at 18:47):
Fair enough; I'll open threads for the specific finite implications remaining to be either formalized or resolved.
Terence Tao (Dec 05 2024 at 19:08):
OK, I've spun off the 677 and Lean+Duper topics. I think the task of formalizing the large counterexamples for 1076 and 1518 can be continued here, as it seems we are very close to completing it.
Terence Tao (Dec 05 2024 at 23:35):
Is it worth developing a tactic for verifying magma laws for bytecode arrays? In principle all the finite magma refutations could pass through this tactic which might make it faster.
Matthew Bolan (Dec 06 2024 at 00:34):
I have something resembling a general solver for small working now. Unfortunately it is pretty slow once gets bigger than . I am not taking advantage of the fact the problem is linear in the coefficients of the cocycle at all, so probably I can improve that. In the meantime it revealed that there is a quadratic example for 1516 not implying 1489 on ,
Vlad Tsyrklevich (Dec 06 2024 at 07:07):
Terence Tao said:
In principle all the finite magma refutations could pass through this tactic which might make it faster.
There's already something like this on main, memoFinOp
. It may make sense to unify FinitePoly with All4x4Tables to make it easier to convert all memoFinOp users to the new format, as FinitePoly doesn't define operation tables.
Vlad Tsyrklevich (Dec 06 2024 at 07:10):
@Zoltan A. Kocsis (Z.A.K.) I noticed that FinitePoly only has magmas of size , and when I tried deleting the FinitePoly refutations, I noticed something odd. There were refutations that went missing that were refuted by 2x2 magmas that All4x4Tables also knows about. But those refutations are just not being generated. That seems surprising to me because I don't see anywhere in generate_lean.py
that is looking at the FinitePoly tables to avoid generating those refutations, so I'm not sure what's happening. Do you know off the top of your head why this might be happening?
Zoltan A. Kocsis (Z.A.K.) (Dec 06 2024 at 12:06):
Vlad Tsyrklevich said:
Zoltan A. Kocsis (Z.A.K.) I noticed that FinitePoly only has magmas of size , and when I tried deleting the FinitePoly refutations, I noticed something odd. There were refutations that went missing that were refuted by 2x2 magmas that All4x4Tables also knows about. But those refutations are just not being generated. That seems surprising to me because I don't see anywhere in
generate_lean.py
that is looking at the FinitePoly tables to avoid generating those refutations, so I'm not sure what's happening. Do you know off the top of your head why this might be happening?
Not off the top of my head (it's been a while since I looked at the project at all), but am happy to take a look tomorrow!
Vlad Tsyrklevich (Dec 06 2024 at 17:51):
I spent a lot of time trying to make this into a nice macro and seem to have made everything worse. I'm out of my depth in meta-programming. Trying to use Lean's native parsing introduced a huge overhead (~15s), it seems to be faster to do finOpTable "[[1,2...}
instead of finOpTable [[1,2,...
but I'm sure I could just be doing something foolish. One thing I'm surprised by is that IsTable
takes ~15s which is almost as long as verifying the actual satisfiability. Does that sound right to you @Kyle Miller ?
native parsing example
Kyle Miller (Dec 06 2024 at 19:52):
@Vlad Tsyrklevich I'm not sure there's any benefit to using Lean's native parsing here — the matrices are just data, and all that matters is that the resulting loaded table is a witness. You could even make it load from a base 64 blob or something, if you want the Lean source to be shorter. (At least, these are my own opinions as a relative outsider to this project.)
Using mkAppM is going to be slower than using toExpr like I did. mkAppM goes through a more general procedure with a lot more calculation.
I noticed that IsTable is slow to check too, but it was an acceptable amount of time so I didn't look into it. Best I can say is that it's making use of random access all the way, rather than say iterating over the rows of matrix
and checking the condition, which would be more efficient. You'd have to back that up with a proof — this alternative formulation can go into decidableIsTable
using decidable_of_iff
, which is a way to set which algorithm to use for a given proposition.
Vlad Tsyrklevich (Dec 06 2024 at 22:37):
equational#991 adds support for Kyle's implementation, it is much easier to look at the first commit to avoid the spam from re-generating counter-examples.
Joachim Breitner (Dec 06 2024 at 22:53):
Did anyone already check if using RArray
instead of List
makes a difference? (Assuming the project is already on a new enough lean, if not the code can be copied from https://github.com/leanprover/lean4/pull/6070 easily temporarily.)
Vlad Tsyrklevich (Dec 06 2024 at 23:01):
Just ran a very quick local test, and it does not seem to, when testing with the largest counterexample we have.
Joachim Breitner (Dec 07 2024 at 00:39):
thanks, guess the lists are small enough for this to not matter.
Last updated: May 02 2025 at 03:31 UTC