Zulip Chat Archive
Stream: Equational
Topic: Dynamically detecting useful implications to decide
Daniel Weber (Oct 02 2024 at 03:18):
I want to run some automated theorem provers at a larger scale, but running it on all unknown implications is both impractical and a waste of resources - as implications are proven/refuted, they will imply many other currently unknown ones.
Optimally I think we want some API, which interactively gives an unknown implication, and then receives one of true, false or unknown, and then updates the closure based on that and again gives an new unknown implication, and so on and so forth. Perhaps it could generate a file with conjecture
s at the end.
This raises a technological problem: how do we implement this kind of interactive API?
This also raises an algorithmic problem, of what this API will do. It could run an incremental dynamic transitive closure algorithm (there are algorithms with amortized time per addition, which is great for our , compared to the time it will likely take to decide an implication), but I'm really not sure how one finds the "best" implication to consider. What measure should that even use?
Terence Tao (Oct 02 2024 at 04:10):
One quick and dirty solution is to prune the equation list only every N implications tested for some intermediate value of N, eg N=1000. While the birthday paradox will mean that some redundant implications will be proven this way, the percentage of redundancies is quite small and not worth overoptimizing. It will help to have the outstanding implications sorted in a canonical way, eg lexicographic ordering, so that when one resumes testing after each reduction of the outstanding implications one can quickly recover one’s place in the new list of implications.
Daniel Weber (Oct 02 2024 at 04:22):
That would mean only taking advantage of transitivity incidentally, not intentionally choosing implications which use it, right?
Terence Tao (Oct 02 2024 at 04:47):
Well, the implications that would transitively imply a lot of other implications are likely to also be the ones that are hardest to prove, so focusing on them may actually be the incorrect strategy since your automated theorem prover will presumably take a lot more time to solve them (or fail to do so after such a long time). So it's not obvious to me that there is a surefire way to optimize.
One could take an empirical approach: first sample some random number of implications, say 1000 of the outstanding implications, run each one through your automated theorem prover and record the success rate and run time, and also record how many further implications each of the 1000 implications yield. Then some simple math would lead to a prediction of what type of implication yields the biggest expected "payoff", and then one could focus subsequent searches on that "demographic" of implication first. [One could be more advanced and use neural nets instead of "simple math" to do some machine learning here, but this approach could consume more CPU than it saves.]
Terence Tao (Oct 02 2024 at 04:53):
Such an empirical approach can also inform which automated theorem prover you want to try first (or what timeout parameters you want to assign to the ATP).
Daniel Weber (Oct 02 2024 at 05:26):
Running vampire on 1000 implications with a timeout of 0.1 seconds, it proved 495 of them, refuted 218, and didn't manage the other 287, in about a minute. I'll try to run it with a larger timeout (on the same problemset) and see how much that improves, and then I'll compute what each yields by transitivity
Daniel Weber (Oct 02 2024 at 05:52):
Well, using 24x time, it managed to prove 1 more implication and disprove 2 more implications
Terence Tao (Oct 02 2024 at 05:54):
Perhaps moving in the other direction (shortening the timeout) is actually better? 0.1 seconds x 3 million implications is like 83 hours, though presumably implementing transitivity reduction every so often could chip away at that.
Terence Tao (Oct 02 2024 at 05:56):
Presumably the optimal strategy is some staggered approach where one first sets the timeout low to cut down the number of outstanding implications by a significant factor, then raises the timeout and starts again, though the math of optimization becomes quite complex and it may be better to work with rules of thumb and gut feeling after some point
Daniel Weber (Oct 02 2024 at 06:07):
Something interesting is that when running the prover in refutation mode it managed to refute a lot more implications, 428, although it took more time
Terence Tao (Oct 02 2024 at 06:14):
By the way, the open implications whose converse is already proven true are particularly high yield because if proven true they reduce the number of laws up to equivalence by one and potentially remove thousands of implications from the outstanding list. It might be worth brute forcing over those first (disabling refutations as these would not be high yield). EDIT: equational#188
Terence Tao (Oct 02 2024 at 06:18):
1000-495-428=77 is more than a 10x reduction, which would be fantastic
Daniel Weber (Oct 02 2024 at 06:28):
Before running this in a large scale it'll probably be good to create a custom schedule for vampire for these kinds of problems
Daniel Weber (Oct 02 2024 at 08:28):
I wonder if trying to create a simple neural network to guess whether a given implication is true or false would be good, as it can then direct the ATP
Daniel Weber (Oct 02 2024 at 10:46):
Daniel Weber said:
Before running this in a large scale it'll probably be good to create a custom schedule for vampire for these kinds of problems
I managed to configure it such that it can solve 948/1000 implications in 50 seconds, which I think is really good
Shreyas Srinivas (Oct 02 2024 at 10:52):
Daniel Weber said:
I want to run some automated theorem provers at a larger scale, but running it on all unknown implications is both impractical and a waste of resources - as implications are proven/refuted, they will imply many other currently unknown ones.
Optimally I think we want some API, which interactively gives an unknown implication, and then receives one of true, false or unknown, and then updates the closure based on that and again gives an new unknown implication, and so on and so forth. Perhaps it could generate a file withconjecture
s at the end.
This raises a technological problem: how do we implement this kind of interactive API?
This also raises an algorithmic problem, of what this API will do. It could run an incremental dynamic transitive closure algorithm (there are algorithms with amortized time per addition, which is great for our , compared to the time it will likely take to decide an implication), but I'm really not sure how one finds the "best" implication to consider. What measure should that even use?
Ah finally an algorithmic challenge. You want to check the literature for dynamic and local algorithms for reachability. Deletions are the challenging aspect, but insertions alone shouldn't be a challenge.
Daniel Weber (Oct 02 2024 at 10:52):
There shouldn't be any deletions - we only generate implications, don't we?
Shreyas Srinivas (Oct 02 2024 at 10:53):
Yeah then it shouldn't be tough. A union find structure should do the trick
Daniel Weber (Oct 02 2024 at 10:53):
This is a directed graph
Shreyas Srinivas (Oct 02 2024 at 10:54):
One neat trick to sidestep implementation complexity is to use a good SQL database, because their implementations have good dynamic algorithms for join
Shreyas Srinivas (Oct 02 2024 at 10:55):
Currently at a talk. Once I am on a machine, I will try to find a more concrete answer
Daniel Weber (Oct 02 2024 at 10:58):
Daniel Weber said:
Daniel Weber said:
Before running this in a large scale it'll probably be good to create a custom schedule for vampire for these kinds of problems
I managed to configure it such that it can solve 948/1000 implications in 50 seconds, which I think is really good
Apparently after using transitivity on this there are only 2640882 unknown implications remaining out of the previous 3209693, which I find a bit hard to believe :thinking:
Shreyas Srinivas (Oct 02 2024 at 11:01):
This paper : https://arxiv.org/abs/2002.00813
Do check its related version section which cites a number of papers on variants of this problem
Vlad Tsyrklevich (Oct 02 2024 at 11:03):
Daniel Weber said:
Daniel Weber said:
Daniel Weber said:
Before running this in a large scale it'll probably be good to create a custom schedule for vampire for these kinds of problems
I managed to configure it such that it can solve 948/1000 implications in 50 seconds, which I think is really good
Apparently after using transitivity on this there are only 2640882 unknown implications remaining out of the previous 3209693, which I find a bit hard to believe :thinking:
Easy to believe if it further fills out the Equation 2 equivalence class.
Daniel Weber (Oct 02 2024 at 15:17):
Daniel Weber said:
Daniel Weber said:
Before running this in a large scale it'll probably be good to create a custom schedule for vampire for these kinds of problems
I managed to configure it such that it can solve 948/1000 implications in 50 seconds, which I think is really good
I ran this iteratively and computed the closure every so often as suggested, and it's supposed to finish in a few hours — I think I should have conjecture
statements leaving only a few tens of thousands of implications completely unknown
Terence Tao (Oct 02 2024 at 15:23):
If we are down to say 50,000 remaining equations then there may be no further need for dynamic optimization of these tools. (Though once the big expensive AI tools come online, it may still be worth thinking about how to allocate their queries most efficiently. I guess we should basically throw all of our cheap tools at that remaining set first.)
Daniel Weber (Oct 02 2024 at 19:04):
Vlad Tsyrklevich (Oct 02 2024 at 19:24):
Cool! Are you working on converting these proofs to Lean? I'm asking to see if it's time to change tack on what I'm working on, or perhaps I simply misunderstand the results.
Joachim Breitner (Oct 02 2024 at 20:21):
For the non-implications found by vampire, is it feasible to extract the counter example magma from the vampire output?
Vlad Tsyrklevich (Oct 02 2024 at 21:13):
Also, I noticed just a few _not_implies_Equation1
conjectures in there, is that expected?
Daniel Weber (Oct 03 2024 at 00:51):
There's a bug there I just noticed, the equations numbers aren't being decremented by 1 before indexing to the array
Daniel Weber (Oct 03 2024 at 00:51):
I think this should be reverted for now and I'll rerun it
Terence Tao (Oct 03 2024 at 01:12):
Oof. Unfortunately the PR is too deep now to revert. Can you make a placeholder PR to remove the relevant files perhaps?
Terence Tao (Oct 03 2024 at 01:13):
In retrospect the condensation to 7999 implications was too good to be true... projections suggested 50K was more reasonable
Daniel Weber (Oct 03 2024 at 01:14):
Terence Tao (Oct 03 2024 at 01:21):
The good news is that this sort of issue isn't impacting the rest of the project significantly. But it's good to be reminded that errors can still abound outside of the Lean-verified space (and I guess in principle there could also still be some issues within Lean in case our interpretation of the Lean results are not quite right, though this seems extremely remote given the basic nature of our proof statements).
Nicholas Carlini (Oct 03 2024 at 01:27):
I very nearly did this exact thing with the off-by-one for my 4x4 tables and even committed it but caught it before I pushed... If someone knows how to make the CI break that would be helpful to catch things like this (raised in equational#217)
Daniel Weber (Oct 03 2024 at 01:28):
Nicholas Carlini said:
I very nearly did this exact thing with the off-by-one for my 4x4 tables and even committed it but caught it before I pushed... If someone knows how to make the CI break that would be helpful to catch things like this (raised in equational#217)
equational#218 makes extract_impications
error if there are inconsistent conjectures, but I now see that it errors even after removing vampire
Daniel Weber (Oct 03 2024 at 01:31):
It was, funnily enough, a different off-by-one. I fixed it
Daniel Weber (Oct 03 2024 at 01:37):
It's now claiming to prove 99% of these implications, but I'm a bit suspicious of it, perhaps I'll try it on a few implications whose status is proven as a sanity check
Terence Tao (Oct 03 2024 at 01:37):
One of my dreams with this project was to make research mathematics more closely resemble modern software engineering. I guess I should be careful what I wish for :joy:
Cody Roux (Oct 03 2024 at 02:18):
Most of software engineering is like working in a factory producing Mona Lisas
Daniel Weber (Oct 03 2024 at 02:47):
Daniel Weber said:
It's now claiming to prove 99% of these implications, but I'm a bit suspicious of it, perhaps I'll try it on a few implications whose status is proven as a sanity check
It seems to work correctly :thinking:
Daniel Weber (Oct 03 2024 at 13:50):
I rerun it for all implications, and I made equational#230 and it's consistent. However, someone should scrutinize it more before merging, I'm not all that confident about it now
Shreyas Srinivas (Oct 03 2024 at 14:29):
It is stuck on a merge conflict
Vlad Tsyrklevich (Oct 03 2024 at 14:45):
Daniel Weber said:
I rerun it for all implications, and I made equational#230 and it's consistent. However, someone should scrutinize it more before merging, I'm not all that confident about it now
If it helps, I ran equation search with your conjectures in the implications list for a couple of minutes and it did not hit a contradiction (yesterday when I tried it, it hit a contradiction very quickly)
Daniel Weber (Oct 03 2024 at 14:47):
I also managed to generate models for many of the non-implications, which also increases my confidence in them
Vlad Tsyrklevich (Oct 03 2024 at 14:49):
Ah yes I should also have specified, I actually explicitly did not check the conjectured non-implications as I don't have good tooling for this.
Daniel Weber (Oct 03 2024 at 14:54):
In some cases vampire outputs a model in a strange form. For Equation3994_not_implies_Equation3588
, for instance (Not (x ∘ y = (z ∘ (x ∘ y)) ∘ z -> x ∘ y = z ∘ ((x ∘ y) ∘ z))
), the model it outputs is:
% # SZS output start Saturation.
cnf(u12,axiom,
mul(X0,X1) = mul(mul(X2,X3),mul(mul(X0,X1),mul(X2,X3)))).
cnf(u9,axiom,
mul(X0,X1) = mul(mul(X2,mul(X0,X1)),X2)).
cnf(u10,negated_conjecture,
mul(sK0,sK1) != mul(sK2,mul(mul(sK0,sK1),sK2))).
% # SZS output end Saturation.
from what I'm reading from the documentation that's supposed to be "A Herbrand model expressed as a saturated set of formulae." but I'm not sure what that means, and there's no further explanation. Does anybody know how this describes a model?
Daniel Weber (Oct 03 2024 at 16:39):
I think it might mean I'm supposed to use something along the lines of this:
def MagmaOp : FreeMagma Nat → FreeMagma Nat → FreeMagma Nat
| x2 ⋆ (x0 ⋆ x1), x2 => MagmaOp x0 x1
| x, y => x ⋆ y
but lean warns that I'm not allowed to repeat a variable. Is there a good workaround for that?
Daniel Weber (Oct 03 2024 at 16:40):
Actually, no, that won't work anyway
Andrés Goens (Oct 03 2024 at 16:51):
Daniel Weber said:
I think it might mean I'm supposed to use something along the lines of this:
def MagmaOp : FreeMagma Nat → FreeMagma Nat → FreeMagma Nat | x2 ⋆ (x0 ⋆ x1), x2 => MagmaOp x0 x1 | x, y => x ⋆ y
but lean warns that I'm not allowed to repeat a variable. Is there a good workaround for that?
(FWIW in general, a workaround is using different variables and checking if they agree), so something like this works:
def MagmaOp : FreeMagma Nat → FreeMagma Nat → FreeMagma Nat
| x@(x3 ⋆ (x1 ⋆ x2)), y => if x3 = y then MagmaOp x1 x2 else x ⋆ y
| x, y => x ⋆ y
Daniel Weber (Oct 03 2024 at 17:04):
Daniel Weber said:
In some cases vampire outputs a model in a strange form. For
Equation3994_not_implies_Equation3588
, for instance (Not (x ∘ y = (z ∘ (x ∘ y)) ∘ z -> x ∘ y = z ∘ ((x ∘ y) ∘ z))
), the model it outputs is:% # SZS output start Saturation. cnf(u12,axiom, mul(X0,X1) = mul(mul(X2,X3),mul(mul(X0,X1),mul(X2,X3)))). cnf(u9,axiom, mul(X0,X1) = mul(mul(X2,mul(X0,X1)),X2)). cnf(u10,negated_conjecture, mul(sK0,sK1) != mul(sK2,mul(mul(sK0,sK1),sK2))). % # SZS output end Saturation.
from what I'm reading from the documentation that's supposed to be "A Herbrand model expressed as a saturated set of formulae." but I'm not sure what that means, and there's no further explanation. Does anybody know how this describes a model?
I think I managed to understand how this is supposed to be used --- we want to say that these (except u10
) are exactly what you can prove from Equation3994 (up to substituting values in X0, X1, ...), I think simply by checking all possible derivations, and therefore we can't prove 3588
.
Does this seem correct? Is there a good way to check all possible derivations? Is there a less meta way to make this argument?
Daniel Weber (Oct 03 2024 at 17:30):
No, I don't think this is correct either. I think the correct argument is some kind of formalizing of KB completion, and that seems really hard
Nicholas Carlini (Oct 04 2024 at 05:16):
I wrote a PR that does this dynamic detection of useful theorems. It's fairly fast if you don't need to re-run lake, ~1.5 seconds to initialize +2ms per time you want to get the next theorem to prove.
If you'd like to be able to update the graph in realtime and have it give you the next best one after you've tried one, that's a fairly easy change to make, if it would be useful. equational#270
Johan Commelin (Oct 04 2024 at 05:51):
You say it currently suggests looking at equations that might imply Eq2. According to https://nicholas.carlini.com/tmp/view.html all equations for which this is undecided are of the form x = f(x, y, ...)
. So the techniques from #Equational > Presentations of the form x = f(x,y,...) are simple might apply.
Johan Commelin (Oct 04 2024 at 05:54):
Aah, but none of those equations seem to have only two variables. So my idea in the other thread doesn't apply
Daniel Weber (Oct 04 2024 at 12:12):
I'm making some progress in Leanifying Vampire proofs:
theorem Equation262_implies_Equation4676 (G : Type*) [Magma G] (h : Equation262 G) : Equation4676 G := by
by_contra nh
simp only [not_forall] at nh
obtain ⟨sk0, sk1, sk2, sk3, sk4, nh⟩ := nh
have eq9 (X0 X1 X2 : G) : ((X0 ∘ X1) ∘ X0) ∘ X2 = X0 := (h ..).symm
have eq10 : (sk0 ∘ sk1) ∘ sk2 ≠ (sk0 ∘ sk3) ∘ sk4 := nh
have eq12 (X3 X4 X5 : G) : X3 ∘ X4 = X3 ∘ X5 := superpose eq9 eq9
have eq20 (X0 X1 X2 X3 : G) : ((X0 ∘ X1) ∘ X2) ∘ X3 = X0 := superpose eq12 eq9
have eq22 (X0 : G) : (sk0 ∘ sk1) ∘ sk2 ≠ (sk0 ∘ sk3) ∘ X0 := superpose eq12 eq10
have eq54 (X0 X1 X2 X4 X5 : G) : (X0 ∘ X1) ∘ X2 = (X0 ∘ X4) ∘ X5 := superpose eq20 eq20
have eq76 (X0 X1 : G) : (sk0 ∘ sk1) ∘ sk2 ≠ (sk0 ∘ X0) ∘ X1 := superpose eq12 eq22
subsumption eq76 eq54
theorem Equation1350_implies_Equation654 (G : Type*) [Magma G] (h : Equation1350 G) : Equation654 G := by
by_contra nh
simp only [not_forall] at nh
obtain ⟨sk0, sk1, sk2, nh⟩ := nh
have eq9 (X0 X1 X2 : G) : (X1 ∘ (((X2 ∘ X0) ∘ X0) ∘ X1)) = X0 := (h ..).symm
have eq10 : sk0 ≠ (sk0 ∘ (sk1 ∘ ((sk2 ∘ sk1) ∘ sk1))) := nh
have eq12 (X0 X1 X2 X3 : G) : (((X2 ∘ X3) ∘ X3) ∘ ((X0 ∘ X1) ∘ X1)) ∘ X3 = X1 := superpose eq9 eq9
have eq19 (X1 X2 : G) : X2 ∘ ((X1 ∘ X2) ∘ X2) = X2 := superpose eq12 eq12
have eq27 (X1 X3 : G) : X1 ∘ (X3 ∘ X1) = X1 := superpose eq12 eq19
have eq61 (X0 X2 : G) : X0 = X2 := superpose eq9 eq27
subsumption eq10 eq61
Daniel Weber (Oct 04 2024 at 12:21):
I also created some finite models at equational#276
Terence Tao (Oct 04 2024 at 14:55):
Johan Commelin said:
Aah, but none of those equations seem to have only two variables. So my idea in the other thread doesn't apply
I just presented some heuristics at #Equational > visualization of graph closure that suggest that the behavior of 1-variable equations, 2-variable equations, and >2-variable equations in the graph are in fact extremely different, and just proposed equational#278 to explore this. This may help improve our theoretical intuition as to what to expect in the different regimes.
Daniel Weber (Oct 04 2024 at 15:24):
I made equational#281 with a bunch of proofs created by Vampire
Vlad Tsyrklevich (Oct 04 2024 at 16:17):
After your change (checking if my local implications survived):
$ wc -l new_theorems_to_include.csv
0 new_theorems_to_include.csv
Last updated: May 02 2025 at 03:31 UTC