Zulip Chat Archive

Stream: Equational

Topic: Using SAT solvers for model generation

Mario Carneiro (Oct 14 2024 at 18:21):

Have we tried sat encoding these, or is it all integer reasoning?

Terence Tao (Oct 14 2024 at 18:23):

We have some universal quantifiers though, so I'm not sure if we can reduce things to a pure SAT solver. (But I don't have a good feel for what the gap between SAT solving and SMT solving is.)

Mario Carneiro (Oct 14 2024 at 18:23):

My expectation is that a sat encoding is the most efficient way to e.g. exhaustively test all 5x5 magmas for a given property

Cody Roux (Oct 14 2024 at 18:23):

I dunno, I think someone mentioned bit-vectors at some point, and obviously people have considered

$(\mathbb{Z/2Z})^n$

Mario Carneiro (Oct 14 2024 at 18:24):

@Terence Tao The way universal quantifiers are handled is by just writing out all the cases. I know it sounds dumb, but it's surprisingly effective

Terence Tao (Oct 14 2024 at 18:25):

I see, you are specializing to a medium sized finite model. I could see that if our current methods get exhausted and we get stuck on a few holdout equations, then deploying SAT solvers to find medium sized finite models might be a good approach.

Mario Carneiro (Oct 14 2024 at 18:25):

in this case you would have n^3 variables saying a*b = c for all combinations a,b,c : Fin n, plus some constraints saying it's a function and the laws and non-laws are satisfied

Julian Berman (Oct 14 2024 at 18:27):

An extremely minor note on the wiki page for Tours of selected equations (unrelated to the current discussion but I can't see where to put it otherwise): https://github.com/teorth/equational_theories/wiki/Tour-of-selected-equations#equations-3588-x--y--z--x--y--z-3994-x--y--z--x--y--z-an-austin-pair uses "Austin pair" before defining it which happens in the next equation. I'd fix it but it looks like that requires collaborator access.

Cody Roux (Oct 14 2024 at 18:27):

I don't know that anyone's tried this, at any rate it would likely be in the scripts folder.

Terence Tao (Oct 14 2024 at 18:27):

The largest SAT solver success stories I know of involve ~1000 variables so naively I would imagine that one can handle magmas of size up to 10 or so this way, but for specific equations I could imagine there are clever tricks to extract better performance

Mario Carneiro (Oct 14 2024 at 18:28):

SAT solvers can handle millions of variables

Mario Carneiro (Oct 14 2024 at 18:28):

obviously not all problems are created equal, but you can get lucky and there are also lots of techniques to break up problems into easy chunks

Mario Carneiro (Oct 14 2024 at 18:30):

I think it's potentially feasible to apply the technique up to 10x10 tables, maybe higher depending on how hard the combinatorics of the law are

Cody Roux (Oct 14 2024 at 18:30):

Definitely worth a try, though then one would want to somehow extract the counter-model (I did it for Z3 counter-examples, and even that was a bit tedious)

Zoltan A. Kocsis (Z.A.K.) (Oct 14 2024 at 18:30):

Z3 was used for finite counterexamples, see the All4x4Tables directory.

Mario Carneiro (Oct 14 2024 at 18:31):

When the result is SAT it spits out a satisfying assignment, which can be directly read off as a multiplication table

Terence Tao (Oct 14 2024 at 18:31):

I think this is an excellent approach to try on any law that survives the assaults of our current techniques (in particular @Daniel Weber's automated testing of the greedy algorithm approach on the ~35 remaining equations).

Mario Carneiro (Oct 14 2024 at 18:31):

lucky for you we don't care about UNSAT (proofs can get huge)

Mario Carneiro (Oct 14 2024 at 18:33):

@Zoltan A. Kocsis (Z.A.K.) Do you know how long the 4x4 table computations took? That could give a good comparison

Zoltan A. Kocsis (Z.A.K.) (Oct 14 2024 at 18:34):

CPU-years, I believe. But @Nicholas Carlini is the one who can answer that.

Terence Tao (Oct 14 2024 at 18:36):

The number of inequivalent equations (particularly at the current state of progress) is massively smaller than the number of inequivalent 4 x 4 magmas, so I can imagine that a SAT solving approach that runs through each of the equations separately will be more efficient than the brute force searching of all 4 x 4 magmas. Certainly for 5 x 5 there should be no comparison, the SAT solving approach should be superior simply because the brute force approach is infeasible.

Mario Carneiro (Oct 14 2024 at 18:37):

right, the idea is to automate the sudoku solving approach you were talking about before

Terence Tao (Oct 14 2024 at 18:37):

In particular, this could be a good approach for a future project trying to work with pair implications EquationX \and EquationY \implies EquationZ.

Andrés Goens (Oct 14 2024 at 18:47):

Cody Roux said:

If Z3 counts as a model checker, then yes, if you mean, e.g. SPIN then no

right, I would not have counted Z3 as a model checker, but as an SMT solver (I don't normally think of BMC as model checking, although I know some people do, hence the confusion I guess)

Zoltan A. Kocsis (Z.A.K.) (Oct 14 2024 at 18:49):

@Mario Carneiro Of course one should try and deploy SAT solvers in as many capacities as possible (in this project, I used them to find minimal covers initially in All4x4Tables, but then heuristics became faster and more effective).

I don't expect off-the-shelf SMT solvers to outperform Vampire at finding small finite magmas, tbh. Maybe CVC5?

Let me propose an initial test case: how long does it take CVC5 or Z3 to find a smaller counterexample than my 13x13 one showing that 2531 does not imply 1075? If it takes long, then I doubt any law that survives the assaults of our current techniques will fall to them.

Mario Carneiro (Oct 14 2024 at 18:50):

Z3 isn't a sat solver, it's kind of silly to be using it as one

Mario Carneiro (Oct 14 2024 at 18:50):

it uses a sat solver but you could just call it directly

Siddharth Bhat (Oct 14 2024 at 18:50):

Cadical (a pure SAT solver) is also faster than Z3 on pure SAT problems.

Zoltan A. Kocsis (Z.A.K.) (Oct 14 2024 at 18:51):

Z3 isn't a sat solver, it's kind of silly to be using it as one

I never said it was.

Mario Carneiro (Oct 14 2024 at 18:51):

I'm trying my hand at setting this up. I don't know python, how do I make a line like from eznf import modeler work?

Vlad Tsyrklevich (Oct 14 2024 at 18:51):

Mario Carneiro said:

right, the idea is to automate the sudoku solving approach you were talking about before

My idea was in particular not necessarily to construct finite counterexamples (I am sure a computer can do that better than me), but to try to 'see' the constraint relationships between different squares and imagine how an infinite example may satisfy that. Of course I am not convinced that finite counterexamples don't exist, so it would be good to exhaust that option first

Mario Carneiro (Oct 14 2024 at 18:52):

is there a requirements.txt or something I need to add packages to?

Shreyas Srinivas (Oct 14 2024 at 18:52):

There is a PR doing this

Notification Bot (Oct 14 2024 at 18:52):

33 messages were moved here from #Equational > Thoughts and impressions thread by Mario Carneiro.

Shreyas Srinivas (Oct 14 2024 at 18:52):

For now put requirements.txt in gitignore and put your dependencies there

Andrés Goens (Oct 14 2024 at 18:53):

Mario Carneiro said:

is there a requirements.txt or something I need to add packages to?

there's loads of ways, but the easy way is just to pip install your dependency

Shreyas Srinivas (Oct 14 2024 at 18:54):

pip install complains about global installs these days

Shreyas Srinivas (Oct 14 2024 at 18:54):

Please work in a subfolder of scripts

Shreyas Srinivas (Oct 14 2024 at 18:54):

Put your requirements.txt inside

Mario Carneiro (Oct 14 2024 at 18:55):

I'm in a subfolder of scripts... and it does complain

Shreyas Srinivas (Oct 14 2024 at 18:55):

And then run python -m venv ./folder path

Shreyas Srinivas (Oct 14 2024 at 18:55):

Then you are inside a virtual env shell. Run pip3 install -r requirements.txt

Mario Carneiro (Oct 14 2024 at 18:55):

what are the last two arguments?

Mario Carneiro (Oct 14 2024 at 18:55):

and what goes in requirements.txt

Shreyas Srinivas (Oct 14 2024 at 18:56):

Your dependencies list

Mario Carneiro (Oct 14 2024 at 18:56):

so just eznf and nothing else?

Shreyas Srinivas (Oct 14 2024 at 18:56):

It should work unless there are sub dependencies

Shreyas Srinivas (Oct 14 2024 at 18:58):

venv sets up a custom local dev (mostly local python+pip+packages install + environment variables)

Mario Carneiro (Oct 14 2024 at 18:58):

from the folder I ran python -m venv . but it doesn't do anything

Mario Carneiro (Oct 14 2024 at 18:58):

or rather it creates files but doesn't create a shell

Mario Carneiro (Oct 14 2024 at 18:58):

and pip3 still seems to be global

Shreyas Srinivas (Oct 14 2024 at 18:59):

https://www.freecodecamp.org/news/how-to-setup-virtual-environments-in-python/

Andrés Goens (Oct 14 2024 at 18:59):

you normally want to source the activate script that it generates in the venv

Shreyas Srinivas (Oct 14 2024 at 18:59):

did you activate it?

Andrés Goens (Oct 14 2024 at 18:59):

so something like source ./bin/activateif it put them directly in your folder with the above

Mario Carneiro (Oct 14 2024 at 19:01):

ah yeah that worked

Mario Carneiro (Oct 14 2024 at 19:01):

should I not put it in .?

Julian Berman (Oct 14 2024 at 19:03):

(You should not. Call it venv or .venv or whatever, i.e. python3 -m venv venv if you want a directory alongside your source files in ..)

Mario Carneiro (Oct 14 2024 at 19:16):

vscode seems to want .venv at the workspace root

Mario Carneiro (Oct 14 2024 at 21:24):

Zoltan A. Kocsis (Z.A.K.) said:

Mario Carneiro Of course one should try and deploy SAT solvers in as many capacities as possible (in this project, I used them to find minimal covers initially in All4x4Tables, but then heuristics became faster and more effective).

I don't expect off-the-shelf SMT solvers to outperform Vampire at finding small finite magmas, tbh. Maybe CVC5?

Let me propose an initial test case: how long does it take CVC5 or Z3 to find a smaller counterexample than my 13x13 one showing that 2531 does not imply 1075? If it takes long, then I doubt any law that survives the assaults of our current techniques will fall to them.

I've got a working version of the cadical solver for this problem. As expected, this example is not something you can do off the shelf, and it looks like you didn't solve this one off the shelf with Z3 either but did a lot of finessing with additional laws. This technique is also amenable to adding additional laws to restrict the search space, but it starts to take more than a minute around size 8 on this example.

Terence Tao (Oct 14 2024 at 21:51):

This is a useful data point to have.

Mario Carneiro (Oct 14 2024 at 21:53):

equational#583 in case someone wants to push it further than I have the patience for

Mario Carneiro (Oct 14 2024 at 21:54):

I can confirm at least that none of the unknowns have models up to size 5, but I'm guessing we already knew that

Cody Roux (Oct 15 2024 at 00:38):

Nice!

Bartosz Podkanowicz (Oct 15 2024 at 05:54):

I can propose testing Minizinc (website: https://www.minizinc.org/). Minizinc is modelling language which is supported by a range of different solvers.
(SAT, mixed integer programming solvers and others).
I have created simple example code in Minizinc.
For example in case of searching for model that shows 2531 ! => 1075,
solver Chuffed 0.13.1 is able to check all models of size 7 in few seconds.
Below is my code in minizinc for a case 2531 ! => 1075 and models of size 7.
2531and1075.mzn

Daniel Weber (Oct 15 2024 at 05:55):

I used Vampire for finite model finding, it found a few magmas (https://github.com/teorth/equational_theories/blob/main/equational_theories/Generated/VampireProven/Disproofs1.lean, https://github.com/teorth/equational_theories/blob/main/equational_theories/Generated/VampireProven/Disproofs2.lean, https://github.com/teorth/equational_theories/blob/main/equational_theories/Generated/All4x4Tables/data/vampire-generated.txt)

Michael Bucko (Oct 15 2024 at 16:21):

Bartosz Podkanowicz schrieb:

I can propose testing Minizinc (website: https://www.minizinc.org/). Minizinc is modelling language which is supported by a range of different solvers.
(SAT, mixed integer programming solvers and others).
I have created simple example code in Minizinc.
For example in case of searching for model that shows 2531 ! => 1075,
solver Chuffed 0.13.1 is able to check all models of size 7 in few seconds.
Below is my code in minizinc for a case 2531 ! => 1075 and models of size 7.
2531and1075.mzn

Thank you for sharing. Experimenting with Minizinc now.

Michael Bucko (Oct 15 2024 at 18:17):

@Bartosz Podkanowicz Wrote my first minizinc program today (for rhs":"Equation1692","lhs":"Equation63"), and want to use it as a boilerplate for an agent. Does it look okay, or I am doing something wrong? I'd appreciate your feedback.

int: N = 2;  % Adjust N as needed
set of int: Domain = 0..N-1;

var Domain: x;
var Domain: y;

var Domain: t1 = (y + x) mod N;
var Domain: t2 = (t1 + y) mod N;
var Domain: lhs = (t1 + t2) mod N;

var Domain: t3 = (x + y) mod N;
var Domain: t4 = (x + t3) mod N;
var Domain: rhs = (y + t4) mod N;

constraint
    (lhs = x) /\ (rhs != x);

solve satisfy;

output ["Counterexample found: x = ", show(x), ", y = ", show(y)];

Bartosz Podkanowicz (Oct 15 2024 at 18:46):

Michael Bucko said:

Bartosz Podkanowicz Wrote my first minizinc program today (for rhs":"Equation1692","lhs":"Equation63"), and want to use it as a boilerplate for an agent. Does it look okay, or I am doing something wrong? I'd appreciate your feedback.

int: N = 2;  % Adjust N as needed
set of int: Domain = 0..N-1;

var Domain: x;
var Domain: y;

var Domain: t1 = (y + x) mod N;
var Domain: t2 = (t1 + y) mod N;
var Domain: lhs = (t1 + t2) mod N;

var Domain: t3 = (x + y) mod N;
var Domain: t4 = (x + t3) mod N;
var Domain: rhs = (y + t4) mod N;

constraint
    (lhs = x) /\ (rhs != x);

solve satisfy;

output ["Counterexample found: x = ", show(x), ", y = ", show(y)];

@Michael Bucko

I think you wanted to write '(for rhs":"Equation63","lhs":"Equation1692")' (swapped)
Minizinc searches for any evaluation of variables that satisfies all constraints.
You need to add constraint that for all x, y equation corresponding to lhs holds.
(in current code it will hold for exact values of x, y, but not for all values)

Example of modification:

int: N = 2;  % Adjust N as needed
set of int: Domain = 0..N-1;
array[Domain, Domain] of var Domain: op;


predicate op_constraint() =
  forall(x in Domain, y in Domain) (
     op[x,y] = (x+y) mod N
  );
constraint op_constraint();

predicate equation1692() =
  forall(x in Domain, y in Domain) (
    x = op[op[y, x], op[op[y,x], y]]
  );
constraint equation1692();

var Domain: x;
var Domain: y;

var Domain: t3 = (x + y) mod N;
var Domain: t4 = (x + t3) mod N;
var Domain: rhs = (y + t4) mod N;

constraint (rhs != x);

solve satisfy;

output ["Counterexample found: x = ", show(x), ", y = ", show(y)];

Michael Bucko (Oct 15 2024 at 19:14):

@Bartosz Podkanowicz Thank you. That's very kind of you.

Michael Bucko (Oct 16 2024 at 05:18):

@Bartosz Podkanowicz If that's really 63 vs 1692, the result I'm getting is "unsatisfiable" (which I guess means that no possible assignment of values to the variables that can satisfy all the constraints simultaneously). In this case (since the constraint is (lhs = x) /\ (rhs != x); , it means that it simply could not find a counterexample due to its search capabilities, but it does not mean that a counterexample does not exist, right?

Bartosz Podkanowicz (Oct 16 2024 at 08:45):

@Michael Bucko (which I guess means that no possible assignment of values to the variables that can satisfy all the constraints simultaneously) - yes.
In our case the code doesn't represent all counterexamples.
There is constraint on the size of counterexample (size of N) and constraint on which types of operations are considered.
There could be counterexample with different number of elements.

Bernhard Reinke (Oct 19 2024 at 07:24):

I am trying to use vampire to help construct an interesting model for 2054. I am confused by the output of vampire, it looks like

% SZS status CounterSatisfiable for 2054.p
% # SZS output start Saturation.
tff(u36863,axiom,
    (![X1] : ((m(X1,X1) != X1)))).

[...]
% # SZS output end Saturation.
% ------------------------------
% Version: Vampire 4.5.1 (commit 57a6f78c on 2020-07-15 11:59:04 +0200)
% Termination reason: Satisfiable

% Memory used [KB]: 27888
% Time elapsed: 0.882 s
% ------------------------------
% ------------------------------

Does this mean vampire has found a model or not?

Daniel Weber (Oct 19 2024 at 07:25):

It mean it found a model, but it's infinite. I think it's basically a confluent rewriting system, but I'm not sure about the details

Eric Taucher (Oct 19 2024 at 10:55):

Bernhard Reinke said:

the output of vampire

I asked OpenAI o1-mini model about the output as I too do not know what it means.

If you have ChatGPT Plus the link to the shared conversation should work.
https://chatgpt.com/share/67138efd-a6d8-8013-8a25-c499a877b7bb

For others here is the conversation

Prompt

Please explain this Vampire output

% SZS status CounterSatisfiable for 2054.p
% # SZS output start Saturation.
tff(u36863,axiom,
    (![X1] : ((m(X1,X1) != X1)))).

[...]
% # SZS output end Saturation.
% ------------------------------
% Version: Vampire 4.5.1 (commit 57a6f78c on 2020-07-15 11:59:04 +0200)
% Termination reason: Satisfiable



% Memory used [KB]: 27888
% Time elapsed: 0.882 s
% ------------------------------
% ------------------------------

Thinking

Analyzing output details

I’m digging into Vampire's output, noting SZS status as "CounterSatisfiable" and examining generated saturated clauses within the Saturation block to understand the provability of the input formula.

Understanding Vampire's task

I'm thinking through the Vampire theorem prover's capabilities, specifically analyzing the satisfiability of a problem file.

Reply

Certainly! Let's break down and explain each part of the Vampire output you've provided. Vampire is a highly efficient first-order theorem prover used in automated reasoning and formal verification. The output you've shared provides detailed information about how Vampire processed a specific problem (in this case, 2054.p). Here's a step-by-step explanation:

1. SZS Status Line

% SZS status CounterSatisfiable for 2054.p

SZS Status: SZS is a standard ontology for classifying the results of automated reasoning tasks. It helps in uniformly describing outcomes across different theorem provers.
CounterSatisfiable: This status indicates that the problem provided (2054.p) is satisfiable, meaning there exists at least one interpretation (or model) where the given statements are true. In other words, the conjecture cannot be universally proven false; instead, a counterexample exists that satisfies the premises but not the conclusion.

2. Saturation Block

% # SZS output start Saturation.
tff(u36863,axiom,
    (![X1] : ((m(X1,X1) != X1)))).

[...]
% # SZS output end Saturation.

Saturation: Vampire uses a technique called saturation to process logical clauses. Saturation involves exhaustively applying inference rules to derive all possible consequences from the given axioms and conjectures until no new information can be generated.
tff Syntax: This stands for Typed First-order Form, part of the TPTP (Thousands of Problems for Theorem Provers) language used to represent logical formulas in a standardized way.
- u36863: This is a unique identifier for the axiom or clause.
- axiom: Indicates that the following statement is an axiom (a foundational assumption).
- ![X1] : ((m(X1,X1) != X1)):
  - ! denotes universal quantification (for all).
  - [X1] specifies the variable being quantified.
  - m(X1,X1) != X1 states that for all X1, the expression m(X1, X1) is not equal to X1. This is part of the logical statements Vampire is processing.
[...]: Indicates that there are additional lines or clauses in the saturation process that have been omitted for brevity.

3. Termination Details

% ------------------------------
% Version: Vampire 4.5.1 (commit 57a6f78c on 2020-07-15 11:59:04 +0200)
% Termination reason: Satisfiable

Version Information: Specifies the version of Vampire used (4.5.1) along with the specific commit identifier and the date of that commit. This is useful for reproducibility and understanding the exact features or bug fixes present during the run.
Termination Reason: Indicates why Vampire stopped processing. In this case, it terminated because the problem was determined to be satisfiable. This means Vampire found a model where the axioms hold true, and no contradiction was derived.

4. Resource Usage

% Memory used [KB]: 27888
% Time elapsed: 0.882 s
% ------------------------------
% ------------------------------

Memory Used: Vampire utilized 27,888 KB (approximately 27.3 MB) of memory during its execution. This gives an idea of the computational resources required to process the problem.
Time Elapsed: The entire processing took 0.882 seconds. This showcases Vampire's efficiency in handling the given problem.

Summary of the Output

Problem Identification: Vampire processed the problem file 2054.p.
Result: The SZS status CounterSatisfiable indicates that the problem is satisfiable. Essentially, Vampire did not find a contradiction in the given axioms and conjectures; instead, it found a model where all statements hold true.
Saturation Process: Vampire applied its saturation-based reasoning to exhaustively process the logical clauses, represented here partially by the tff syntax.
Termination: The reasoning process concluded because it established satisfiability, not because it reached a resource limit or encountered an error.
Performance Metrics: The entire operation was completed swiftly (under a second) and with modest memory usage.

Implications

For Theorem Proving: A CounterSatisfiable status often means that the conjecture posed in the problem is not universally valid; there's at least one scenario where the premises hold, but the conclusion does not. This is valuable for identifying limitations or exceptions within formal systems.
Next Steps: If your goal was to prove the conjecture, this result suggests revisiting the axioms or considering additional constraints to either strengthen the conjecture or better understand the countermodel provided by Vampire.

I hope this breakdown clarifies the Vampire output for you! If you have specific questions about any part of the output or need further details on how to interpret or act on these results, feel free to ask.

Last updated: May 02 2025 at 03:31 UTC