Zulip Chat Archive

Stream: batteries

Topic: SAT core defs in Std?


James Gallicchio (Jan 03 2023 at 10:14):

Thoughts on putting the core definitions for SAT (CNF formulas, satisfiability, etc.) in Std?

It's a bit outside the scope. I personally would prefer to put it in a separate library, but I'd like us to settle on one set of core defs for all SAT-related projects, for easy interop.

A non-exhaustive list of (not nec. equivalent) core SAT definitions from different Lean4 projects:
My encoding library: https://github.com/JamesGallicchio/eternity2/blob/main/lean/Eternity2/SATSolve/EncCNF.lean
Mario's LRAT checker: https://github.com/leanprover-community/mathlib4/blob/master/Mathlib/Tactic/Sat/FromLRAT.lean
Joe's LRAT checker: https://github.com/joehendrix/lean-sat-checker/blob/main/SatChecker/Core.lean
Siddhartha's pure Lean SAT solver: https://github.com/siddhartha-gadgil/Saturn/blob/main/Saturn/Core.lean
Wojciech's (abandoned?) SAT core library: https://github.com/Vtec234/lean-sat/blob/main/LeanSat/Data/Cnf.lean

Also some lean3 projects
Cayden's encoding library: https://github.com/ccodel/verified-encodings/blob/master/src/cnf/clause.lean
Ryan Greenblatt's verified DPLL solver: https://github.com/rgreenblatt/verified_sat/blob/master/src/types.lean

James Gallicchio (Jan 03 2023 at 10:19):

The main outlier I see is Cayden's, where the CNF formulas are parametric on the underlying variable type (which is Nat in all other instances). I think it's justified in the encoding library, but unnecessarily general for the other projects.

James Gallicchio (Jan 03 2023 at 10:23):

I also don't want to duplicate work with https://github.com/ufmg-smite/lean-smt, but I don't see any CNF-specific stuff there.

James Gallicchio (Jan 03 2023 at 10:24):

Anyways, if these defs don't fit in Std, I'll put together a separate repo for it.

πš πš˜πš“πšŒπš’πšŽπšŒπš‘ πš—πšŠπš πš›πš˜πšŒπš”πš’ (Jan 03 2023 at 16:20):

I think that an obstacle to standardization is that depending on what you want to use the CNF for, there are different requirements on the representation. For example List Clause is okay if you only ever need to fold the whole thing but would be absurdly inefficient for a SAT solver. So realistically the library would need to include something like PersistentCnf and Cnf, like List vs Array. But even then, what if you need a clause-database type representation which isn't an Array but some more specific data structure? I think the one thing we could all agree on is the mathematical model which would just be some inductive datatype of propositional formulas :)

πš πš˜πš“πšŒπš’πšŽπšŒπš‘ πš—πšŠπš πš›πš˜πšŒπš”πš’ (Jan 03 2023 at 16:25):

James Gallicchio said:

The main outlier I see is Cayden's, where the CNF formulas are parametric on the underlying variable type (which is Nat in all other instances). I think it's justified in the encoding library, but unnecessarily general for the other projects.

If you want to emit efficient code, it seems the thing to do is parameterize over the literal type e.g. so that it can be UInt64 and the first bit denotes the polarity. Parameterizing over the variable type forces a representation Literal Ξ½ which afaik Lean will represent as boxed which is a lot of pointer dereferencing when these are accessed often.

James Gallicchio (Jan 03 2023 at 18:07):

I would mainly want to agree on a mathematical model, but preferably one that is also reasonably efficient for any sort of program interoperation between projects.

James Gallicchio (Jan 03 2023 at 18:08):

Or, maybe if we get a good collections interface in Std, we can parameterize the CNF type by the collection types...

Mario Carneiro (Jan 03 2023 at 21:25):

I agree with Wojciech, this is an area where you can't really standardize the data representation because that depends a lot on what the solver is doing or how it works. The only thing I would expect could be shared are things like List.all for defining the model

James Gallicchio (Jan 04 2023 at 07:38):

I don't think Lean would be viable for writing high-performance SAT solvers at the moment, so I wasn't really thinking about that as an application. I'm moreso looking for us to maintain something akin to PySAT which is a library for encoding to CNF and interfacing with external SAT solvers.

Maybe for now I just work on building out the PySAT equivalent library, and then we can see if it's a good place for mathlib's LRAT verifier and Cayden's verified encodings to be housed.

James Gallicchio (Jan 04 2023 at 07:39):

(And any other SAT-adjacent projects that we want to maintain longterm)

Kevin Buzzard (Jan 04 2023 at 08:50):

I know nothing about programming but out of interest why do you not think lean would be viable for writing high-performance SAT solvers?

Reid Barton (Jan 04 2023 at 08:52):

Maybe what we want is a type class that explains how to decode some type X into a SAT instance (with variables of type a)

Mario Carneiro (Jan 04 2023 at 08:55):

that does sound more promising, although I still wonder what kind of common interface we can give to the different SAT-like types

Mario Carneiro (Jan 04 2023 at 08:59):

Kevin Buzzard said:

I know nothing about programming but out of interest why do you not think lean would be viable for writing high-performance SAT solvers?

We can probably get decently close (i.e. within an order of magnitude) to high performance SAT solvers with lean code if the compiler doesn't fail us, but to get the last bit of performance requires more control over data layout than we have any intention of providing in pure lean code. In this regime the next best thing is to link to a C function or library and axiomatize its behavior using implemented_by; you could still get pretty close to full verification that way if the model is fine-grained enough.

Notification Bot (Jan 04 2023 at 20:34):

35 messages were moved from this topic to #lean4 dev > Lean LLVM/MLIR by Sebastian Ullrich.

James Gallicchio (Jan 15 2023 at 12:41):

Okay, first commit of LeanSAT has dropped :) it's currently just what I had already written for another project, but it is already pretty nice to build encodings with

James Gallicchio (Jan 15 2023 at 12:42):

I currently plan to add docs, more examples, and an IPASIR interface with (hopefully) more solvers.

James Gallicchio (Jan 15 2023 at 12:45):

And then I'll probably go in the direction of Reid's suggestion and add a SAT encodable/decodable typeclass

James Gallicchio (Jan 15 2023 at 12:46):

LMK if there's something else you'd like to see in it!

Mario Carneiro (Jan 15 2023 at 12:47):

a test suite / CI would be nice

James Gallicchio (Jun 26 2023 at 00:50):

James Gallicchio said:

The main outlier I see is Cayden's, where the CNF formulas are parametric on the underlying variable type (which is Nat in all other instances). I think it's justified in the encoding library, but unnecessarily general for the other projects.

I've been steadily improving the library and building more encodings, and I am becoming convinced that Cayden is totally correct that CNFs should be parametrized on the variable type, in particular so that you can differentiate temporary/autogenerated variables from meaningful variables. As it turns out I've encountered many of the same issues as I did when I was trying to formalize register machines :) we're composing graphs, essentially, and having the types reflect that is helpful.

James Gallicchio (Jun 26 2023 at 00:53):

I also now thoroughly believe I want the CNF type to just be parametric on collection type-- it is very helpful to have a single shared type for shuffling CNFs around various tools, but I am starting to hit the limitations of using List, and I never do any operations other than cons/append and fold...

Mac Malone (Jun 28 2023 at 02:15):

Mario Carneiro said:

to get the last bit of performance requires more control over data layout than we have any intention of providing in pure lean code

What makes you think this? I know that in the compiler discussions there was a lot of consideration of features that would give users much more fine grain control of data layout. So, I definitely could see pure Lean reaching this point eventually.

Mario Carneiro (Jun 28 2023 at 02:34):

For instance, I don't see lean losing the refcounts any time in the foreseeable future

Mario Carneiro (Jun 28 2023 at 02:34):

and that adds a bunch of boxing overhead to every abstraction

James Gallicchio (Jun 28 2023 at 03:13):

Mario Carneiro said:

For instance, I don't see lean losing the refcounts any time in the foreseeable future

I thought this was one of the things on the long term roadmap for linear types...

James Gallicchio (Jun 28 2023 at 03:17):

Regardless, I don't currently have and don't plan to have concrete implementations of special data structures in LeanSAT, just interfaces to common formats and such

Tobias Grosser (Jun 29 2023 at 07:53):

I certainly would be interested in learning if/how we can get more performance out of lean.

Tobias Grosser (Jun 29 2023 at 07:54):

Probably not 100% related, but @Siddharth Bhat and @Henrik BΓΆving work on an LLVM backend for Lean plus a codegen_by extension that allows LLVM-IR to be used to implement lean function.

Tobias Grosser (Jun 29 2023 at 07:54):

I would expect that this could offer fast SIMD code for certain data structures.


Last updated: May 02 2025 at 03:31 UTC