Zulip Chat Archive

Stream: Equational

Topic: Command-line Prover


Michael Bucko (Oct 23 2024 at 10:35):

How about turning our prover running experience into one command-line prover that could be used in our ml or algorithm development efforts?

One could then add their own tools to that list (let's say someone has a network analysis idea, or wants to aggressively test certain patterns assuming certain conditions, or maybe generate data and train a model).

Below is just a (rough) sketch, and I'd first make it work with Vampire (cross-platform), assuming that I can reliably run Vampire in the command line setting. The real tool would use the script we already have to turn Vampire output into Lean.

I'd need to understand what the main formats are (is TPTP recommended? can anything be turned into that format?). I guess the output would be Lean.

So, command line: TPTP file (?) -> Prover (picks a prover, picks a line, picks a strategy, picks a custom tool, runs in defined order, perhaps all that defined in a yaml file) -> Lean (?)

import argparse
import subprocess
import time
import sys
import os

def translate_implication(implication, tool):
    if tool in ['prover9', 'mace']:
        # No translation needed
        return implication
    elif tool == 'vampire':
        # Translate to TPTP syntax
        translation_table = {
            '->': '=>',
            '<->': '<=>',
            '-': '~',
            '&': '&',
            '|': '|'
        }
        for k, v in translation_table.items():
            implication = implication.replace(k, v)
        return implication
    else:
        return implication

def create_prover9_input(implication1, implication2, goal):
    implication1 = translate_implication(implication1, 'prover9')
    implication2 = translate_implication(implication2, 'prover9')
    goal = translate_implication(goal, 'prover9')
    input_content = 'formulas(assumptions).\n'
    input_content += f'  {implication1}.\n'
    input_content += f'  {implication2}.\n'
    input_content += 'end_of_list.\n\n'
    input_content += 'formulas(goals).\n'
    input_content += f'  {goal}.\n'
    input_content += 'end_of_list.\n'
    return input_content

def create_mace_input(implication1, implication2, goal):
    implication1 = translate_implication(implication1, 'mace')
    implication2 = translate_implication(implication2, 'mace')
    goal = translate_implication(goal, 'mace')
    input_content = 'formulas(assumptions).\n'
    input_content += f'  {implication1}.\n'
    input_content += f'  {implication2}.\n'
    input_content += f'  -({goal}).\n'  # Negate the goal
    input_content += 'end_of_list.\n'
    return input_content

def create_vampire_input(implication1, implication2, goal):
    implication1 = translate_implication(implication1, 'vampire')
    implication2 = translate_implication(implication2, 'vampire')
    goal = translate_implication(goal, 'vampire')
    input_content = ''
    input_content += f'fof(assumption1, axiom, {implication1}).\n'
    input_content += f'fof(assumption2, axiom, {implication2}).\n'
    input_content += f'fof(goal, conjecture, {goal}).\n'
    return input_content

def main():
    parser = argparse.ArgumentParser(description='Run a theorem prover with two implications.')
    parser.add_argument('implication1', type=str, help='First implication (premise)')
    parser.add_argument('implication2', type=str, help='Second implication (premise)')
    parser.add_argument('goal', type=str, help='Goal to prove (conclusion)')
    parser.add_argument('--tool', type=str, choices=['mace', 'prover9', 'vampire'], required=True, help='Tool to use')
    parser.add_argument('--timeout', type=int, default=None, help='Timeout in seconds')

    args = parser.parse_args()

    implication1 = args.implication1
    implication2 = args.implication2
    goal = args.goal
    tool = args.tool
    timeout = args.timeout

    # Create input file for the tool
    if tool == 'mace':
        input_file_content = create_mace_input(implication1, implication2, goal)
        tool_executable = 'mace4'
        input_filename = 'mace_input.in'
    elif tool == 'prover9':
        input_file_content = create_prover9_input(implication1, implication2, goal)
        tool_executable = 'prover9'
        input_filename = 'prover9_input.in'
    elif tool == 'vampire':
        input_file_content = create_vampire_input(implication1, implication2, goal)
        tool_executable = 'vampire'
        input_filename = 'vampire_input.p'
    else:
        print('Unknown tool.')
        sys.exit(1)

    # Write input file
    with open(input_filename, 'w') as f:
        f.write(input_file_content)

    # Run the tool
    try:
        start_time = time.time()
        if tool == 'vampire':
            cmd = [tool_executable, input_filename]
        else:
            cmd = [tool_executable, input_filename]
        process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if timeout is not None:
            stdout, stderr = process.communicate(timeout=timeout)
        else:
            stdout, stderr = process.communicate()
        end_time = time.time()

        # Process output
        output = stdout.decode()
        if tool == 'prover9':
            if 'THEOREM PROVED' in output:
                print('Theorem was proved.')
            elif 'SEARCH FAILED' in output:
                print('Theorem was not proved.')
            else:
                print('Unknown result.')
        elif tool == 'mace':
            if 'Exiting with 1 model(s) found' in output:
                print('Model found. The assumptions are consistent, and the goal is not a logical consequence.')
            elif 'NO MODELS' in output:
                print('No model found. The goal is a logical consequence of the assumptions.')
            else:
                print('Unknown result.')
        elif tool == 'vampire':
            if 'Refutation found. Thanks to Tanya!' in output:
                print('Theorem was proved.')
            elif 'Satisfiable' in output:
                print('Theorem was not proved.')
            else:
                print('Unknown result.')
        else:
            print('Unknown tool output.')
        # Optionally, print the full output
        print('Output:')
        print(output)

    except subprocess.TimeoutExpired:
        process.kill()
        print(f'Tool timed out after {timeout} seconds.')

    finally:
        # Remove the input file
        if os.path.exists(input_filename):
            os.remove(input_filename)

if __name__ == '__main__':
    main()

Jose Brox (Oct 23 2024 at 12:29):

Michael Bucko ha dicho:

How about turning our prover running experience into one command-line prover that could be used in our ml or algorithm development efforts?

I cannot answer your question about formats, but having this tool would be amazing!

Michael Bucko (Oct 23 2024 at 12:36):

Jose Brox schrieb:

Michael Bucko ha dicho:

How about turning our prover running experience into one command-line prover that could be used in our ml or algorithm development efforts?

I cannot answer your question about formats, but having this tool would be amazing!

Thank you for this feedback.
It's also relatively "easy" to do once we :

  • dockerize the ATP experiences (or start from a platform),
  • run them in the command line,
  • have some tests in place (we need to see how each of these tools is used),
  • design a yaml that could describe our strategy behind the proof (there is a big question here: what's the role of AI in refining the strategy; my take on this is: eventually that of a sparing partner (but depends on the person and their level of expertise (and energy)), which means Cursor for Math, or stronger plugins are needed)

The good news is: the first version that supports tests, Vampire, strategies could be done quite fast. One dependency for me personally is that I'd love to make Vampire work for me on a macbook pro).

Philip Zucker (Oct 25 2024 at 14:28):

I have a toy version of a multitool ATP in my project, really only meant for my debugging purposes https://github.com/philzook58/knuckledragger/blob/main/kdrag/solvers/__main__.py It's not dockerized, probably linux only. I only know it installs on collab, my computer, and github CI. If there were nice builds of these tools, I'd happily consume them. The common interface is writing z3 python expressions and using one of these Solver classes https://github.com/philzook58/knuckledragger/blob/main/kdrag/solvers/__init__.py instead of z3py's . Is there some other common input or output that would be more useful?

Michael Bucko (Oct 25 2024 at 14:33):

That's pretty cool!
Do you think all of these provers are necessary, or perhaps some are more important than others?
I was thinking of going "vertical", i.e. doubling down on transformers + vampire + z3 (for instance, Vampire alone has issues on a macbook)

Philip Zucker (Oct 25 2024 at 14:34):

vampire + z3 seems pretty dominant.

Michael Bucko (Oct 25 2024 at 14:34):

I guess this kind of installer could be nice as a VSCode (or Cursor) plugin?

Michael Bucko (Oct 25 2024 at 14:35):

Philip Zucker schrieb:

vampire + z3 seems pretty dominant.

Yes, that was my thinking. If that's the case, then I'd probably go vertical.

Philip Zucker (Oct 25 2024 at 14:37):

I said that without thinking a little. For higher order theorems, eprover and zipperposition can work well. Twee has the best human readable proof output and can output completed equational rewrite systems. nanocopi is the only intuitionistic logic prover in my list. kissat is probably good for very combinatorial stuff. But for typical purposes vampire + z3 is dominant.

Michael Bucko (Oct 25 2024 at 14:39):

Philip Zucker schrieb:

I said that without thinking a little. For higher order theorems, eprover and zipperposition can work well. Twee has the best human readable proof output. nanocopi is the only intuitionistic logic prover. kissat is probably good for very combinatorial stuff. But for typical purposes vampire + z3 is dominant.

Then I'd go vertical + agentic. See:

https://buildermath.substack.com/p/on-math-platform

I was thinking of having equivalence graphs as storage, lineage, and formalized knowledge base (with layers of embeddings ofc):

https://buildermath.substack.com/p/on-implication-flow

But I am still learning what the perfect experiment would be. I am also taking the vertical approach to experiments -- I am trying to learn very deep insights first.

Michael Bucko (Oct 25 2024 at 15:19):

Just realized you made egg - love it !!!


Last updated: May 02 2025 at 03:31 UTC