Zulip Chat Archive
Stream: Equational
Topic: Eval metrics for agentic proof generation
Michael Bucko (Nov 05 2024 at 13:53):
This question is inspired by @Bernhard Reinke 's script from today. In problems like this, it's very easy to make a typo or forget something.
Wouldn't it be a good idea to simply generate this code based on insights and then continuously run it given a set of metrics and tests:
- metrics: runtime, proof type, category, some sort of mle, and so on
- tdd: edge conditions, some examples
We'd just need to collect a bunch of tptp scripts from this project, and start small. And then run it a number of times. It'd save us potentially a lot of time in the long run.
fof(comm_1, axiom,
m00(X,Y) = m00(Y,X) &
m01(X,Y) = m10(Y,X) &
m02(X,Y) = m20(Y,X) &
m03(X,Y) = m30(Y,X) &
m11(X,Y) = m11(Y,X) &
m12(X,Y) = m21(Y,X) &
m22(X,Y) = m22(Y,X) &
m23(X,Y) = m32(Y,X) &
m33(X,Y) = m33(Y,X)
).
fof(auto_2, axiom,
m01(X,Y) = m03(X,Y) &
m10(X,Y) = m30(X,Y) &
m11(X,Y) = m33(X,Y) &
m12(X,Y) = m32(X,Y) &
m13(X,Y) = m31(X,Y) &
m21(X,Y) = m23(X,Y)
).
fof(aeven_3, axiom,
m00(m00(Y,X),Y) = X &
m10(m01(Y,X),Y) = X &
m20(m02(Y,X),Y) = X &
m30(m03(Y,X),Y) = X &
m22(m20(Y,X),Y) = X &
m32(m21(Y,X),Y) = X &
m02(m22(Y,X),Y) = X &
m12(m23(Y,X),Y) = X
).
fof(eunital_4, axiom,
m00(e,X) = X &
m01(e,X) = X &
m02(e,X) = X &
m03(e,X) = X &
m00(X,e) = X &
m10(X,e) = X &
m20(X,e) = X &
m30(X,e) = X
).
fof(square_triv_5, axiom,
m00(X,X) = e &
m13(X,X) = e &
m22(X,X) = e &
m31(X,X) = e
).
fof(idem_6, axiom,
m11(X,X) = X &
m12(X,X) = X &
m21(X,X) = X &
m23(X,X) = X &
m33(X,X) = X
).
fof(middle_bij_7, axiom,
m22(e,X) = X &
m20(e,X) = X &
m22(X,e) = X &
m02(X,e) = X
).
fof(bodd_8, axiom,
m02(X,Y) = m11(Y,m01(X,Y)) &
m12(X,Y) = m12(Y,m11(X,Y)) &
m22(X,Y) = m13(Y,m21(X,Y)) &
m32(X,Y) = m10(Y,m31(X,Y)) &
m02(X,Y) = m31(Y,m03(X,Y)) &
m12(X,Y) = m32(Y,m13(X,Y)) &
m22(X,Y) = m33(Y,m23(X,Y)) &
m32(X,Y) = m30(Y,m33(X,Y))
).
fof(eq1323, axiom,
X = m00(Y, m00(m00(m00(Y, Y), X), Y)) &
X = m13(Y, m21(m20(m11(Y, Y), X), Y)) &
X = m22(Y, m02(m00(m22(Y, Y), X), Y)) &
X = m31(Y, m23(m20(m33(Y, Y), X), Y)) &
X = m01(Y, m10(m01(m00(Y, Y), X), Y)) &
X = m10(Y, m31(m21(m11(Y, Y), X), Y)) &
X = m23(Y, m12(m01(m22(Y, Y), X), Y)) &
X = m32(Y, m33(m21(m33(Y, Y), X), Y)) &
X = m02(Y, m20(m02(m00(Y, Y), X), Y)) &
X = m11(Y, m01(m22(m11(Y, Y), X), Y)) &
X = m20(Y, m22(m02(m22(Y, Y), X), Y)) &
X = m33(Y, m03(m22(m33(Y, Y), X), Y)) &
X = m03(Y, m30(m03(m00(Y, Y), X), Y)) &
X = m12(Y, m11(m23(m11(Y, Y), X), Y)) &
X = m21(Y, m32(m03(m22(Y, Y), X), Y)) &
X = m30(Y, m13(m23(m33(Y, Y), X), Y))
).
fof(eq2744, conjecture,
X = m00(m00(m00(Y, Y), m00(Y, X)), Y) &
X = m31(m21(m11(Y, Y), m10(Y, X)), Y) &
X = m22(m02(m22(Y, Y), m20(Y, X)), Y) &
X = m13(m23(m33(Y, Y), m30(Y, X)), Y) &
X = m10(m01(m00(Y, Y), m01(Y, X)), Y) &
X = m01(m22(m11(Y, Y), m11(Y, X)), Y) &
X = m32(m03(m22(Y, Y), m21(Y, X)), Y) &
X = m23(m20(m33(Y, Y), m31(Y, X)), Y) &
X = m20(m02(m00(Y, Y), m02(Y, X)), Y) &
X = m11(m23(m11(Y, Y), m12(Y, X)), Y) &
X = m02(m00(m22(Y, Y), m22(Y, X)), Y) &
X = m33(m21(m33(Y, Y), m32(Y, X)), Y) &
X = m30(m03(m00(Y, Y), m03(Y, X)), Y) &
X = m21(m20(m11(Y, Y), m13(Y, X)), Y) &
X = m12(m01(m22(Y, Y), m23(Y, X)), Y) &
X = m03(m22(m33(Y, Y), m33(Y, X)), Y)
).
Michael Bucko (Nov 05 2024 at 14:18):
Based on the feedback from @Philip Zucker , I'm looking into LeanDojo. Perhaps some of this is already doable.
Last updated: May 02 2025 at 03:31 UTC