Zulip Chat Archive

Stream: Equational

Topic: Eval metrics for agentic proof generation


Michael Bucko (Nov 05 2024 at 13:53):

This question is inspired by @Bernhard Reinke 's script from today. In problems like this, it's very easy to make a typo or forget something.

Wouldn't it be a good idea to simply generate this code based on insights and then continuously run it given a set of metrics and tests:

  • metrics: runtime, proof type, category, some sort of mle, and so on
  • tdd: edge conditions, some examples

We'd just need to collect a bunch of tptp scripts from this project, and start small. And then run it a number of times. It'd save us potentially a lot of time in the long run.

fof(comm_1, axiom,
  m00(X,Y) = m00(Y,X) &
  m01(X,Y) = m10(Y,X) &
  m02(X,Y) = m20(Y,X) &
  m03(X,Y) = m30(Y,X) &
  m11(X,Y) = m11(Y,X) &
  m12(X,Y) = m21(Y,X) &
  m22(X,Y) = m22(Y,X) &
  m23(X,Y) = m32(Y,X) &
  m33(X,Y) = m33(Y,X)
).

fof(auto_2, axiom,
  m01(X,Y) = m03(X,Y) &
  m10(X,Y) = m30(X,Y) &
  m11(X,Y) = m33(X,Y) &
  m12(X,Y) = m32(X,Y) &
  m13(X,Y) = m31(X,Y) &
  m21(X,Y) = m23(X,Y)
).

fof(aeven_3, axiom,
  m00(m00(Y,X),Y) = X &
  m10(m01(Y,X),Y) = X &
  m20(m02(Y,X),Y) = X &
  m30(m03(Y,X),Y) = X &
  m22(m20(Y,X),Y) = X &
  m32(m21(Y,X),Y) = X &
  m02(m22(Y,X),Y) = X &
  m12(m23(Y,X),Y) = X
).

fof(eunital_4, axiom,
  m00(e,X) = X &
  m01(e,X) = X &
  m02(e,X) = X &
  m03(e,X) = X &
  m00(X,e) = X &
  m10(X,e) = X &
  m20(X,e) = X &
  m30(X,e) = X
).

fof(square_triv_5, axiom,
  m00(X,X) = e &
  m13(X,X) = e &
  m22(X,X) = e &
  m31(X,X) = e
).

fof(idem_6, axiom,
  m11(X,X) = X &
  m12(X,X) = X &
  m21(X,X) = X &
  m23(X,X) = X &
  m33(X,X) = X
).

fof(middle_bij_7, axiom,
  m22(e,X) = X &
  m20(e,X) = X &
  m22(X,e) = X &
  m02(X,e) = X
).

fof(bodd_8, axiom,
  m02(X,Y) = m11(Y,m01(X,Y)) &
  m12(X,Y) = m12(Y,m11(X,Y)) &
  m22(X,Y) = m13(Y,m21(X,Y)) &
  m32(X,Y) = m10(Y,m31(X,Y)) &
  m02(X,Y) = m31(Y,m03(X,Y)) &
  m12(X,Y) = m32(Y,m13(X,Y)) &
  m22(X,Y) = m33(Y,m23(X,Y)) &
  m32(X,Y) = m30(Y,m33(X,Y))
).

fof(eq1323, axiom,
  X = m00(Y, m00(m00(m00(Y, Y), X), Y)) &
  X = m13(Y, m21(m20(m11(Y, Y), X), Y)) &
  X = m22(Y, m02(m00(m22(Y, Y), X), Y)) &
  X = m31(Y, m23(m20(m33(Y, Y), X), Y)) &
  X = m01(Y, m10(m01(m00(Y, Y), X), Y)) &
  X = m10(Y, m31(m21(m11(Y, Y), X), Y)) &
  X = m23(Y, m12(m01(m22(Y, Y), X), Y)) &
  X = m32(Y, m33(m21(m33(Y, Y), X), Y)) &
  X = m02(Y, m20(m02(m00(Y, Y), X), Y)) &
  X = m11(Y, m01(m22(m11(Y, Y), X), Y)) &
  X = m20(Y, m22(m02(m22(Y, Y), X), Y)) &
  X = m33(Y, m03(m22(m33(Y, Y), X), Y)) &
  X = m03(Y, m30(m03(m00(Y, Y), X), Y)) &
  X = m12(Y, m11(m23(m11(Y, Y), X), Y)) &
  X = m21(Y, m32(m03(m22(Y, Y), X), Y)) &
  X = m30(Y, m13(m23(m33(Y, Y), X), Y))
).

fof(eq2744, conjecture,
  X = m00(m00(m00(Y, Y), m00(Y, X)), Y) &
  X = m31(m21(m11(Y, Y), m10(Y, X)), Y) &
  X = m22(m02(m22(Y, Y), m20(Y, X)), Y) &
  X = m13(m23(m33(Y, Y), m30(Y, X)), Y) &
  X = m10(m01(m00(Y, Y), m01(Y, X)), Y) &
  X = m01(m22(m11(Y, Y), m11(Y, X)), Y) &
  X = m32(m03(m22(Y, Y), m21(Y, X)), Y) &
  X = m23(m20(m33(Y, Y), m31(Y, X)), Y) &
  X = m20(m02(m00(Y, Y), m02(Y, X)), Y) &
  X = m11(m23(m11(Y, Y), m12(Y, X)), Y) &
  X = m02(m00(m22(Y, Y), m22(Y, X)), Y) &
  X = m33(m21(m33(Y, Y), m32(Y, X)), Y) &
  X = m30(m03(m00(Y, Y), m03(Y, X)), Y) &
  X = m21(m20(m11(Y, Y), m13(Y, X)), Y) &
  X = m12(m01(m22(Y, Y), m23(Y, X)), Y) &
  X = m03(m22(m33(Y, Y), m33(Y, X)), Y)
).

Michael Bucko (Nov 05 2024 at 14:18):

Based on the feedback from @Philip Zucker , I'm looking into LeanDojo. Perhaps some of this is already doable.


Last updated: May 02 2025 at 03:31 UTC