Zulip Chat Archive

Stream: PhysLean

Topic: PhysProver


Joseph Tooby-Smith (Jan 26 2026 at 06:53):

Just came across this paper:

https://arxiv.org/pdf/2601.15737

Haven't had time to look at the details yet - but needless to say it looks interesting.

Joseph Tooby-Smith (Jan 26 2026 at 06:55):

(Some overlap with the authors of the paper discussed at: #PhysLean > PhysLib )

Joseph Tooby-Smith (Jan 26 2026 at 10:13):

Here is my summary:

  1. They use lemma's and proofs from PhysLean to form a dataset, split into training and test data.
  2. They supplement the training data with synthetic data generated from Claude.
  3. They apply reinforcement learning on DeepSeek-Prover-V2-7B.
  4. There resultant model, PhysProver, results in improvements over existing models of about 3%.
  5. PhysProver seems to use results from PhysLean correctly, resulting in less hallucinated lemmas.
  6. The new model PhysProver surpassed base version on maths results as well as physics results.
  7. They say they will release their dataset and model.

Joseph Tooby-Smith (Jan 26 2026 at 10:46):

GitHub repo maybe at: https://github.com/hanningzhang/PhysProver

Rein Zustand (Jan 29 2026 at 10:27):

There is https://axiomatic-ai.com/, if you ctrl-f/cmd-f Ax-prover, it is 2.4x better at quantum mechanics proofs than Sonnet 4, and +35% better than DeepSeek-Prover-V2-671B. Maybe @Austin Letson can illuminate some details here.

Rein Zustand (Jan 29 2026 at 10:33):

This is the paper about Ax-prover: https://arxiv.org/abs/2510.12787.

The dataset was generated through an iterative human-in-the-loop process combining automated proof synthesis with expert curation. The initial set of 134 theorems were manually extracted from [49]. An automated coding agent (Claude Opus [6]) first generated formal statements and proof attempts for the theorems, producing both complete proofs and partial derivations. A quantum physics expert then reviewed each statement and proof, identifying gaps, correcting errors, and standardizing operator definitions to ensure that each question was well formed and solvable. The final dataset replaces these proofs with sorry statements.

The citation 49 is Michael A Nielsen and Isaac L Chuang. Quantum computation and quantum information.

Joseph Tooby-Smith (Jan 29 2026 at 10:43):

The comparison I would be most interested in is one with Aristotle. Doesn't seem like either of these papers make this comparison.


Last updated: Feb 28 2026 at 14:05 UTC