Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Improving Mathematical Reasoning with Process Supervision

Tyler Josephson ⚛️ (Jun 01 2023 at 00:12):

https://openai.com/research/improving-mathematical-reasoning-with-process-supervision

Adam Topaz (Jun 01 2023 at 18:49):

Here's a link to the "MATH dataset" they used for testing: https://github.com/hendrycks/math/

RexWang (Jun 02 2023 at 04:02):

Tyler Josephson ⚛️ 发言道：

https://openai.com/research/improving-mathematical-reasoning-with-process-supervision

Here is the paper related to the blog: https://cdn.openai.com/improving-mathematical-reasoning-with-process-supervision/Lets_Verify_Step_by_Step.pdf

Dongwei Jiang (Jun 04 2023 at 14:45):

I have just completed a thorough read-through of the recent paper, and as a novice in the field of theorem proving, here are some initial insights I've gathered, subject to potential refinement:

1 Given the multi-step nature intrinsic to theorem proving, it seems like an excellent field for implementing this particular method.

2 The notion of outcome-based supervision draws some parallels to expert iteration, as seen in previous theorem proving papers. In this approach, the validity of intermediary steps is assessed based on the final result. However, identifying the erroneous step in theorem proving appears challenging. OpenAI tackles this issue by enlisting human annotators. Implementing a similar strategy could indeed represent progress in this field.

Dongwei Jiang (Jun 05 2023 at 00:30):

As I continue to ponder over this, I can't help but wonder: Might elements of process-based supervision already exist within Lean? It's worth noting that we often receive error notifications when employing inappropriate tactics.

Additionally, the paper's use of scores to establish the correct reasoning path reminds me of the best-first search method, something I've seen incorporated in several theorem proving research works.

Yet, a thought lingers: If only we had some means to discern whether we're straying down an incorrect proof path—especially in cases that don't necessarily register as errors in Lean...

Last updated: May 02 2025 at 03:31 UTC