Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: on doing RL against Lean R1-style


Andy Jiang (Jan 29 2025 at 20:13):

After seeing the following tweet from Andrej Karpathy https://x.com/karpathy/status/1884676486713737258
"For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators."
I hope to start a discussion here about how (and whether) it's possible to collaboratively create a massive set of Lean problems, ranging from the current LLM level (say AMC 10 math, which is a little below current capabilities) continuously up into the realm of open questions, for the purpose of training LLMs via reinforcement learning to do research math.
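To give a sense of what an entry in such a problem set could look like, here is a hypothetical (not from any existing dataset) AMC-10-level statement; the task for the model would be to replace the `sorry` with a proof, with the Lean kernel providing a binary pass/fail signal:

```lean
import Mathlib

-- Hypothetical problem-set entry: the model must replace `sorry`
-- with a proof that type-checks; the Lean kernel is the verifier.
theorem sum_first_hundred : (Finset.range 101).sum id = 5050 := by
  sorry
```

Entries like this are trivially checkable, which is exactly what makes them usable as RL environments.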

Of course, there are ongoing efforts to create large Lean datasets, like those from Numina, but I think those are currently somewhat limited in scope, and the process is not fully collaborative right now.

One reason to do this now: after the success of R1 (and of o1 before it, with R1 revealing the method), it seems that very simple RL paradigms (binary feedback on correctness, pure rollouts without search) applied to current-generation LLMs work well at improving their math abilities (there is some debate about how much curation went into the pretraining data, but let me set that aside). So it seems plausible that this approach can succeed.
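To make that paradigm concrete, here is a minimal toy sketch of the loop, under loudly stated assumptions: `check_proof` is a stand-in for an actual Lean kernel check, the `policy` callable is a stand-in for an actual LLM sampler, and a real trainer would add a gradient update (e.g. GRPO-style) on top.

```python
def check_proof(problem, proof):
    # Placeholder verifier: a real system would run the Lean kernel on the
    # candidate proof and return 1.0 iff it type-checks. Reward is binary.
    return 1.0 if proof == problem["answer"] else 0.0

def rollout_group(policy, problem, group_size=8):
    # Pure rollouts, no tree search: sample several independent candidate
    # proofs for the same problem and score each with the binary checker.
    samples = [policy(problem) for _ in range(group_size)]
    rewards = [check_proof(problem, s) for s in samples]
    return samples, rewards

def advantages(rewards):
    # Group-relative baseline: each sample's advantage is its reward minus
    # the group mean, so only relative success within the group matters.
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]
```

A real trainer would then reinforce the sampled tokens in proportion to these advantages; the point is just that the only supervision signal is the binary verifier.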

One potential counterpoint: maybe this work is more efficiently done by training better autoformalizers and then autoformalizing all of the existing math literature into a huge problem set, and it's not clear the open-source community has the resources to do that well collaboratively.

Maybe the biggest counterpoint is that the open-source community probably won't have the resources to collaboratively do the actual RL training at the end, which could cost tens of millions of dollars even after acquiring the necessary GPUs [edit: some estimates put the RL stage of R1, given the base model, closer to 100K USD, so maybe $1M USD of compute could suffice for this part if one already has the GPUs; all cost estimates here are pure guesses by me]. And the big players, even on the open-source side like Meta and DeepSeek, are already strongly motivated to do this, since solving research math problems is a very tantalizing benchmark for AI right now.

The above are just some of my opinions, offered to help start a discussion; I'm curious about other people's thoughts.


Last updated: May 02 2025 at 03:31 UTC