Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Crowdsourcing ATP


Simon Sorg (May 26 2025 at 14:20):

Currently, theorem prover infrastructure is mostly closed-source. I think DeepSeekProver V1.5 and the implementation based on it are the only open source implementations available. Some major AI lab comes around, treats ATP as an RL problem, gets SOTA, and sometimes the resulting model is open weights.
I'm wondering: since training ATP models is solving a hard engineering challenge and throwing compute at the problem, how come there is no public initiative? Shouldn't it be possible to design a system working on heterogeneous hardware, so that everyone can contribute compute to it? At least a heterogeneous proof search would be exciting. The resulting dataset could be made available for everyone (or only academic institutions).

I'm sure I must be missing something. I searched for old threads on this, but couldn't find any. If anyone can give me a hint to why this is not being done or where it has been discussed previously, I'd be glad!

Justin Asher (May 26 2025 at 14:50):

@Simon Sorg Am working on this: #Machine Learning for Theorem Proving > MCP Tools for LLMs and Agentic Mathematics .

I know @harry sanders is interested in creating a powerful Lean gym, i.e., RL environment, as well.

Simon Sorg (May 26 2025 at 14:56):

I was thinking about running distributed proof searches on arbitrary hardware, say simply starting a docker container. For example, one could feed in stuff from SorryDB, and if a proof is found, a PR is created. This is somewhat different from MCP I think, but I do not understand MCP deep enough to be sure, so please correct me.

Justin Asher (May 26 2025 at 17:46):

I think it just depends on what you want to do. The idea behind MCP is to give the LLMs access to tools, such as search engines, running Lean code, among other things. The reason I am interested in MCP is it is easy to integrate with any of the frontier models, so if we build with MCP in mind, we should always be at the cutting edge. I think that what you are describing is in line with what I would like to do: adding an agentic workflow on top of the MCP tools. I think we need the MCP tools first, though, for this to work well.


Last updated: Dec 20 2025 at 21:32 UTC