Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: TheoremLlama
Jason Rute (Jul 14 2024 at 14:08):
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
- Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang (none of them appear to be on this Zulip)
Jason Rute (Jul 14 2024 at 14:09):
I haven't read it in detail, but like DeepSeek-Prover, it uses LLMs for whole-proof synthesis (with 128 samples, as in DeepSeek-Prover). It also appears to use autoformalization to get more data, plus natural-language reasoning (I'm not sure whether that is similar to how it is used in Draft, Sketch, Prove).
Jason Rute (Jul 14 2024 at 14:09):
While it lags behind DeepSeek-Prover on MiniF2F, the code, model, and dataset all appear to be open-sourced! Also, I think there are some interesting ideas, which I need to read more to understand.
Jason Rute (Jul 14 2024 at 14:09):
Since it is open-sourced, I'd love to see it benchmarked on some of the recent benchmarks as they come out. Again, as with DeepSeek-Prover, it isn't clear to me yet whether it is specific to competition problems like MiniF2F or whether it also does well on "real-world" Lean proofs. (I don't think it has a way to handle new definitions/theorems.)
Jason Rute (Jul 14 2024 at 14:09):
Also, this paper, DeepSeek-Prover, and the earlier LLM papers like DSP (and its successors) make it seem more and more likely that LLMs are not bad at whole-proof synthesis (even without seeing the intermediate goal states). It seems that a huge part is just fine-tuning on better data (through, say, autoformalization or manual data curation).
Bolton Bailey (Jul 14 2024 at 21:43):
Open-sourced
This would be nice, but currently all I see on that GitHub page is a README.
Last updated: May 02 2025 at 03:31 UTC