Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Paper: Autoformalization with Large Language Models


Jason Rute (May 27 2022 at 14:44):

Autoformalization with Large Language Models (https://arxiv.org/abs/2205.12615)

Jason Rute (May 27 2022 at 14:44):

I know some here say that auto-formalization is near, but in reality it is a really hard task. However, Google's N2Formal team and researchers at Cambridge seem to have made some progress on certain classes of self-contained problems. The idea is that the pre-trained large language models one hears about in the news, like Codex and PaLM, already have some built-in ability to translate natural math into formal math, Isabelle in this case. For those of us in AI this isn't shocking, but it is pleasantly surprising. One basically has to ask the model in the right way (prompt engineering) and it will translate an informal math problem into a formal one. I think they focused on about 4K math competition problems and formalized about 25% of the statements correctly with the language model. Further, one can then take those formalized statements and use them to improve existing neural theorem provers. Note @Stanislas Polu et al. already showed it is possible to improve a neural theorem prover on MiniF2F by manually building a curriculum of formal problems for the prover to train on. But now, with a language model, one can just auto-formalize a diverse list of problems to train on. And indeed the authors show strong improvement with the Thor model (which also just came out) applied to the MiniF2F benchmark.
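
To make the prompt-engineering idea concrete, here is a rough sketch of the kind of few-shot setup involved. This is my own illustration, not the paper's actual prompt: the example pair, the `build_prompt` wording, and the `complete` callback are all hypothetical stand-ins for whatever completion API (Codex or PaLM in the paper) one has access to.

```python
# Hypothetical sketch of few-shot autoformalization prompting.
# The exact prompt format and examples differ from the paper's.

# A worked (informal, formal Isabelle) pair shown to the model in-context.
FEW_SHOT_EXAMPLES = [
    (
        "Prove that the sum of two even numbers is even.",
        'theorem "even a ==> even b ==> even (a + b)"',
    ),
]

def build_prompt(informal_statement: str) -> str:
    """Assemble a few-shot prompt asking the model for an Isabelle statement."""
    parts = []
    for informal, formal in FEW_SHOT_EXAMPLES:
        parts.append(f'Natural language version: "{informal}"\n'
                     f"Translate the natural language version to an Isabelle version:\n"
                     f"{formal}\n")
    parts.append(f'Natural language version: "{informal_statement}"\n'
                 f"Translate the natural language version to an Isabelle version:\n")
    return "\n".join(parts)

def autoformalize(informal_statement: str, complete) -> str:
    # `complete` is a placeholder for an LLM completion endpoint.
    return complete(build_prompt(informal_statement), temperature=0.0, stop="\n\n")
```

The few-shot examples are doing most of the work here: the model is never fine-tuned, it just imitates the informal-to-formal pattern shown in the prompt.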

Jason Rute (May 27 2022 at 14:44):

And for those interested in auto-formalization in general, there is a short discussion in Section 6 on auto-formalization of real math and on auto-informalization, where one turns formal math into informal English math, which turns out to be an easier task for these models.

Jason Rute (May 27 2022 at 14:44):

Also, nothing about this work seems specific to Isabelle.

