Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Minerva


Yuhuai Tony Wu (Jul 02 2022 at 01:40):

I'd like to share with folks here about our recent work, Minerva (blog post, paper): a language model that can reason mathematics, general science in natural language, using step-by-step reasoning. Some remarkable achievements with this model:

1) 51% acc on MATH, a math competition dataset, which was predicted to happen in 2025.
2) 65% acc in 2022 Poland’s National Math Exam which is more than the national average!
3) 30% acc on “undergrad” problems in physics, biology, chemistry, economics, etc.
4) 82% on GCSE Mathematics - this is a UK exam for 16 years old taken by about 700K students each year.

You can find some example solutions at sample explorer.

Appreciate any comments, questions, or other feedback!

Yuhuai Tony Wu (Jul 02 2022 at 01:46):

One clear limitation is the lack of verification with natural language models, and especially when it comes to proofs. But I'm optimistic to make use of the autoformalization abilities of these models to help verify, and further provide feedback signals to ground meanings etc.

Jason Rute (Jul 02 2022 at 14:58):

Nice work @Yuhuai Tony Wu!

Floris van Doorn (Jul 02 2022 at 16:27):

This is very impressive. Some of the solutions in the samples are really good!


Last updated: Dec 20 2023 at 11:08 UTC