Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: New paper on arXiv

Jeremy Avigad (Jan 08 2022 at 17:32):

Someone pointed me to this paper posted on arXiv:
https://arxiv.org/abs/2112.15594
The abstract begins: "We demonstrate that a neural network pre-trained on text and fine-tuned on code solves Mathematics problems by program synthesis." It ends: "This is the first work to automatically solve, grade, and generate university-level Mathematics course questions at scale. This represents a milestone for higher education."
I haven't read it yet.

Johan Commelin (Jan 08 2022 at 17:39):

Comments: 128 pages, 250 tables

Zygimantas Straznickas (Jan 08 2022 at 17:45):

Reddit discussion about the paper: https://www.reddit.com/r/MachineLearning/comments/rutbpv/r_a_neural_network_solves_and_generates/ .

Jason Rute (Jan 08 2022 at 18:13):

It’s really only 8 pages with a really long appendix made up of what looks like example math problems.

Kunhao Zheng (Jan 09 2022 at 09:57):

They used Codex to translate math problems (maybe heavy prompt engineering here, they didn't fine-tune the model) into Python code and executed the code to get the answers.
(Background: One can hint Codex with some descriptions or docs of a function, and let it generate the function that satisfies the descriptions.)
Here the descriptions happens to be the math problems itself.
The same idea of auto-formalization can be also tried on lean: hinting the Codex the math problems and let it generate lean code w/ proof or only a sorry there.

Kunhao Zheng (Jan 09 2022 at 12:55):

I remember reading another quite similar paper on arXiv months ago. Found it here:

Solving Probability and Statistics Problems by Program Synthesis
https://arxiv.org/pdf/2111.08267.pdf

It doesn't surprise me much that it's from the same group of people.

Junyan Xu (Jan 09 2022 at 21:23):

On p.4 it says

Interaction. he original question may not be a prompt that synthesizes a program whose execution results in the correct answer. In addition, the answer may require multiple steps with clear plots or other modalities. We therefore may interactively prompt Codex until reaching the correct answer or visualizations, making the minimum necessary changes from the original question. Panel D in Figure 2 shows an example of interaction to produce multiple plots.

So this step (prompt generation) isn't quite automated yet.

Junyan Xu (Jan 09 2022 at 21:48):

This paragraph

C. Questions and Prompts.We classify the transformations from the original course questions to the Codex prompts resulting in correct solutions into the following three classes: (i) As-is prompt: Original question and Codex prompt are the same, (ii) Automatic prompt transformation: Original question and Codex prompt are different, and the Codex prompt is generated automatically by Codex itself, (iii) Manual prompt transformation: Original question and Codex prompt are different, and the Codex prompt is generated by a human.

as quoted on reddit, appears in the first version of the paper but no longer in the latest version. Not sure what changed from version to version.

Michael R Douglas (Jan 09 2022 at 22:46):

Iddo Drori, lead author on these papers, will speak in the New Technologies seminar on March 9.

Junyan Xu (Mar 09 2022 at 18:59):

starting in 30min

Jason Rute (Mar 09 2022 at 19:04):

Actually it has already started right now.

Junyan Xu (Mar 09 2022 at 19:13):

Oh, my mistake, thanks

Jason Rute (Mar 09 2022 at 20:17):

@Michael R Douglas The speaker said he would be able to give a citation for their work on proof synthesis. Would you be able to follow up with him and post it here? (He said it was on arXiv, but I can't seem to find any examples of proofs in say https://arxiv.org/pdf/2112.15594.pdf, although it is possible I'm just not searching for the correct keywords or looking in the right papers.)

Adam Topaz (May 20 2022 at 17:22):

Not really lean related, but might still be interesting for folks in this stream:
https://arxiv.org/abs/2203.08913

Adam Topaz (May 20 2022 at 17:25):

Related HN discussion: https://news.ycombinator.com/item?id=31448360

Jason Rute (Aug 15 2022 at 03:56):

This paper was recently published in PNAS (which is similar to Science or Nature). I haven’t looked at the new version of the paper but hopefully thorough review helped clarify a lot of the iffy and too-good-to-be-true sounding results from the original arXiv paper.

Adam Topaz (Nov 23 2022 at 04:33):

Just mentioning the following paper I came across, in case anyone finds it interesting:
https://arxiv.org/abs/2211.08671

Junyan Xu (Nov 25 2022 at 19:46):

This isn't an arXiv paper and was published a while ago, but still:
Socially situated artificial intelligence enables learning from human interaction
From the abstract (emphasis mine):

Humans have long demonstrated an ability to learn from interactions with others. However, AI agents learn in social isolation. To create intelligent systems that understand more than a fixed slice of the world, our article formalizes socially situated AI—a framework that enables agents to interact with people as they simultaneously learn new concepts about the world around them. Using our framework, we deploy a field experiment on a photo-sharing social network where our agent interacts with hundreds of thousands of people to learn concepts about the visual world. We combine advances in deep learning, computer vision, natural language processing, and human–computer Interaction to deliver a human-centered AI that learns from interactions with people in social environments.

We manifest our framework as an interactive agent that learns how to ask natural language questions about photos as it broadens its visual intelligence on a large photo-sharing social network. Unlike active-learning methods, which implicitly assume that humans are oracles willing to answer any question, our agent adapts its behavior based on observed norms of which questions people are or are not interested to answer. Through an 8-month deployment where our agent interacted with 236,000 social media users, our agent improved its performance at recognizing new visual information by 112%. A controlled field experiment confirmed that our agent outperformed an active-learning baseline by 25.6%.

I think the method could be very relevant when it comes to accelerating AI's absorption of mathematical and scientific knowledge. It could open an avenue of teaching mathematics to machines other than formalizing/mathlib building/simp lemma and tactic writing.

Meta AI has recently released Galactica and Cicero, and it would be wonderful if we could combine them :) Galactica has apparently generated a lot of activities during the short time it was online, and I think people are willing to hold long form conversations with AI if it could conceivably pay back; if one mathematician demonstrates interest/curiosity about one math topic during conversation, AI may be motivated to acquire it from another person knowledgeable about the topic. Some account system may be needed to keep track of every correspondent's preferences, so as to communicate in a personalized way and avoid triggering adverse reactions (AI is now probably good enough to infer sentiments but an upvote/downvote system could also be nice to have).

Last updated: May 02 2025 at 03:31 UTC