Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: Christian Szegedy and Tony Wu joined xAI
Junyan Xu (Jul 12 2023 at 17:46):
Jason Rute (Jul 12 2023 at 21:14):
Is Markus Rabe still at Google? Do you know if anyone is left at Google working on AI for mathematics/formalization/theorem proving?
Jason Rute (Jul 12 2023 at 21:17):
And are Szegedy or Tony working on the same topics still at xAI?
Eric Wieser (Jul 12 2023 at 21:18):
https://www.linkedin.com/in/markusnrabe/ suggests Markus is not
Jason Rute (Jul 12 2023 at 21:19):
It might also be worth mentioning that some of the HTPS authors at Meta (many also authors on DSP) first switched to working on LLMs, namely LLaMA and then left to start a company about a month ago: https://twitter.com/aimistral/status/1670133250130427905
Andreas Gittis (Jul 12 2023 at 22:05):
Jason Rute said:
Is Marcus Rabe still at Google? Do you know if anyone is left at Google working on AI for mathematics/formalization/theorem proving?
My understanding is that the N2Formal group at Google no longer works on automated theorem proving; I don't know for sure though. Like the HTPS authors you mentioned, Markus left to join a startup. Seems like large industry companies are turning to focus on the emergent chatbot tech over other lines of research. I wonder if the same thing is happening at DeepMind after Google's announcement merging it with Google Brain.
Junyan Xu (Jul 12 2023 at 22:45):
And are Szegedy or Tony working on the same topics still at xAI?
Seems they're still working on math and reasoning according to Tony's tweet and Twitter bio. I was aware that Christian left since April, but heard about Tony's departure only from the NY Times article. Interestingly that Tony's webpage and Markus's LinkedIn both say "stealth startup" but they turn out to be different ...
Junyan Xu (Jul 12 2023 at 22:52):
Also see announcement from Greg Yang:
Finally launched http://x.ai!
The mathematics of deep learning is profound, beautiful, and unreasonably effective. Developing the "theory of everything" for large neural networks will be central to taking AI to the next level. Conversely, this AI will enable everyone to understand our mathematical universe in ways unimaginable before.
Math for AI and AI for math!
Any mathematician/theorist excited about this needs to DM me!
Junyan Xu (Jul 13 2023 at 00:02):
Do you know if anyone is left at Google working on AI for mathematics/formalization/theorem proving?
Also, don't forget DeepMind is now part of Google, and given what data they're collecting it's conceivable they've taken up the IMO grand challenge.
Andreas Gittis (Jul 13 2023 at 00:17):
Junyan Xu said:
Interestingly that Tony's webpage and Markus's LinkedIn both say "stealth startup" but they turn out to be different ...
Ah okay, I'll remove the name from my previous post.
Jason Rute (Jul 13 2023 at 02:50):
Is it public that DeepMind is collecting that data?
Junyan Xu (Jul 13 2023 at 04:56):
One can see the name of the researcher who posted the study before taking it, so I think it's public to anyone who registered a Prolific account via the link.
On the other hand, Geoffrey Irving (coauthor of the paper Deep Network Guided Proof Search (2017) with Szegedy and Kaliszyk, now a safety researcher at DeepMind) quite impressively formalized Hartogs's theorem a while ago. I'd guess some DeepMind people are still pursuing this line of research (earlier version); also interesting to see AutoBound coming out of Google lately.
Ryan McCorvie (Jul 14 2023 at 20:12):
On twitter now, there is a stream from x.ai discussing the new venture. Talia Ringer is livetweeting it for anyone who wants the short version.
Dongwei Jiang (Jul 20 2023 at 01:33):
It's quite surprising to see that the majority of individuals from leading organizations (OpenAI, Meta, Google, and the like) who were previously involved in automatic theorem proving seem to have shifted their focus. As a novice in this area, I'm genuinely interested to gather your thoughts on the possible reasons for this trend. Could it be:
1 These major companies are primarily focused on advancing Large Language Models and may perceive theorem proving as less immediately applicable to their operational needs?
Or 2 They are pioneers in this field. And the impressive work these individuals have done in this field has opened doors to exciting startup opportunities they couldn't pass up?
Jason Rute (Jul 20 2023 at 08:00):
https://twitter.com/AlbertQJiang/status/1679598310045220866?s=20
Jason Rute (Jul 20 2023 at 08:01):
https://twitter.com/jessemhan/status/1679602978339057669?s=20
Jason Rute (Jul 20 2023 at 08:02):
(That was Albert expressing the same thought and Jesse giving his personal experience.)
Dongwei Jiang (Jul 20 2023 at 09:26):
@Jason Rute Thanks for the info! I didn't know about that conversation. It looks more like the second point. Thanks again :smile:
Zhangir Azerbayev (Jul 20 2023 at 16:01):
Dongwei Jiang said:
It's quite surprising to see that the majority of individuals from leading organizations (OpenAI, Meta, Google, and the like) who were previously involved in automatic theorem proving seem to have shifted their focus. As a novice in this area, I'm genuinely interested to gather your thoughts on the possible reasons for this trend. Could it be:
1 These major companies are primarily focused on advancing Large Language Models and may perceive theorem proving as less immediately applicable to their operational needs?
Or 2 They are pioneers in this field. And the impressive work these individuals have done in this field has opened doors to exciting startup opportunities they couldn't pass up?
I think an important reasoning for people leaving theorem proving is that their AGI timelines have drastically shortened over the past few years. I came into theorem proving thinking that a superhuman theorem prover would be a major milestone on the long journey towards AGI, like AlphaZero was. Now it looks like we will solve AGI and get theorem proving as a special case.
Jonathan Yan (Jul 20 2023 at 16:06):
The definition of AGI here could use a bit clarification. If it's about computers doing all normal human's cognitive work, satisfying both "G" and "I", it might still not generalize "up" to the stronger reasoning skills mathematicians have.
Zhangir Azerbayev (Jul 20 2023 at 16:16):
Jonathan Yan said:
The definition of AGI here could use a bit clarification. If it's about computers doing all normal human's cognitive work, satisfying both "G" and "I", it might still not generalize "up" to the stronger reasoning skills mathematicians have.
If we're at median human level but not yet at academic mathematician level, we can put in more compute and more data and get to academic mathematician level.
What I said above is not quite precise, because models can be strongly superhuman along some axes while far behind humans on others. This is already the case: gpt4 is superhuman in knowing facts but far worse than humans at executing long term plans. The more precise statement is "once models are median human level along all axes of intelligence then getting them to academic mathematician level along all axes is a matter of compute and data".
Jonathan Yan (Jul 20 2023 at 16:31):
Zhangir Azerbayev said:
Jonathan Yan said:
The definition of AGI here could use a bit clarification. If it's about computers doing all normal human's cognitive work, satisfying both "G" and "I", it might still not generalize "up" to the stronger reasoning skills mathematicians have.
If we're at median human level but not yet at academic mathematician level, we can put in more compute and more data and get to academic mathematician level.
What I said above is not quite precise, because models can be strongly superhuman along some axes while far behind humans on others. This is already the case: gpt4 is superhuman in knowing facts but far worse than humans at executing long term plans. The more precise statement is "once models are median human level along all axes of intelligence then getting them to academic mathematician level along all axes is a matter of compute and data".
This is not obvious to me, because it's cheap to produce the data for lots of median level human tasks. What if a large model learns to solve all of these relatively easy tasks in somewhat wrong ways, for example by overfitting. Then more compute and data will not improve the system's capability for more difficult tasks.
Zhangir Azerbayev (Jul 20 2023 at 16:46):
Jonathan Yan said:
Zhangir Azerbayev said:
Jonathan Yan said:
The definition of AGI here could use a bit clarification. If it's about computers doing all normal human's cognitive work, satisfying both "G" and "I", it might still not generalize "up" to the stronger reasoning skills mathematicians have.
If we're at median human level but not yet at academic mathematician level, we can put in more compute and more data and get to academic mathematician level.
What I said above is not quite precise, because models can be strongly superhuman along some axes while far behind humans on others. This is already the case: gpt4 is superhuman in knowing facts but far worse than humans at executing long term plans. The more precise statement is "once models are median human level along all axes of intelligence then getting them to academic mathematician level along all axes is a matter of compute and data".
This is not obvious to me, because it's cheap to produce the data for lots of median level human tasks. What if a large model learns to solve all of these relatively easy tasks in somewhat wrong ways, for example by overfitting. Then more compute and data will not improve the system's capability for more difficult tasks.
A few points.

Learning the entire joint distribution of human performance is different from simulating median human performance. In particular, if we have the entire joint distribution of human performance we can condition on the desired slice of the distribution. In deep learning, this idea is called "just ask for generalization".

If by "overfit" you meant overfit to the pretraining data, I don't think we will have the problem of overfitting to 10 trillion tokens any time soon.

When doing RLHF/RLAIF for something like chat models, the reward model is very brittle and easy to overfit to. However, mathematics is a particular domain where it's much easier to create robust reward models. For example, you can think of Lean as a very good GOFAI reward model for mathematical correctness. Of course, mathematical correctness is not all we care about, and we also want a reward model that encodes mathematical beauty.
Jonathan Yan (Jul 20 2023 at 23:34):
Re 2: by "overfit" I mean not learning the appropriate algorithm. For example, a model that solves the IMO challenge by memorizing all patterns of competitive math problems probably cannot generalize to Fermat's Last Theorem? A human IMO expert, however, with limited memory and inference budget, cannot memorize then try all these problem patterns. The expert must generate many such patterns in the process of solving a problem, ie an algorithm that "rediscovers" many such math patterns on the fly. This algo is more "general" so gets a greater chance when attempting harder math like FLT.
This is consistent with your point 1. Today's large models with huge compute budgets are under less pressure than humans to "think properly", so for example they can memorize many simple logos, some incorrect heuristics and brute force to solve common tasks. This learned skill plausibly generalizes to many unseen tasks, even to the point of doing all human cognitive work that has industry applications, but fails on qualitatively harder tasks. This is the scenario I imagined earlier where we create AGI in a practical sense but still cannot solve math.
Strongly agree that it'll be a huge step forward if "mathematical beauty" can be encoded!
Siddhartha Gadgil (Jul 21 2023 at 03:47):
Contrasting Olympiad problems and "Fermat's last theorem" is misleading in terms of what it takes. Mathematics advances in a series of steps, each of which can come from (among other things):
 By analogy from previous mathematics
 Combining ideas from different places.
 Learning from examples: for instance deeply understanding an example of a construction of Jorgensen (of a hyperbolic structure) was a key part of Thurston's geometrrization.
 Abstraction and generalization.
 Cleverness (a la Olympiad problems).
 Refactoring clever ideas to understand (to quote Dennis Sullivan: "why be clever when we can be intelligent").
 Trying a lot of stuff and retaining what works. Also understanding why something fails after making progress and modifying.
A great piece of mathematics involves a lot of such steps, building on different pieces of mathematics and using different strengths in different parts.
If each step can be accomplished by AI then putting them together may just be engineering. Till recently the missing capability was reasoning by analogy (including generalization  which is analogy with an example/special case).
Also while aesthetic appeal has a role in mathematics (personally I prefer though Poincare's description of this as "Joy of comprehension") it is not necessary for guiding mathematics (at least in areas I know). At any time there are clear ultimate goals, specific test cases which are beyond the limits of present techniques, explicit open problems, newly developed techniques that can be extended, and various other guides to what is desirable worthwhile progress.
I am not claiming that AI will reach the level of say Gromov  I have no basis for judging whether this will happen one way or the other. All I am saying is that looking at the end result of research mathematics may mislead people into thinking this is a mysterious activity of an entirely different nature from other reasoning.
Jonathan Yan (Jul 21 2023 at 04:06):
If mathematics is not qualitatively harder, as suggested by the above comment, more precisely, math progress can come from the culmination of lots of standard reasoning steps, but not necessarily involving a mysterious step like the "Joy of comprehension", then I agree with the feeling expressed earlier that we will "get theorem proving as a special case" of AGI.
Albert Jiang (Jul 21 2023 at 18:59):
Language modelling is very useful for ATP. A lot of companies close doors to their models so I can't mess around and hack stuff. Some people saw the benefits in ACTUALLY opening up the best models to the entire community and decided to build their own.
Last updated: Dec 20 2023 at 11:08 UTC