Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: AI and artistic taste


Johan Commelin (Feb 09 2021 at 05:28):

One claim that is often made when it comes to AIs that will conjecture and prove their own theorems is that the mathematical space is infinite-dimensional and the AI will just wander off into some silly boring corner while it happily proves silly boring theorems.

I don't know much about AI. But have people trained AIs to judge music, or other forms of art? And score them along axes like

  • this will be popular among the public (likes on youtube etc)
  • this will be considered "original"
  • this is technically difficult, but executed perfectly
  • etc...

Johan Commelin (Feb 09 2021 at 05:29):

It would be extremely interesting if an AI could automatically recognize theorems in mathlib as "helper lemmas" or "main results".

Kevin Buzzard (Feb 09 2021 at 07:38):

[Of course some might say that if you give a bunch of computer scientists a theorem prover then they may well also just wander off into some silly boring corner and happily prove silly boring theorems...]

Roman Bars (Feb 09 2021 at 08:13):

I think partly the reason why humans don't wander off into silly boring corners is that we are essentially monkeys and so we have monkey intuition about several objects (including natural numbers and the 3-dimensional Euclidean space).

The vast majority of college students worldwide who take calculus probably don't know ZFC or any other formal system in which reals can be defined yet they manage to take derivatives. They just visually imagine a straight line and that's usually good enough.

Giving AI monkey intuition might be a way to prevent it from going to a silly boring corner. But developing monkey intuition took us millions of years, so I am not sure how easy it would be.

Kevin Buzzard (Feb 09 2021 at 08:20):

On the other hand in the early 1980s when I was an active chess player it was sometimes argued that these crappy chess computers which played at sub-club level would never amount to anything, because humans played chess with a refined intuition which a machine would never emulate.

Stanislas Polu (Feb 09 2021 at 09:17):

@Johan Commelin we have techniques for that, called GANs (generative adversarial networks). They don't shine as much on text as they do on images, but they have already been applied successfully there. The training objective is exactly what you're describing: one model conjectures statements; the other judges them and attempts to discriminate between synthetic and "ground-truth" statements (say, the ones in mathlib). The equilibrium is reached when the judge can no longer discriminate the fake statements.

There are many variations around that idea that could be explored and are hugely exciting. But it all starts with having a good "prover", so that once we have generated these potentially interesting statements (or dare I say, definitions) we can submit them to a system that is not trivially bad at proving/working with them (cc @Jesse Michael Han who I know is probably excited by these kinds of projects). From there we can hope for continuous self-improvement and therefore potentially interesting knowledge creation.

We're focusing on the prover right now, because without a good prover there's not much useful we can do with these generations (without intensive human labour, which is simply not available on planet earth for this kind of task at the scale we would need :p)
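Concretely, a minimal sketch of that generator/discriminator training loop might look like the following (PyTorch; the fixed-size statement encodings, model sizes, and names are illustrative assumptions, and a real statement generator would be a sequence model, where GAN training on discrete text is notoriously harder):

```python
# Minimal GAN sketch (hypothetical): a generator proposes statement encodings,
# a discriminator scores how mathlib-like they look. Real systems would work on
# token sequences; fixed-size vectors are used here only to keep the sketch short.
import torch
import torch.nn as nn

DIM = 64     # hypothetical size of a statement encoding
NOISE = 16   # generator input noise dimension

generator = nn.Sequential(nn.Linear(NOISE, 128), nn.ReLU(), nn.Linear(128, DIM))
discriminator = nn.Sequential(nn.Linear(DIM, 128), nn.ReLU(), nn.Linear(128, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def training_step(real_batch):
    """One GAN step; real_batch stands in for encoded mathlib statements."""
    batch = real_batch.size(0)
    # Discriminator: score real statements towards 1, generated ones towards 0.
    fake = generator(torch.randn(batch, NOISE)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(batch, 1)) + \
             bce(discriminator(fake), torch.zeros(batch, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator: try to make the discriminator score its output as real.
    fake = generator(torch.randn(batch, NOISE))
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Toy usage: pretend a batch of mathlib statements has been encoded as vectors.
print(training_step(torch.randn(32, DIM)))
```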

Johan Commelin (Feb 09 2021 at 09:19):

@Stanislas Polu No, I'm not asking about proving things, or recognizing which conjectures are true.

I'm talking about metrics like beauty, important : expr -> real. And I wonder if an AI can get close to what humans can do there intuitively.

Johan Commelin (Feb 09 2021 at 09:19):

Will it recognise quadratic_reciprocity not just as a "true" expression, but as one that is important, elegant, surprising, fundamental?

Stanislas Polu (Feb 09 2021 at 09:20):

@Johan Commelin in the setup above that would be exactly the signature of the discriminator :+1:

Johan Commelin (Feb 09 2021 at 09:20):

Ooh, ok. Then I misunderstood what you were describing :see_no_evil:

Stanislas Polu (Feb 09 2021 at 09:21):

The exact "pseudo-signature" would be: likeness_to_mathlib : expr -> real but that's a not so bad proxy to what you're looking for in the space of all possible statements.

Johan Commelin (Feb 09 2021 at 09:24):

I'm still confused why you call it likeness_to_mathlib...

Gabriel Ebner (Feb 09 2021 at 09:24):

I wonder how you would generate negative training examples, i.e. theorem statements that are "unlike mathlib". There are still many theorems which are not in mathlib, but are important and beautiful.

Johan Commelin (Feb 09 2021 at 09:25):

I think that 95% of mathlib is ugly, boring, stupid, simple helper lemmas, rfl-lemmas, simp-lemmas, api building stuff.

Johan Commelin (Feb 09 2021 at 09:26):

An AI shouldn't train on a random subset of mathlib to get a sense of what it should aim for to prove a beautiful, elegant, surprising theorem

Johan Commelin (Feb 09 2021 at 09:26):

Of course it can train on mathlib to get a good sense of how to build a strong and useful API around a definition

Johan Commelin (Feb 09 2021 at 09:26):

But to guide an AI towards interesting results, it should probably train on Fabstracts...

Stanislas Polu (Feb 09 2021 at 09:28):

@Gabriel Ebner So that's one of the main benefits of GANs: you don't need negative samples to start with. This setup, when it works, converges towards the distribution you want to mimic: here, the distribution of mathlib statements. The negative examples are generated by the network that attempts to fake them; at the beginning they are quite bad and the discriminator can easily tell them apart. The discriminator is trained along with the generator on exactly that (discriminating the mathlib statements from the statements produced by the generator).

Stanislas Polu (Feb 09 2021 at 09:28):

@Johan Commelin you can replace the mathlib distribution by another one you feel is more important; such as "important statements from mathlib only"

Stanislas Polu (Feb 09 2021 at 09:29):

or if you want to mimic your tastes "important statements from mathlib that are beautiful to johan" :)

Johan Commelin (Feb 09 2021 at 09:29):

Right... just filter by git blame :rofl: :laughing:

Stanislas Polu (Feb 09 2021 at 09:29):

Good training signal! :)

Johan Commelin (Feb 09 2021 at 09:30):

But like I said, I think there is only a very small subset of mathlib that qualifies as "landmark" results.

Stanislas Polu (Feb 09 2021 at 09:30):

agreed! and data scarcity is obviously an issue here

Johan Commelin (Feb 09 2021 at 09:30):

It would be a subset of what is mentioned in overview.yaml

Yuhuai Tony Wu (Feb 09 2021 at 20:22):

@Johan Commelin We have been working on something related to this, namely extracting theorems from a proof that could look interesting/useful to humans, by imitating the human theorem-extraction process. We applied the learned agent to MetaMath, and discovered quite a few theorems that are used frequently in the MetaMath library. It's a recent submission to ICML (a machine learning conference). If you'd like, I can share the submission with you.

Johan Commelin (Feb 09 2021 at 20:23):

Sounds good! I'm interested in seeing the paper

duck_master (Mar 01 2021 at 05:10):

Johan Commelin said:

One claim that is often made when it comes to AIs that will conjecture and prove their own theorems is that the mathematical space is infinite-dimensional and the AI will just wander off into some silly boring corner while it happily proves silly boring theorems.

I don't know much about AI. But have people trained AIs to judge music, or other forms of art? And score them along axes like

  • this will be popular among the public (likes on youtube etc)
  • this will be considered "original"
  • this is technically difficult, but executed perfectly
  • etc...

I think this is generally an important issue, but maybe an AI will wander into a corner that we currently consider silly and boring but that is actually beautiful beyond human comprehension? One somewhat naïve way to go about this: I (and probably a lot of other mathematicians) intuitively consider long proofs that do not introduce novel concepts to be silly and boring, so if we could penalize length, and somehow recognize and penalize lack of novelty as well, I think that would be a significant improvement over the purely stupid approach of training just a theorem prover.
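A minimal sketch of how such a penalty could be scored (the weights and the token-overlap novelty measure are purely illustrative assumptions, not a known recipe):

```python
def interestingness(proof_length: int, statement_tokens: set, known_statements: list,
                    alpha: float = 0.01, beta: float = 1.0) -> float:
    """Higher is better: penalize long proofs and overlap with already-known statements."""
    def jaccard(a: set, b: set) -> float:
        # Crude lack-of-novelty measure: token overlap with a known statement.
        return len(a & b) / len(a | b) if a | b else 1.0
    redundancy = max((jaccard(statement_tokens, s) for s in known_statements), default=0.0)
    return 1.0 - alpha * proof_length - beta * redundancy

# Toy usage: a short, mostly novel statement scores higher than a long, familiar one.
known = [{"nat", "add", "comm"}, {"real", "mul", "assoc"}]
print(interestingness(12, {"prime", "infinite"}, known))
print(interestingness(400, {"nat", "add", "comm"}, known))
```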

EDIT: Also there's been work on training AIs to learn human preferences. For example, gwern trained GPT-2 to learn how to make good music. There's also this Google Scholar search I just did, which turns up a lot more results that are kind of like this. (I'm pretty sure there's more work/research in this area but I'm too lazy to find out.)


Last updated: Dec 20 2023 at 11:08 UTC