## Stream: Machine Learning for Theorem Proving

### Topic: New Technologies in Mathematics Seminar at Harvard

#### Jason Rute (Mar 03 2021 at 13:11):

In general, the Harvard New Technologies in Mathematics Seminar is focusing this semester on topics related to interactive theorem proving and machine learning. There have already been recorded talks by Josef Urban, Christian Szegedy, and others.

#### Jason Rute (Mar 03 2021 at 13:16):

Today's talk is by me. In that talk, which is in 7 hours, at 3pm EST (UTC-5) on Zoom, I'm going to discuss our machine learned Lean gptf tactic and the PACT method we use to train the current version. There will be a demo. Here is the abstract and Zoom link. I also think it will be recorded and put on YouTube. I think the best place to discuss this will be at the #lean-gptf stream (but we will be happy to field questions anywhere).

#### Jason Rute (Mar 05 2021 at 23:34):

As for my talk, the video is up on YouTube and the slides are here.

#### Jason Rute (Mar 05 2021 at 23:36):

We'd love for you to try out the tool. Join the discussion at #lean-gptf (or go straight to https://github.com/jesse-michael-han/lean-gptf for setup instructions.)

#### Floris van Doorn (Jan 25 2022 at 16:01):

Tomorrow there is a talk at the New Technologies in Mathematics Seminar that some might find interesting.
It is given by Alex Davies from Deepmind and titled "Machine learning with mathematicians"

#### Jason Rute (Jan 28 2022 at 04:40):

@Johan Commelin asked a good question at this talk about what sort of mathematical objects are representable in a neural network (NN). Here is my response to him (edited for clarity). I don't have a blog, but he thought it should be more widely shared.

In short, if it can be represented on a computer in a practical way, it likely can be represented in a neural network. But some structures are more easy than others. Here are some examples:

• Real numbers: A fixed sized array of real numbers is the easiest thing for a NN to work with both for input and output. (But even here, we assume they can be well represented as floating point numbers, so they can't be astronomically large or microscopically small or microscopically close together, at least not without additional rescaling.)
• Categorical values: A fixed list of labels (e.g. red-green-blue, up-down-left-right, Z mod p for a fixed p) is also easy to represent. For input, you can just assign each label a vector (this is called an embedding and the NN can learn the embedding for you--it's also basically equivalent but more efficient than one-hot encoding if you have heard of that). For output, this is also fairly standard. The NN makes a probability distribution over all the labels.
• Boolean values: These are a special case of categorical values, but can be handled even more easily and efficiently.
• Complex numbers: These can be represented just as pairs of real numbers (for imaginary and real parts).
• Images: There are good NN architectures for representing 2d images, or any 2-D grid of vectors which is "picture-like" (meaning the rows and columns represent evenly spaced positions in the plane).
• Matrices: A lot of mathematical objects can be represented as matrices. On one hand this is really easy to represent since it is just an array of reals (or bools or categorical values), but also many practical matrices may be huge and sparse. This presents challenges. There are frameworks for dealing with sparce arrays in NNs which may help here. (Also see about manifolds below.)
• Integers: Integers can be tricky. Sometimes it is best to just treat them as reals. Other times it is best to think of them as categorical values (but that can get tricky if your integers don't fall in a fixed list). There are other representations as well for integers using sin and cos functions that make it easy for the network to compare the relative distance of two integers (c.f. positional encodings).
• Points in a manifold with group structure: While I said you can represent complex numbers as real and imaginary parts, you could also represent it as the magnitude and argument, but note for the argument a naive representation from 0 to 2pi would treat arguments of .01 and 2pi -.01 quite differently. Hopefully, the NN would adapt, but there is also a growing body of literature of representing manifold data (where the manifold has a group structure) in a way which represents its symmetries. For example, there are nice ways to represent points on a sphere. The main property you want your NN to have in this case is that if you rotate the input then the output will be the same (the NN is invariant to rotations) or the output will be similarly rotated (the NN is equivariant to rotations). The speaker next week is an expert on this topic.
• Graphs: As you saw from Alex’s talk, there are good architectures for working with medium size directed graphs as input. These are called graph neural networks. A lot of data, like trees or logical formulas can be represented as graphs if you think about it the right way.
• Sequences: Arbitrary length sequences (including time series data) can be represented as input or output with sequence models like recurrent neural networks or transformers.
• Unordered sets: A set can be represented as a sequence, but also some architectures like Transformers are actually set models already. (Transformer encoders are invariant under permutations. They only become sequence models through the addition of positional encodings.)
• Symbolic/Text data: A special case of sequence-to-sequence models is where the input is just text or symbols. This isn't always the most refined way to input data, but it is one of the most flexible. It is a great catch-all for working with symbols, formulas, terms, sequence data, structured data, and the like. Also, the Transformer sequence models are some of the most powerful NN architectures around.
• Other: Basically a neural network isn't much more than a differential function and all these architectures are quite composable. While it is good when possible to take advantage of established architectures, it is entirely possible to invent a new NN component for a special use case. Again, I think any data which is representable on a computer can be put into a NN with a bit of work.

#### Zygimantas Straznickas (Jan 28 2022 at 05:06):

Great list! Some other stuff off the top of my head:

#### Alok Singh (S1'17) (Jan 28 2022 at 06:49):

i usually think of them as graphs (if finite/discrete i/o) or flows (if smooth/manifold)

#### Kevin Buzzard (Jan 28 2022 at 07:49):

A lot of the things on this list are a very long way from the mathematical objects I think about in practice (eg abstract rings, infinite topological groups, vector spaces over an arbitrary field). Does this matter?

#### Johan Commelin (Jan 28 2022 at 08:10):

What I took away from the talk (and this list) is that we should try to find invariants that are on the list (e.g. dimension, Euler-characteristic, etc...) when working with such abstract objects that you mention.

#### Johan Commelin (Jan 28 2022 at 08:12):

Part of the talk was explaining how DeepMind helped Geordie Williamson make progress on the combinatorial invariance conjecture about Khazdan-Lustzig polynomials. That was quite impressive.

#### Kevin Buzzard (Jan 28 2022 at 08:17):

Yes I talked to Geordie about the work recently. It certainly is impressive!

#### Johan Commelin (Jan 28 2022 at 08:25):

In cohomology of algebraic varieties (the field of maths that I really care about) there are dozens of integers floating around: Betti numbers, Hodge numbers, dimension of subspace of algebraic classes, Euler characteristic, Kodaira dimension, etc...

What I don't know is: how easily can we produce huge datasets of examples? And if we can, how do we make sure that these examples are "generic" instead of all belonging to "the easy subset"?

#### Patrick Massot (Jan 28 2022 at 08:28):

This is clearly a big obstacle. It's not a coincidence that those people investigated knot theory and combinatorics. Only a tiny proportion of maths satisfies the requirements of having numerical invariants that are easy to compute for lots of examples.

#### Kevin Buzzard (Jan 28 2022 at 08:41):

I agree -- this is the main problem with the method. Computing torsion in cohomology of Shimura varieties of dimension greater than one is very painful. Theorems get proved about these things by the pencil and paper crowd more often than examples are computed by the computer algebra crowd.

#### Jason Rute (Jan 28 2022 at 12:10):

It is certainly the case that for this line of work, you need to find areas of math which are amenable to computer representations and gathering large numbers of examples. It's hard for me to know how realistic this is. I've seen enough databases of mathematical objects both gathered by crowdsourcing (e.g. OEIS) and brute force enumeration (e.g. databases of all groups up to some size) that I think for some areas of math it is an untapped resource. (My old area of math however specifically dealt with uncomputable objects, so not as much one can do there. :smile:)

#### Jason Rute (Jan 28 2022 at 12:26):

Johan Commelin said:

What I don't know is: how easily can we produce huge datasets of examples? And if we can, how do we make sure that these examples are "generic" instead of all belonging to "the easy subset"?

The authors discuss this in the appendix (and it is the little detail that I really think shows they are thinking about this approach carefully). Indeed, you have to be careful since whatever dataset you pick will have statistical biases. (And ML models are statistical models.) For example, if your dataset is all groups up to order 1 million, then 99% have order 1024. :thinking: The trick the authors mention is to gather lots of different datasets, specifically looking for ones likely to generate counterexamples. Then (if feasible) rerun your analysis on each of these datasets, retraining the model from scratch. For example, If you are working with groups, you may want to also work with a database of simple groups (or even subfamilies of simple groups like cyclic groups), or families of commutative groups, or a database of groups with more diverse orders. One thing Alex said was a missed opportunity was once they found their family of counterexamples to their knot conjecture, they could have made that into its own dataset and re-trained the model on that adversarial dataset.

#### Alex J. Best (Jan 28 2022 at 15:50):

Jason Rute said:

For example, if your dataset is all groups up to order 1 million, then 99% have order 1024.

While I get the point you are making I would guess that 99% of groups of order at most 1 million will have order $2^{\lfloor\log_2(1000000)\rfloor}$ :grinning:

#### Junyan Xu (Jan 28 2022 at 15:56):

Johan Commelin said:

In cohomology of algebraic varieties (the field of maths that I really care about) there are dozens of integers floating around: Betti numbers, Hodge numbers, dimension of subspace of algebraic classes, Euler characteristic, Kodaira dimension, etc...

You'd be interested in Yang-Hui He's series of work, summarized in the recent slides:
https://dl3.pushbulletusercontent.com/NAms43DGGWVf7s3Q6MRpE0lB70zUbDJq/slides_He.pdf
"Universes as Big Data, or, Machine-Learning Mathematical Structures"
His work isn't cited in DeepMind's main Nature paper, but is cited in this commentary https://www.nature.com/articles/d41586-021-03512-4 (edit: it's cited in both)

By the way, as a mathematician I find the peer review opinions (published with the Nature paper as supplementary material) quite informative: https://dl3.pushbulletusercontent.com/d27MoJhGAfW82vzBRFybShB1vaoYWUTM/5_6159202253170279593.pdf

#### Junyan Xu (Jan 28 2022 at 15:58):

Jason Rute said:

For example, if your dataset is all groups up to order 1 million, then 99% have order 1024.

Has anyone counted them weighted by 1/#Aut? May help with training as well.

#### Junyan Xu (Jan 29 2022 at 00:07):

Geordie Williamson is currently speaking about his collaboration with DeepMind at UOregon's colloquium: https://math.uoregon.edu/seminars/colloquium
He's really into neural nets ...
"Representation theorists should understand matrices. I stare at these (weight) matrices, and I can't understand them."

Last updated: Dec 20 2023 at 11:08 UTC