Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: New paper: transformers for symbolic regression

Zygimantas Straznickas (Jan 15 2022 at 17:08):


Symbolic regression, i.e. predicting a function from the observation of its values, is well-known to be a challenging task. In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Mathematica functions for recurrence prediction. We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, e.g. bessel0(x)≈sin(x)+cos(x)πx√ and 1.644934≈π2/6. An interactive demonstration of our models is provided at this https URL.

Jason Rute (Jan 15 2022 at 19:01):

Nice! I’ve been curious what Lample has been up to.

Junyan Xu (Jan 16 2022 at 16:48):

I don't expect their setup (with an upper bound for recurrence degree) or a normal human to be able to guess the $10000 sequence or recurrence relations from games like this one in Winning Ways, but it also does no good on the look-and-say sequence with recurrence degree 1. Probably their operator family isn't powerful enough to express runlength.

Kevin Buzzard (Jan 16 2022 at 16:55):

yeah I tried it with a sequence which had come up in my research and it didn't get it (I gave it 20 strictly increasing positive integer terms all of which were odd and it predicted even numbers for the next three, the first one of which was not greater than the last of my inputs). But this is very much an experimental science! As a number theorist if I see a conjectural formula for my sequence which is 96% right then as far as I'm concerned it's wrong, whereas I would imagine that people who work in less black and white areas might have different opinions.

Jason Rute (Jan 17 2022 at 04:44):

Junyan Xu said:

it also does no good on the look-and-say sequence with recurrence degree 1.

Do they represent the input sequence in decimal? The look and say sequence would be quite difficult for a human if given in any base but decimal (and without giving the sequence name). 1, 9, 17, 649, 37521, 103561, 2921617, 153949833, … Also it is a hard sequence to write a formula for this sequence, even in decimal, without string operations (which I don’t think they use in this paper).

Jason Rute (Jan 17 2022 at 05:01):

@Kevin Buzzard Does your sequence have a nice formula (or do you think/hope it does)? I think this system is only designed to give short formulas right now. Also, as a design choice they decided to have it give solutions even if it doesn't find an exact fit. For some cases (like floats in physics) I could see this being useful. But for math, I agree usually an approximate solution is meaningless, and you probably should just interpret it as saying "no solution found". I could also imaging the increasing and odd issue are also consequences of that, if it is optimizing for the wrong thing (like say least squares regression) when it can't find an exact fit. (I still need to read the paper, so I might be mistaken on my understanding of how this works.)

Jason Rute (Jan 17 2022 at 05:04):

I also wonder if there is any feedback mechanism where the model looks at the output and scores how well it thinks it follows the pattern. That would be a way for it to catch say the increasing and odd issue, but I don't think many ML models do this sort of thing.

Kevin Buzzard (Jan 17 2022 at 07:58):

My expression had a closed form involving valuations of factorials. I'm just about to go to an airport but a simpler example might be: let a(n) be the number of powers of 2 showing up in the prime factorisation of factorial(n).

Zygimantas Straznickas (Jan 17 2022 at 17:42):

Looking at the paper, the (integer section of the) model only supports expressions inductively defined by operations abs, sqr, sign, relu, add, sub, mul, intdiv, mod. So it will fail on a recurrence that can't be generated this way. And using their currentl method, adding more operations will likely increase training time exponentially.

(also, fun language-model-engineering detail: they're representing integers in base 10000)

Junyan Xu (Jan 17 2022 at 20:11):

Yeah the operations are too restricted, and I think in a world with Codex, they should try something Turing complete, maybe assembly, and use their transformer to generate (or "synthesize") programs that generates the input sequence, instead of just formulas. Since @Jason Rute mentioned the base, I looked and found that b=10000 is what they use, and as a result their vocabulary has ~10000 words. In regards to look-and-say, b=10000 is worse than b=10 but better than a random base like 2, but indeed even for b=10 there isn't a simple formula. Look-and-say has asymptotic doubly-exponential growth as a integer sequence, but their output doesn't capture this, probably because the input is integer instead of float.

Last updated: Dec 20 2023 at 11:08 UTC