Zulip Chat Archive

Stream: Equational

Topic: Employing the newest AI tools


Jose Brox (Dec 10 2025 at 00:07):

At the end of his new blog entry on the ETP, @Terence Tao states:
"Interestingly, modern AI tools did not play a major role in this project (but it was largely completed in 2024, before the most recent advanced models became available); while they could resolve many implications, the older “good old-fashioned AI” of automated theorem provers were far cheaper to run and already handled the overwhelming majority of the implications that the advanced AI tools could. But I could imagine that such tools would play a more prominent role in future similar projects."

If we wanted, could we try to "redo" the project with Aristotle or some other advanced AI tool, would it be feasible? Can we make some proof of concept (asking the tool not to consult the ETP data)? Could them solve our open finite implication?

Shreyas Srinivas (Dec 10 2025 at 01:12):

In principle applying aristotle on the open implication (677 vs 255) should be fair game. Redoing the whole project is a painful task.

Terence Tao (Dec 10 2025 at 01:45):

Maybe the list at #Equational > Outstanding equations, v1 would be a good benchmark for this sort of thing.

David Renshaw (Dec 10 2025 at 02:01):

Note that the open implication was added to Formal Conjectures: https://github.com/google-deepmind/formal-conjectures/blob/main/FormalConjectures/Other/EquationalTheories_677_255.lean

Bruno Le Floch (Dec 10 2025 at 07:01):

Is it worth compiling a list of the hardest implications (distilled from #Equational > Which implications are harder for ATPs ), and the hardest non-implications (those listed by Terry Tao, but also those with the largest finite counterexamples), and a few with higher order (whether 13102 has finite models, some from the Higman-Neumann subproject)?

Terence Tao (Dec 10 2025 at 15:39):

Nice idea! I might start a rudimentary blueprint chapter on this later today.

Jose Brox (Dec 10 2025 at 20:19):

Will this list be for others to try, or will we try to work on this?

Jose Brox (Dec 10 2025 at 20:19):

Regarding the subject of this thread, I think we should consider two related but different questions:
1) Can advanced AI tools replicate some part of the ETP from the scratch, without using any of our data? How far will they be able to get?
2) Given our current data, can advanced AI tools progress further? Solve the open questions? Produce some nontrivial order-5 results by themselves? Can they pose interesting new questions?

Alex Meiburg (Dec 10 2025 at 21:43):

A couple people, myself included, have independently tried Aristotle on the 677->255, to no avail. :)

Jose Brox (Dec 10 2025 at 22:08):

Alex Meiburg ha dicho:

A couple people, myself included, have independently tried Aristotle on the 677->255, to no avail. :)

Those tries, were leveraging the ETP data?

Alex Meiburg (Dec 10 2025 at 22:11):

No, and I'm not sure how we could leverage the /data/. As in, I don't know of any facts (in Lean or in the Equation Explorer or anything) that one would expect to help with proving 677-/->255. Since it's already very "minimal", I don't think the other implications would help much.

I do think we could try to leverage the other facts that we've worked out on Zulip, like the implications of left- and right-cancellativity, surjectivity, idempotence, the necessity of nonlinearity, and all that jazz. I tried to give these 'hints' to Aristotle a couple different ways, but it did not seem to make effective use.

Bruno Le Floch (Dec 10 2025 at 22:12):

An easier tractable problem could be to prove minimality of counterexamples such as the 15-element witness to not(E1518⊨E47), or the 11-element witness to not(E854⊨E10). For the latter, a surprising fact is that there are (at least) hundreds of thousands of counterexamples with 11 elements.

Another idea would be to seek the implication graph for surjunctive magmas, which is intermediate between the infinite and finite graphs, so really it reduces to 822 implications, or even less when accounting for transitivity.

Alex Meiburg (Dec 10 2025 at 22:13):

677-/-255 is a "funny" problem, that we fully expect there to be a small (as in, something you could write down explicitly on a sheet of paper) example, that will be immediately easy to verify. But just finding that example seems hard. The flow of our research on it seems rather different from 'standard' math, then, as a result... I dunno. Aristotle seemed kind of confused by the setup. There's not much for it to "do" until it comes up with the example, and then boom, it would be trivial to prove.

David Renshaw (Dec 10 2025 at 22:16):

I wonder whether it would be a suitable problem for an AlphaEvolve-like setup.

Jose Brox (Dec 10 2025 at 22:49):

Alex Meiburg ha dicho:

No, and I'm not sure how we could leverage the /data/. As in, I don't know of any facts (in Lean or in the Equation Explorer or anything) that one would expect to help with proving 677-/->255.

It is possible that we don't know that facts we already have would help proving the implication, for me that's one of the main reasons of the usefulness of LLMs, so I think ideally we shouldn't filter them a priori, but let an AI tool to integrate all our data, then reason with it (I don't know how Aristotle works). If the implication is true, perhaps some combination of other proofs we already have, or very similar, can do the trick, and the AI can find it. If the implication is false, perhaps the AI finds some unexpected relationship between models of different laws to make the right model, or can read our paper and blueprint and find the right model with our techniques, etc.

Terence Tao (Dec 11 2025 at 00:44):

I've started a blueprint chapter on "hard implications". It has three categories so far; more are on the way in a PR that is in the process of being merged right now.


Last updated: Dec 20 2025 at 21:32 UTC