Zulip Chat Archive

Stream: Equational

Topic: https://arxiv.org/pdf/2504.10101

Martin Dvořák (Apr 18 2025 at 13:33):

image.png
This sentence doesn't seem right. Should we tell the authors?

Shreyas Srinivas (Apr 18 2025 at 14:17):

Could you post the arxiv link in the chat?

Shreyas Srinivas (Apr 18 2025 at 14:18):

Links in topics are not clickable

Andrés Goens (Apr 18 2025 at 14:19):

https://arxiv.org/abs/2504.10101

Andrés Goens (Apr 18 2025 at 14:20):

(you can copy-paste it apparently, although I changed the pdf for abs because I find annoying that I don't necessarily want to download the pdf immediately)

Bryan Gin-ge Chen (Apr 18 2025 at 14:21):

(In the web client at least, hovering over a topic title with a link in it pops up an icon that lets you visit the link.)
Screenshot 2025-04-18 at 10.21.21 AM.png

Shreyas Srinivas (Apr 18 2025 at 14:22):

Yeah that statement is plain wrong. Published at an ICLR workshop at that

Shreyas Srinivas (Apr 18 2025 at 14:25):

In general, I don't think LLMs or foundational models played any technical part in finding proofs or refutations or writing blueprints. There was something about link predictions, but I don't think we used those predictions anywhere. Maybe someone else can correct me.

Shreyas Srinivas (Apr 18 2025 at 14:35):

They might have aided some of us on small stuff especially for cursor or copilot users.

Terence Tao (Apr 18 2025 at 15:06):

It occurs to me that our paper needs to state what we don't do in addition to what we do. So, we didn't up extensively using LLMs to locate or formalize proofs (other than through autocomplete tools like Github copilot), and didn't use any formal -> informal translation tools. Would these remarks fit in the "project management" section, or go elsewhere?

Terence Tao (Apr 18 2025 at 15:07):

I think the main use of LLMs was to help design GUI interfaces around the project (e.g., Equation Explorer was first built using Claude)

Terence Tao (Apr 18 2025 at 15:09):

Though I guess we did have some cases where Vampire or some other ATP produced a formal proof that say a greedy algorithm worked, and we had some way to produce human readable output, that humans then converted into a human readable proof

Alex Meiburg (Apr 18 2025 at 15:23):

I'm actually wondering if they might have gotten a couple wires crossed with the PFR project (or another blueprint-driven project); I think for Equational Theories, it was a fairly small fraction of ideas that went Blueprint -> Lean, a lot went in the other order.

Alex Meiburg (Apr 18 2025 at 15:24):

something like Zulip / Python / Vampire -> (Lean <-> Blueprint) -> Paper

Alex Meiburg (Apr 18 2025 at 15:26):

But I think was most likely just a case of them thinking sufficiently often about LLMs transforming science, that when they see "computer-assisted mathematics" they assume the assistance must be in the form of an LLM.

Shreyas Srinivas (Apr 18 2025 at 18:09):

Terence Tao said:

Though I guess we did have some cases where Vampire or some other ATP produced a formal proof that say a greedy algorithm worked, and we had some way to produce human readable output, that humans then converted into a human readable proof

Given the venue, I don’t think they are talking about our ATP usage in particular.

Shreyas Srinivas (Apr 18 2025 at 18:12):

Secondly, even if they were including classical ATPs like Vampire, we took vampire proofs and semi automatically translated them into lean proofs. The ways we guided vampire were determined on Zulip by us. I don’t recall deep neural networks helping informalise the lean proofs in any way

Shreyas Srinivas (Apr 18 2025 at 18:29):

I took the time to skim the paper a bit. Overall they are looking at our project as a case study for how LLMs can be aligned to decompose various tasks in various domains in a meaningful way. The inaccuracy in that line doesn’t seem to be replicated elsewhere. But it would have to be amended to not imply that the informal blueprints were derived from formal tools using software automation

Terence Tao (Apr 19 2025 at 18:04):

Alex Meiburg said:

something like Zulip / Python / Vampire -> (Lean <-> Blueprint) -> Paper

Inspired by this, I tried (with some LLM assistance) to create a diagram to illustrate the flow of proofs in the project. Here is my first attempt:
image.png
If this looks helpful, I can incorporate some version of this into the paper.

Shreyas Srinivas (Apr 19 2025 at 18:34):

It’s missing GitHub comments on pull requests.

Shreyas Srinivas (Apr 19 2025 at 18:35):

Also for these diagrams I think PowerPoint is simpler

Shreyas Srinivas (Apr 19 2025 at 18:35):

Or even Ipe

Shreyas Srinivas (Apr 19 2025 at 18:36):

I think it would be nice to put this in the project management as a subsection about “how it worked in practice”

Terence Tao (Apr 19 2025 at 19:54):

Hmm, that may fit better on a separate diagram that describes the project workflow process, as opposed to the current diagram which is more capturing the flow of proof ideas (though there is overlap, in particular many of the arrows in the current diagram would often be the goal of an individual project task).

Here by the way is a slightly tweaked version of the previous diagram with different types of arrows: boldface for mainly human-directed tasks, dashed for automated tasks, and semi-dashed for semi-automated tasks that still require significant human input. I couldn't figure out a good place to put in PRs which are part of how the arrows actually get implemented, but didn't have much direct impact on the generation of proof ideas and their conversion to a formal proof.

image.png

Terence Tao (Apr 19 2025 at 19:57):

There were also some edge cases not covered by the above diagram, for instance if a formalization of a human proof revealed a gap, which then stimulated further human discussion. So it's more of a diagram of how things mostly worked, rather than how they universally worked.

Alex Meiburg (Apr 19 2025 at 20:04):

I think it's a useful diagram to help others understand how it worked in practice.
Weren't there some cases of "ATPs and other software" going straight to "Lean formalization"? Like, it finds a proof by saturation, and this was then written to Lean? Or did that happen differently / very rarely

Terence Tao (Apr 19 2025 at 22:07):

We had some proofs written by egg or duper, which could compile in Lean given the right imports, but I think we decided to use a vanilla version of Lean without these imports and so some semi-automated transcription was required, but perhaps someone who was more involved on that side of the project can correct me.

Terence Tao (Apr 20 2025 at 00:22):

I've added the figure for now to the paper, will have to figure out exactly where to place it and discuss it later. Perhaps this will be clearer when the project management section is closer to complete.

Shreyas Srinivas (Apr 20 2025 at 11:40):

Has one of us already written to the authors?

Shreyas Srinivas (Apr 20 2025 at 12:00):

Otherwise I would suggest a coordinated email on behalf of all of us, with maybe some on CC.

Terence Tao (Apr 20 2025 at 18:50):

Given that it is just half a sentence used as a data point in a broader paper that we want to clarify, it maybe doesn't need such major coordination; a single informal email I think from any of us would suffice.

Shreyas Srinivas (Apr 22 2025 at 12:10):

Would anybody like to volunteer to write this email?

Shreyas Srinivas (Apr 24 2025 at 08:27):

Since we have no volunteers, I will write to them in a few hours. If someone has already done so, please let me know asap

Last updated: May 02 2025 at 03:31 UTC