Zulip Chat Archive

Stream: Equational

Topic: Contributing partial progress

Nicholas Carlini (Sep 29 2024 at 16:42):

It might be useful in this project to have some organized way to contribute "partial progress" towards some end goal in a way that's visible. Right now, for automatically generated proofs, it's hard for someone else to know what partial progress has been made in a particular direction because only the final Lean proofs are added to Generated/.

For example, in a few different places I've mentioned that I've enumerated all 4^(4*4) magmas and can derive a bunch of refutations from this. But I don't have the Lean knowledge to actually formalize this yet (or the time right now), and so there's not a good way for me to add this to someone other people can see.

This means other people have been duplicating work on this problem, when it would be better to continue from where I left it off. So it would be great if I could put these partial results somewhere so someone else could pick up on it and finish formalizing it, much like is being done in #19 for the finite polynomials.

I've noticed other people have made similar partial progress on generating candidate refutations using different approaches, too, but don't have anywhere to "broadcast" their partial progress.

I don't have any great solutions, but off the top of my head I could imagine:

Committing a PR with the partial solution to Generated/ (or a new location? WIP/)
Submitting a PR to create a branch on your local fork of the project, and merging new changes into that fork until the work is complete.
Track partial progress on an issue

For generated solutions, it would also be great to have a way to share data. Enumerating over all 2^32 magmas takes some time, but after it's done, I can reduce it down to a set of just 500k "unique" magmas (in that: they partition the equations differently). So sharing this file would save others from the compute time of running the brute force search again.

Daniel Weber (Sep 29 2024 at 16:43):

I believe that's exactly what you should use conjecture for (maybe that's not a great name) --- statements we're fairly sure are true, but don't have a proof in Lean yet

Shreyas Srinivas (Sep 29 2024 at 16:43):

Currently any issue for which a PR has been proposed is in the "In progress" column of the project dashboard taskview.

Shreyas Srinivas (Sep 29 2024 at 16:44):

So if you coordinate on this issue, and comment on it, you can already avoid the redundant work part

Joachim Breitner (Sep 29 2024 at 16:45):

Nicholas Carlini said:

For example, in a few different places I've mentioned that I've enumerated all 4^(4*4) magmas and can derive a bunch of refutations from this. But I don't have the Lean knowledge to actually formalize this yet (or the time right now),

As long as you put them in a file similar for the polynoimals, the actual expression in Lean will be rather similar.

(Not addressing the problem of how to efficiently avoid duplicate work, but that’ll get better as the initial rush ebbs down.)

Nicholas Carlini (Sep 29 2024 at 16:46):

Daniel Weber said:

I believe that's exactly what you should use conjecture for (maybe that's not a great name) --- statements we're fairly sure are true, but don't have a proof in Lean yet

Ah that's great. The way to do this would be to submit the source to Generated/ like the other code, and then Conjectures to theorems/?

Shreyas Srinivas (Sep 29 2024 at 16:46):

One thing you don't want to do is put everything in one PR

Shreyas Srinivas (Sep 29 2024 at 16:47):

If you can break the task into smaller chunk and post separate issues, one of the maintainers can look at it and make it a project task.

Shreyas Srinivas (Sep 29 2024 at 16:48):

And yeah you can use conjecture and proof_wanted to leave stubs

Shreyas Srinivas (Sep 29 2024 at 16:50):

As I understand it, GitHub projects also allows creating roadmaps. I am not on my machine so I can't explore it right away

Nicholas Carlini (Sep 29 2024 at 16:50):

What's the best way to host data? 500k tables isn't enormous, but I don't think I want to be committing 66MB to the git repo. (Although in this case, it'll probably shrink considerably once I dedup against the ones already resolved in #19).

Joachim Breitner (Sep 29 2024 at 16:50):

Nicholas Carlini said:

Ah that's great. The way to do this would be to submit the source to Generated/ like the other code, and then Conjectures to theorems/?

I thought you wanted to avoid the lean part :-). Just put a .txt file in a similar format somewhere, and I’ll pick it up from there

Joachim Breitner (Sep 29 2024 at 16:51):

If the file is still too large, then maybe not yet. Can you use the list of known implications to reduce it to list only interesting facts (in this case you have to list both equations satisfied and non-satisfied in your format)

Shreyas Srinivas (Sep 29 2024 at 16:55):

We could try integration with kaggle or zenodo

Terence Tao (Sep 29 2024 at 17:13):

Yes, the conjecture keyword is suitable for auto-generated claims whose statement is formalized in Lean, but the proof is not yet available (and some subsequent task would be to convert them to a theorem). It is functionally similar to proof_wanted on the front-end, but at the back-end we would like you to use conjecture instead of proof_wanted because the former contains some metadata that will allow us to easily scoop up all the conjectures present in a given dependency tree of Lean files. (We are also currently talking about similarly converting theorem to another functionally similar keyword that would carry similar metadata, see #Equational > Lean script for extracting current set of implications )

David Renshaw (Sep 29 2024 at 17:22):

We are also currently talking about similarly converting theorem to another functionally similar keyword that would carry similar metadata

Current plan is to keep these using the theorem keyword and to add an attribute @[equational_result]. (Discussed in the linked thread.)

Nicholas Carlini (Sep 29 2024 at 17:30):

Joachim Breitner said:

I thought you wanted to avoid the lean part :-). Just put a .txt file in a similar format somewhere, and I’ll pick it up from there

I've reduced the set of 500k tables down to just 500 "covering tables". This loses some interestingness data (about implications A=/>B that we can only refute due to a very small number of tables) but we can recover that later. I'll submit a PR with this partial data shortly.

Joachim Breitner (Sep 29 2024 at 17:37):

That's sensible, thanks.

Joachim Breitner (Sep 29 2024 at 17:43):

Is the data set with the polynomial based ones also reduced in the same way?

And do any of the two subsume the other?

Nicholas Carlini (Sep 29 2024 at 17:44):

It is "mostly" reduced. I did a bit more fancy reducing on this one

Nicholas Carlini (Sep 29 2024 at 17:44):

For the polynomials, I process them in order, and only add a new polynomial if it covers something not seen previously.

Joachim Breitner (Sep 29 2024 at 17:45):

But this does not yet prune based on known implications, right?

Nicholas Carlini (Sep 29 2024 at 17:45):

But it is possible that adding one polynomial then makes a previous one redundant. I didn't handle that case for these. I added that logic for this new set because it brings me from 500k -> 1.5k for the initial prune, then down to just 500 after several rounds of pruning.

Nicholas Carlini (Sep 29 2024 at 17:45):

Sorry what do you mean?

Joachim Breitner (Sep 29 2024 at 17:47):

We know Equation92 implies Equation91.
So if a magma satisifies 92, we don't have to also include 91 in the list of satisfies equations.
Likewise, if another magma does not satisfies 91, then we don’t have to include 92 in the list of refuted equations.
I expect pruning the list of equations to prove for a given example magma this way will bring file size and processing time down significantly.

Joachim Breitner (Sep 29 2024 at 17:48):

Although we can do that pruning when converting from the data files to the lean files, if you’d rather keep the data files complete.

Nicholas Carlini (Sep 29 2024 at 17:50):

I don't think trimming those out works . Suppose I have a magma with equations A, B. Suppose also A=>B. I need to keep them both in, because this set tells me lots of A =/> X and B =/>X for many X. If I removed B, you lose all of these. Right?

Shreyas Srinivas (Sep 29 2024 at 17:50):

There are already ruby scripts in the scripts folder for doing this

Shreyas Srinivas (Sep 29 2024 at 17:53):

https://github.com/teorth/equational_theories/blob/main/scripts/transitive_reduction.rb

Joachim Breitner (Sep 29 2024 at 17:55):

Nicholas Carlini said:

If I removed B, you lose all of these. Right?

No, why? Your data set says “M satisfies A and does not satsify X”. That suffices, doesn’t it? Downstream tools (those drawaing the graphs) will know that A → B, and thus also B /→ X – no need to waste CPU time on proving that M satisfies B.

Joachim Breitner (Sep 29 2024 at 17:56):

Shreyas Srinivas said:

https://github.com/teorth/equational_theories/blob/main/scripts/transitive_reduction.rb

That’s for implications, isn’t it? Not quite what we are looking for here – we want to reduce antiimplications given a set of known implications.

Daniel Weber (Sep 29 2024 at 17:58):

I can write something in Lean once equational#93 is merged, what API should it have?

Nicholas Carlini (Sep 29 2024 at 17:58):

Okay you're probably right. I don't do this now yet. If you want to reduce it further and can derive the same results that seems worth doing

Joachim Breitner (Sep 29 2024 at 17:59):

Right. But if your data files (reduced by you) are of reasonable size, then it makes sense to make that reduction when converting to lean.

Nicholas Carlini (Sep 29 2024 at 17:59):

Yeah they are. Will have the PR up shortly

Joachim Breitner (Sep 29 2024 at 18:00):

I’m running out of time for today.

Once https://github.com/teorth/equational_theories/pull/110 is merged then https://github.com/teorth/equational_theories/pull/19 (which converts the polynials) is ready, I’d say. It doesn’t do this reduction yet, but maybe we can merge that already and iterate on main.

Nicholas Carlini (Sep 29 2024 at 18:03):

Code is up at https://github.com/teorth/equational_theories/pull/111

Joachim Breitner (Sep 29 2024 at 18:32):

Here is something that you maybe can do already, similar to your covering reduction.
Say you have one M1 satisfying [1,2] and refuting [3,4].
And M2 satisfies [1] and refutes [2,3,4]. Then testing 3 and 4 doesn't really tell us anything new, And it would suffice to look at equation 1 and 2 here.

I suspect that this will reduce the data set quite some more.

Even without this it would be good to include in the data set both lists, satisfied and refutes, so that it's clear what it's saying even when the list of equations is extended (which I believe there are plans for). Or does that make them too long (even with the a above reduction)?

Nicholas Carlini (Sep 29 2024 at 18:34):

Yeah that's a good optimization to apply. I don't think it would make them too long, I just didn't for now because right now the sets are guaranteed to be complement of each other.

Last updated: May 02 2025 at 03:31 UTC