Zulip Chat Archive

Stream: Equational

Topic: Thoughts and impressions thread


Terence Tao (Sep 26 2024 at 21:49):

I am thinking that for posterity (in particular, in a future paper reporting on the outcomes of this project) it would be good to have a thread in which we record, in approximate real time, our impressions of the project as it evolves. For PFR we had a retrospective thread after completing the primary goals, but I would like to get ahead of the game and try to record thoughts and impressions while they are still fresh. I will make some observations below, and plan to return to this thread regularly to add more thoughts as they arise; meanwhile I encourage all other participants to also add thoughts whenever they come to mind.

  • The initial response has been quite enthusiastic, and the pace is significantly faster than I envisaged. The project is receiving a lot of technical support at the back end, and formalization and mathematical tasks are being completed extremely rapidly. Here I think this project has an advantage over other Lean projects in that the atomic tasks to complete are very small and most of them do not require extensive expertise in either Lean or math.
  • It was important early on to have an agreed-upon numbering system for the equations under study, which will hopefully stay stable for the duration of the primary goal of the project. Even refactoring eleven equations led to a lot of typos; I am glad I did not postpone the standardization of the numbering system until the project became far larger!
  • While automated tools, e.g., for autocompleting graph implications, will be absolutely necessary as the project scales up, in the immediate term it seems that manual management of the project (e.g., through the now-standard "outstanding task" thread, and also through manual updating of the latest version of the Hasse diagram) is working. In the very short term, it's easier to deploy humans than to deploy automated tools :big_smile:
  • Having some API for Magma relations will eventually be helpful. We are already beginning to see some general statements about equational laws (e.g., the observation that any equation of the form x = f(y,z,w,u) is equivalent to the equation x = y) that might be worth formalizing (see the sketch after this list), but with our current setup we can only handle a few equational laws at a time.
  • The existing infrastructure of github, zulip, blueprint, and also my blog, is holding up well so far. I anticipate some scaling issues as the project gets bigger, though.
  • This is perhaps invisible to other users, but I am finding Github Copilot very useful for the mundane task of entering new Lean theorems to prove, or updating the blueprint to incorporate the latest PR'ed results. Basically it provides an easy way to translate between Lean and LaTeX: for instance, to convert Lean to LaTeX I paste in the Lean code as a comment, start typing the LaTeX, and it usually autocompletes to what I want, particularly if several examples of similar text are already in the file. The text in both Lean and LaTeX is very standardized (to assist in machine readability), and this seems to increase the effectiveness of Github Copilot.
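
A minimal self-contained Lean sketch of the observation about laws of the form x = f(y,z,w,u) mentioned in the list above (the Magma class and the law names here are illustrative, not the project's actual definitions): any law whose left-hand side is a lone variable that does not appear on the right forces the trivial law x = y, shown for the simplest case x = y ◇ z.

universe u

class Magma (G : Type u) where
  op : G → G → G

infixl:65 " ◇ " => Magma.op

-- The law "x = y ◇ z", quantified over all variables.
def LawXeqYZ (G : Type u) [Magma G] : Prop := ∀ x y z : G, x = y ◇ z

-- It implies the trivial law x = y: both sides are equal to y ◇ y.
theorem LawXeqYZ_implies_trivial (G : Type u) [Magma G] (h : LawXeqYZ G) :
    ∀ x y : G, x = y :=
  fun x y => (h x y y).trans (h y y y).symm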

That's all my thoughts for now. More later, but in the meanwhile please feel free to add your own comments in this thread.

Shreyas Srinivas (Sep 26 2024 at 21:52):

  1. We can try setting up a Trello board or some equivalent for managing the outstanding tasks queue. I'll look into good options for this.
  2. I find that this format of collaboration is great for those who think quickly on their feet, but for those who work at a slower pace, most tasks get taken and finished before they can get to them. So it favours a certain competitive mindset geared for speed.

Shreyas Srinivas (Sep 26 2024 at 21:56):

A little bit of digging later, I think we can create a Github Project to include the repo in. @Pietro Monticone I want to make sure this does not affect the blueprint

Shreyas Srinivas (Sep 26 2024 at 21:57):

To see what I am aiming for, check this video from this time point: https://youtu.be/oPQgFxHcjAw?t=231

Terence Tao (Sep 26 2024 at 22:00):

Re the speed issue: I'm hoping that by adding more equations into the mix at a suitable rate, there will be enough tasks for people with different paces to participate in. In particular, I anticipate that while the "A" and "B" tasks may be quite fast (and thus snapped up by people with a more competitive streak), the "C" tasks are more complex and require a more thoughtful approach, and are perhaps suited to people with a different mindset. Hopefully we can also add some other types of tasks to the list beyond what we have currently.

Shreyas Srinivas (Sep 26 2024 at 22:03):

@Terence Tao : Do you want to give the task interface in that video a try?

Shreyas Srinivas (Sep 26 2024 at 22:03):

It seems early enough in the project to experiment a bit

Shreyas Srinivas (Sep 26 2024 at 22:03):

The idea is to set up a Kanban board

Shreyas Srinivas (Sep 26 2024 at 22:03):

connected to issues and PRs.

Bhavik Mehta (Sep 26 2024 at 22:10):

A concern I have is that it's not currently clear (from eg the Hasse diagram) which non-implications have been proven. For instance, there's an absence of an edge between equations 46 and 4, and this could mean any one of "both implications are unknown", "one implication is false and the other is unknown", "both implications are false".

Bolton Bailey (Sep 26 2024 at 23:30):

Just a thought: It might be nice if (in lieu of, or in addition to, the usual dependency graph) there were also a page for the Hasse diagram as it evolves. @Bhavik Mehta raises a good point that there is a wrinkle in that some relations will not be known. Perhaps we can develop some visual language for the 3 * 3 = 9 possible states of knowledge about any pair of equations, just as we have for usual theorems in the dependency graph.
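
One small way to make such a visual language concrete (a sketch with illustrative names, not a proposal for the project's actual data model): the joint status of a pair of equations is just a pair of three-valued statuses, one for each direction, which is where the 3 * 3 = 9 states come from.

-- One direction of a candidate implication is either proven, refuted, or still open.
inductive ImplicationStatus
  | proven
  | refuted
  | unknown
  deriving DecidableEq, Repr

-- The joint state of a pair (A, B): the status of A ⇒ B together with that of B ⇒ A.
structure PairStatus where
  forward  : ImplicationStatus
  backward : ImplicationStatus
  deriving DecidableEq, Repr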

Edward van de Meent (Sep 26 2024 at 23:38):

It might also be good to have (perhaps auto-generated) proof_wanted statements for a minimal set of statements (saying that certain implications don't exist) which are sufficient to prove "every possible implication is in the (reflexive) transitive closure of the proven implications"
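
A sketch of the kind of auto-generated statement meant here, using the proof_wanted command; the import, the Magma class, the two laws, and the exact shape of the non-implication statement are illustrative assumptions rather than the project's actual code.

import Batteries

universe u

class Magma (G : Type u) where
  op : G → G → G

infixl:65 " ◇ " => Magma.op

-- Two illustrative laws (not equations from the project's numbering).
def EquationA (G : Type u) [Magma G] : Prop := ∀ x y : G, x = x ◇ y
def EquationB (G : Type u) [Magma G] : Prop := ∀ x y z w : G, x ◇ y = z ◇ w

-- An auto-generated non-implication goal: a magma satisfying EquationA but not
-- EquationB is wanted, but no proof is recorded yet.
proof_wanted EquationA_not_implies_EquationB :
    ∃ (G : Type) (_ : Magma G), EquationA G ∧ ¬ EquationB G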

Vlad Tsyrklevich (Sep 27 2024 at 15:18):

I am really held up by how slow Lean is loading a file with just the 4.7k definitions, with mathlib factored out--it takes me a minute locally. Unless I am missing some obvious optimization, this really slows down my brute-force efforts. I could batch things more, but that has its own problems.

Yakov Pechersky (Sep 27 2024 at 15:23):

Undoubtedly this effort could be solved by a brute force approach. But does that generate mathematical insight, or insight on how to collaborate on a mathematical project using Lean and human and computer contribution? I might be missing the point, but I understand the effort to be more about trying to explore _how_ people contribute and one sets up interfaces for contribution, interpretation of proven or unproven goals, high level autogenerated overviews. And not so much about just populating the Hasse graph.

Yakov Pechersky (Sep 27 2024 at 15:25):

Taking my thought to the extreme, if at the end of the day, one could have the full Hasse diagram and a certificate that it was complete, but a total black box on how it was generated; or no diagram but a high level organizational framework of how to contribute, what would one choose?

Joachim Breitner (Sep 27 2024 at 15:25):

I am really held up by how slow Lean is loading a file with just the 4.7k definitions, with mathlib factored out--it takes me a minute locally. Unless I am missing some obvious optimization, this really slows down my brute-force efforts. I could batch things more, but that has its own problems.

Do you need to load the file? The equations can be put in a separate file, and then you can import them. I’m doing that on my branch, and it is fast enough.
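
A minimal sketch of the split Joachim describes, with illustrative module and law names rather than the repository's actual ones; the point is that the definitions are elaborated once when their own module is built, and each brute-force driver then only pays the cost of importing the compiled module rather than re-elaborating thousands of definitions.

-- AllEquationsDemo.lean : built once, holds the law definitions.
universe u

class Magma (G : Type u) where
  op : G → G → G

infixl:65 " ◇ " => Magma.op

def Equation1 (G : Type u) [Magma G] : Prop := ∀ x : G, x = x
def Equation2 (G : Type u) [Magma G] : Prop := ∀ x y : G, x = y

-- BruteForceDemo.lean : a driver file would start with
--   import AllEquationsDemo
-- and then only contain the generated implication proofs, e.g.
theorem Equation2_implies_Equation1 (G : Type u) [Magma G] (_h : Equation2 G) :
    Equation1 G := fun _x => rfl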

Terence Tao (Sep 27 2024 at 15:53):

Yakov Pechersky said:

Taking my thought to the extreme, if at the end of the day, one could have the full Hasse diagram and a certificate that it was complete, but a total black box on how it was generated; or no diagram but a high level organizational framework of how to contribute, what would one choose?

Well, if these were the only two choices, I would select the latter; but I think this is a false dichotomy. Every automated effort to generate implications can be documented - for instance, as a separate chapter in the blueprint - and this can be a useful resource for any future efforts of this type.

And even though right now all the implications we are filling in are "easy", I do expect that we will start grappling with increasingly nontrivial mathematical tasks once all the low hanging fruit is cleaned out. I am pretty sure that these implications actually become mathematically undecidable past a certain level of complexity, and if the project ends up generating a hard core of say a thousand very nontrivial implications, this could be a useful benchmark to have for other applications, e.g., testing out modern AI reasoning capabilities.

Terence Tao (Sep 27 2024 at 15:58):

Recording some very quick thoughts:

  • This project has moved far, far quicker, and scaled up much, much faster, than I had expected - only 48 hours in, and already a large fraction of the implications are likely to be resolved soon! I thought the 3-week PFR project was fast, but this is an insane additional level of speed. It seems that projects that consist of extremely large numbers of independent pieces, each of which is easy to understand and attackable by a variety of methods, are a very good use case for these crowdsourced projects, and can really leverage the "wisdom of crowds" in ways that a highly centralized project cannot.
  • In particular, this is soon going to be a project that cannot be directed by just a small number of people; I think we will have to decentralize it into a loosely coordinated set of smaller groups working on different aspects of the problem. I am not sure exactly how we can facilitate that transition, but am looking forward to how the dynamics of the project evolve.
  • Automated tools to visualize the status of the project are going to be needed quite soon. Certainly the hand-drawn Hasse diagram that is supposed to guide the "outstanding tasks" is rapidly reaching the limits of usability.

Vlad Tsyrklevich (Sep 27 2024 at 16:14):

Joachim Breitner said:

I am really held up by how slow Lean is loading a file with just the 4.7k definitions, with mathlib factored out--it takes me a minute locally. Unless I am missing some obvious optimization, this really slows down my brute-force efforts. I could batch things more, but that has its own problems.

Do you need to load the file? The equations can be put in a separate file, and then you can import them. I’m doing that on my branch, and it is fast enough.

Ah, I think I need to learn more about the build process. I'm new to lean and I suspect I may just be ignorant of the right way to do this. I'll investigate now

Vlad Tsyrklevich (Sep 27 2024 at 16:47):

Vlad Tsyrklevich said:

Joachim Breitner said:

I am really held up by how slow Lean is loading a file with just the 4.7k definitions, with mathlib factored out--it takes me a minute locally. Unless I am missing some obvious optimization, this really slows down my brute-force efforts. I could batch things more, but that has its own problems.

Do you need to load the file? The equations can be put in a separate file, and then you can import them. I’m doing that on my branch, and it is fast enough.

Ah, I think I need to learn more about the build process. I'm new to lean and I suspect I may just be ignorant of the right way to do this. I'll investigate now

Well, I just learned about lean -c and that gave a 10x speed-up for brute force.

Nicholas Carlini (Sep 27 2024 at 16:55):

I didn't see this thread until now. I've generated 19,000 proofs of the form

theorem Equation4652_implies_Equation4640 (G : Type*) [Magma G] (h : Equation4652 G) : Equation4640 G := λ x y z => h x y y z
theorem Equation4652_implies_Equation4642 (G : Type*) [Magma G] (h : Equation4652 G) : Equation4642 G := λ x y z => h x y z x
theorem Equation4652_implies_Equation4646 (G : Type*) [Magma G] (h : Equation4652 G) : Equation4646 G := λ x y z => h x y z y
theorem Equation4653_implies_Equation4596 (G : Type*) [Magma G] (h : Equation4653 G) : Equation4596 G := λ x y z w => h x x y z w

Hopefully this doesn't duplicate what you're doing?

Nicholas Carlini (Sep 27 2024 at 18:14):

@Vlad Tsyrklevich I've increased the number of automatic solutions to 54k and put them here https://github.com/teorth/equational_theories/pull/23

Terence Tao (Sep 28 2024 at 13:28):

Thoughts on day 3 of the project. (Can't believe that it has only been three days.)

  • After an extremely hectic two days, things are beginning to stabilize. The new Github project/issues workflow to manage tasks like completing the subgraph of implications is significantly more convenient on my end than the older approach of manually updating an "Outstanding tasks" thread, which was becoming extraordinarily time-consuming. It is perhaps too early to say for certain, but at this point I would recommend other collaborative Lean projects transition over to a Github project for task management.
  • I wanted to highlight a very insightful comment on my blog from Tristan Stérin (founder of the BusyBeaver challenge) on how to manage these sorts of "high-entropy" crowdsourced formal projects.
  • Once we have automated tools to compile all the known implications and anti-implications from all submitted Lean files, and some way to display this data in various formats on the main web pages of the project, it seems that we can decentralize the operation of the project quite significantly; while some subprojects such as the Subgraph project would still require some central direction, I think we could be ready to open the door to arbitrary groups using arbitrary approaches to generate new portions of the implication graph, without the need for much coordination (other than what is listed in the new CONTRIBUTING.md file).
  • I am looking forward to seeing how the proof_wanted approach to capturing all non-Lean contributions to the implication graph (both human-generated and computer-generated) integrates with the Lean contributions (formalized via theorem rather than proof_wanted). Figuring out a viable general workflow to combine formal contributions, human contributions, and other computer-generated contributions (such as AI-generated contributions) has been one of my main motivations for this project.

Shreyas Srinivas (Sep 28 2024 at 14:20):

For manual management of the task table, I find the github mobile app very convenient. Tapping on the status lets me assign issues to users, change the status of the task, and link PRs very easily.

Shreyas Srinivas (Sep 28 2024 at 18:58):

One key difference between this project and PFR appears to be that we have converged on a small number of tasks which are individually large. In contrast, in PFR, each theorem could be a task.

Terence Tao (Sep 28 2024 at 18:59):

I'm hoping to have a mix of tasks of different sizes. For instance in the Subgraph project there will be several small tasks, such as filling in a few easy proof_wanted claims, which would be comparable to PFR tasks I think.

Terence Tao (Sep 29 2024 at 00:17):

One passing thought: there have been surprisingly few bottlenecks so far - even though there are still some important pieces of infrastructure to implement, progress is still speeding along in other areas without such infrastructure. In contrast, for instance, in PFR we were somewhat bottlenecked at the beginning by the lack of an API for Shannon entropy, which was quite essential for most of the rest of the project.

One reason for this, I think, is that this project naturally combines small agile contributions with large slow contributions. On the first day or two most of the progress came from human formalizers, at a scale that could be managed by human organizers. Now, on day 3, the relatively simple coded algorithms are generating data, roughly in time for the automated organizational tools to come online. The automated provers are just beginning to enter the arena, and by the time we have the full infrastructure to accept, visualize, and analyze very large contributions, I expect the AI tools will also be ready. If we had to rely on only one or two of these types of contributions there would likely be quite noticeable bottlenecks.

Terence Tao (Sep 30 2024 at 00:20):

Day 4 thoughts, impressions, and reports:

  • The project seems to be successfully decentralizing; in particular, there is now a lot of activity going on that I am not fully aware of. This thread would be a good place for other participants to report on any progress not otherwise mentioned here, especially progress that would be relevant to various other components of the project. (I guess this would be called a "stand-up meeting" in software engineering?)
  • For instance, I was delighted to receive equational#116 from @Franklin Pezzuti Dyer in which two old Putnam problems, which essentially asked to establish three implications between laws already in the database, were added to Subgraph.lean. I have since added them to the blueprint as well, and will also update the Hasse diagram accordingly.
  • As another spinoff activity, in equational#80 we managed to find a formula for the number of equational laws with n operations (up to relabeling, symmetry, and triviality), and will submit the resulting sequences to the OEIS. Technically this will be the first external output of this project! Also, since the proofs of these formulae are already in the blueprint, I am planning to open some tasks to formalize them in Lean, since the infrastructure to do so (as well as a suitable pool of contributors) is already in place...
  • Speaking of which, I certainly plan to write a paper on this project, probably submitted to a journal on formalized mathematics, which will discuss both the mathematical results of the project but also the workflow and experiences. This is still well into the future, after the primary goal of our project is completed, but I again encourage participants (or just casual observers) to share their impressions on this thread, as this may end up being part of the final paper. (See this similar retrospective paper on the Polymath8 project for a possible model.)
  • The task management workflow is now quite a painless experience for me, freeing up my time to deal with other aspects of the project, such as discussing how to transition smoothly from the current "semantic" Equation-based foundation of the project to a "syntactic" Law-based foundation. I am very grateful to the other moderators (@Pietro Monticone , @Shreyas Srinivas , and @Joachim Breitner ) for taking the lead on running so much of the project!

Amir Livne Bar-on (Sep 30 2024 at 06:35):

A bit about my experience in this project, and some thoughts:

  • I saw Terence's blog post a few hours after it was published. Writing down the list was a tricky programming question that felt like it had a simple solution, and these are always fun to find. Since surprisingly nobody else had listed the equations by then, I worked on it. After doing that I saw that this project had other small and interesting puzzles: proving implications manually, figuring out which implications to prove.
  • Some of my work has been proof-of-concept / barely working scripts: implication graph analysis, and a small file of auto-generated proofs. This is something I learned working on software - if you put out such a proof of concept, it encourages other, better qualified people to work on more complete and correct implementations, and on other ideas in the area. The results I got here are mixed. I feel like it caused graph analysis in Lean to happen a bit earlier, but it definitely had no impact on the other automation efforts.
  • The project definitely blew up very quickly in each of the days since. It would be interesting to see some metrics about this - e.g. plotting messages or active participants in Zulip, or PRs/issues/comments in Github.
  • There is amazing potential for beautiful visualizations of the implication graph, and maybe for other areas in the project. Once someone comes up with the right idea these will probably be implemented very quickly, and would bring the project much wider attention.

Terence Tao (Oct 01 2024 at 05:14):

Not much of an update for Day 5. While there is plenty of activity on many fronts, it is beginning to decentralize quite smoothly, and I no longer feel like I have to be involved in most of the different directions here. I mostly have been focusing on some more theoretical aspects to the project, such as slowly setting out tasks for the metatheory of equations and for formalizing the counts for the number of equations.

We have already established enough easy implications from the original set of ~22 million that there are only ~3 million unsolved implications remaining (up to equivalence). I am expecting some contributions from AI-assisted efforts to start coming in soon, as well as large-scale automated proving runs. We have made a lot of progress in assembling a stable infrastructure to receive these sorts of new inputs, and I am looking forward to seeing how these (as well as other more elementary implication chasing methods) reduce the count further. Also, we now have some data sets online with which others can start performing some data visualization and/or data analysis; hopefully there will be some interesting features of the implication graph that we will start to see.

Terence Tao (Oct 02 2024 at 05:24):

Day 6. Looks like we are getting most of the way to having automated generation of implications, statistics, and visualizations set up, which will be useful both for contributors who want to conduct runs on the remaining outstanding implications, and also for more casual observers who could use some "eye candy" and quick summaries of progress. Also, I hope that, now that we have a few partial data sets, some interesting data visualization and data analysis from new participants will emerge.

Much of the time I devoted to the project today went to "big-endian/little-endian" type issues, such as which orientation of ordering on laws (or Hasse diagrams) to use, or which symbol to use for the Magma operation. In informal mathematics these are utterly trivial problems, but for a formal project it is important to settle on a standard, and it is much easier to modify that standard early in the project rather than later. So this is the sort of task that has to happen now. I am finding the polling feature of Zulip to be useful for this, and hopefully this type of participation will help decentralize the project and help the contributors feel like they are active stakeholders.

I did have some speculation over at #Equational > Ideas for unknown implications regarding a possible future "assembly line" type workflow in which different teams work independently on sequentially connected components of the project, but perhaps we have not yet scaled up our infrastructure and participation to reach that point.

Terence Tao (Oct 03 2024 at 02:42):

Day 7. Based on the polls, I requested one minor refactor of the codebase (reversing the order on laws), though this is relatively low priority and has not yet been implemented. The most striking progress today has been on the front end, where we now have a nice dashboard for our results, and in whittling down the number of implications; at one point we thought we had achieved a huge reduction, proving or conjecturing 99.958% of all implications; while this was a reporting error, we will still have a respectable 95% or so of implications resolved once the most recent PRs land.

Another cute result in the literature is now formalized within the project: it turns out that Equation1571 is equivalent to being an abelian group of exponent 2 (which then obeys a whole slew of additional equations). This is an old result of Mendelsohn and Padmanabhan from 1975.

Terence Tao (Oct 04 2024 at 01:35):

Day 8. Several contributions made the verified component of implications inch up to 87% or so, with a large PR of conjectural implications still in review. We now have a very nice experimental tool to explore the implication graph, which should help locate interesting areas of the graph to focus on to make further progress.

I've set up branch protection, and have had to learn the discipline of submitting PRs and waiting for CI like everyone else. It took some getting used to, but I think I have gotten the hang of it now, and presumably will stop regularly breaking the build going forward :grinning_face_with_smiling_eyes: .

We rolled out the first major refactor today - a change in the magma operation symbol to make the Lean code compile significantly faster. Hopefully there will not be too many secondary impacts of this simple, but rather broad, change.

A future research question was resolved by @Daniel Weber : it is now established (and proven in Lean in equational#244) that the implication Equation3994 -> Equation3588 is false, but that all counterexamples must be infinite. This is an early example of the type of result that is enabled by this sort of systematic search, as this implication was flagged as a possible candidate via the Vampire run.

Terence Tao (Oct 05 2024 at 03:16):

Day 9. Progress continues on several fronts. Most visibly, the completion rate of the implication graph is now an impressive 99.866%! (We will soon need more digits of precision on our percentage displays). This is largely thanks to the Vampire run, the positive implications of which have now been formalized in Lean. I'm very grateful to all the contributors who worked behind the scenes to build in mere days the infrastructure to efficiently accept large automated Lean contributions, reduce them to something that can be compiled by Lean in reasonable time, and interpret the output.

In fact I found myself with some surplus time for the first time in a week, as most components of the project are now proceeding more or less autonomously, and I could now just participate in the aspects where I felt I could make a contribution, rather than having to "put out fires".

Soon we will have exhausted most of the low hanging fruit, and it seems we will have about ten thousand challenging implications that are beyond the reach of automated provers or small finite magmas. This is where I expect the project to get more interesting, with human analysis of key examples becoming an important component. This is also a potential period in which AI assistance may become useful.

There are some interesting discussions over at #Equational > Future directions on how one can use the increasingly complete data we have to explore other questions in universal algebra, and also some nascent attempts to experimentally probe the graph for insights. I am hoping we identify some anomalous structures centered around some equational laws that are mathematically interesting, but have not received much attention in previous literature.

Terence Tao (Oct 05 2024 at 03:46):

p.s. I briefly mentioned this project in a recent interview (partially paywalled) at https://www.theatlantic.com/technology/archive/2024/10/terence-tao-ai-interview/680153/

Notification Bot (Oct 05 2024 at 23:10):

38 messages were moved from this topic to #Equational > Authorship, Securrity, and Red Teaming Project vulnerabil... by Shreyas Srinivas.

Shreyas Srinivas (Oct 05 2024 at 23:10):

Shreyas Srinivas said:

Terence Tao said:

As long as the CI time is still on the order of minutes instead of hours, I'd say it's worth it to have the extra guarantee. (Note to future self reading this thread: when writing the final paper, make a note of the extra layers of checking we have beyond the core Lean compiler, for instance we also have a separate check that no extra axioms are being inserted (equational#242).)

It might be good to "red team" and brainstorm for any other potential ways in which this project could be compromised, either by a glitchy AI or by some mischievous human.

I think this message still belongs in this thread.

Discussion thread for the above

Shreyas Srinivas (Oct 06 2024 at 00:52):

@Terence Tao : If these day-to-day recollections are meant to be public, you could cross-post them as wiki pages on the repository

Shreyas Srinivas (Oct 06 2024 at 00:53):

On the repository home page, the wiki tab is right next to the Projects tab

Terence Tao (Oct 06 2024 at 00:56):

Ah I had never taken a look at that feature. That may indeed make more sense than using the Zulip, since they aren't really designed for two-way engagement. I'll play around with that some.

I had also been thinking of making a page discussing some equations of interest (sort of like a tourist guide to the equation landscape), and the wiki would be a good place for that too.

Shreyas Srinivas (Oct 06 2024 at 00:57):

github lets you use mermaid, which is a rudimentary diagramming tool: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-diagrams

Shreyas Srinivas (Oct 06 2024 at 00:59):

This could be useful in illustrating how the project is going for example

Shreyas Srinivas (Oct 06 2024 at 00:59):

It isn't too powerful but good enough for basic diagrams

Terence Tao (Oct 06 2024 at 01:31):

I am moving my daily log over to the wiki and will cease posting it here. I would instead like to encourage others to leave any thoughts here, especially those which might be useful for posterity (and when the time comes to write up a report on this project).

Shreyas Srinivas (Oct 06 2024 at 01:33):

I think if you link it here, it would stimulate the discussion on each day's log

Terence Tao (Oct 06 2024 at 01:34):

OK. I can post a daily link then. (I was thinking of trying to reduce spam, but at the current rate of discussion, one comment/day is a negligible portion of the daily volume.)

Shreyas Srinivas (Oct 06 2024 at 01:35):

While the wiki is good at accumulating and organizing information, zulip is still better at encouraging discussion

Terence Tao (Oct 06 2024 at 02:02):

Day 10 log

Shreyas Srinivas (Oct 06 2024 at 10:37):

The dashboard graphic is hard to interpret at the moment. Maybe a key is needed to explain the axes, the colours, etc., and what the lines and density of colours mean.

Shreyas Srinivas (Oct 06 2024 at 10:38):

CC : @David Renshaw

Shreyas Srinivas (Oct 06 2024 at 22:24):

(But if we were to do it all over again, we should probably have planned out the Github organizational structure and roles more thoughtfully.)

Found this in the day 11 log. Based on all that I have seen of organically grown open source projects, no amount of pre-planned role assignment will completely eliminate these decision-making processes on the maintainers' end. Roles evolve as the project evolves. That being said, experience in one project often translates into a set of broad guidelines for future projects.

Terence Tao (Oct 07 2024 at 19:45):

Day 12

Terence Tao (Oct 08 2024 at 20:36):

Day 13

Shreyas Srinivas (Oct 08 2024 at 20:53):

One small point: I do think some improvements to the process are possible. It's just that it won't be wise to build the tool from scratch. For example, if someone can suggest a way to make claims and PR proposals work through a graphical user interface, eliminating the syntax mistakes people make in the issue comments from time to time, that would be a significant improvement.

Shreyas Srinivas (Oct 08 2024 at 20:55):

The best tooling improvements would build upon our current solution (or equivalent CI based solutions on Trello or other similar platforms)

Shreyas Srinivas (Oct 08 2024 at 22:46):

Given how many separate CI actions we have to set up, one potential project is to build a domain-specific language that helps us express the CI processes at a fairly high level and compiles them down to github actions. A Lean library to interface with github's API would also be a nice contribution

Shreyas Srinivas (Oct 08 2024 at 23:40):

Github projects have default workflows that are fairly limited but have nice GUIs to set them up. Anything custom currently requires a CI .yml file.
An extended version of the graphical tools that allowed custom actions to be created via similar flowcharts, and allowed code snippets in the flowchart nodes, would be really cool (cc: @Iva Babukova). The flowchart could then compile down to GitHub Actions scripts

Terence Tao (Oct 09 2024 at 19:32):

Day 14

Shreyas Srinivas (Oct 09 2024 at 22:42):

Terence Tao said:

Day 14

I think at least a handful of us are CS researchers. I am doing my PhD in computer science for instance

Terence Tao (Oct 09 2024 at 22:44):

Thanks, I have reworded the entry slightly to reflect this.

Terence Tao (Oct 10 2024 at 22:33):

Day 15

Shreyas Srinivas (Oct 11 2024 at 09:52):

Are we sure we don't want to put the data in zenodo or equivalent services for math?

Shreyas Srinivas (Oct 11 2024 at 09:53):

Pros:

  1. They maintain academic artefacts well.

Cons:

  1. It is not advisable to upload something frequently to them

Shreyas Srinivas (Oct 11 2024 at 09:54):

We should only do this at the end of the project, once we are sure all the data files are correctly generated.

Will Sawin (Oct 11 2024 at 12:24):

Terence Tao said:

Day 15

With regard to the age of automated theorem provers, it does seem that there is more that they could do that would be helpful. For each ansatz humans come up with to solve some of the problems (e.g. translation-invariance), it should be possible to run the automated theorem prover on each currently unknown implication and see if it can be proven for magmas satisfying the ansatz. Any negative result one could then try to formalize, and every positive result would guide humans as to where to apply that ansatz.

Kevin Buzzard (Oct 11 2024 at 12:49):

Didn't automated theorem provers already do over 99% of the work? And you want them to do more? :-)

Terence Tao (Oct 11 2024 at 14:38):

I think uploading our final database (if not the entire github repo) to a standard repository such as Zenodo is a good idea. I was also thinking of submitting the database to https://mathbases.org/ (the site is not hugely active, but perhaps it just needs a bit more support and attention). Other thoughts would be appreciated. I've added these suggestions at least to the (currently very stubby) "Data management" section of https://github.com/teorth/equational_theories/wiki/Plan-of-paper .

Shreyas Srinivas (Oct 11 2024 at 14:41):

To be honest, I can see a valuable user experience report which could be prepared for the upcoming CAV deadline in January. The report would focus on the ATP tools (and thus include the ATP authors). The only catch is that these tools have been around for a while now. Regular CAV submitters can confirm

Terence Tao (Oct 11 2024 at 19:07):

Day 16

Kevin Buzzard (Oct 12 2024 at 10:52):

I'm only an observer of this project rather than a contributor but I was wondering if I'd understood the endgame correctly. Is the status that there are now only a few hundred implications whose status is unknown, all are expected to be false, and it could well be the case that this (novel?) technique of explicitly building a magma on Nat bit by bit will create a counterexample to each of them? Or could there be more twists in the story?

Daniel Weber (Oct 12 2024 at 11:03):

I'm not sure how well we could expect this method to work for all implications — I couldn't get it to work on 1659, for instance (although I also can't conclusively say it doesn't work there). Other than that it sounds true, although there still could be some implication which turns out to be true with a really complex proof

Zoltan A. Kocsis (Z.A.K.) (Oct 12 2024 at 11:13):

@Kevin Buzzard On Oct 7 Weber conjectured that all remaining potential implications were false. I pushed back, partly because I had been trying to construct countermodels to one of the 65 -> ... implications for a few days at that point, and had bounced back hard. This drew attention to 65, and since Tao and others developed the technique to build countermodels to 65 -> ... implications, I think most everybody working on unknowns has been working under the assumption that the remaining ones are false. Afaict the currently existing metatheory does not suggest any remaining unknowns to be true, and ATPs fail to prove them.

However, I'm not sure we expect the 65->... countermodel construction specifically to settle all the remaining unknowns. There are several parallel avenues being explored for settling the remaining 761 unknowns (really about half of that, thanks to duality), see e.g. threads about confluence.

Will Sawin (Oct 12 2024 at 15:00):

Kevin Buzzard said:

I'm only an observer of this project rather than a contributor but I was wondering if I'd understood the endgame correctly. Is the status that there are now only a few hundred implications whose status is unknown, all are expected to be false, and it could well be the case that this (novel?) technique of explicitly building a magma on Nat bit by bit will create a counterexample to each of them? Or could there be more twists in the story?

Since different techniques are being tried in parallel, truly understanding the power of any of the techniques requires applying it to implications that have already been disproven by other techniques. This is probably only feasible to do if the techniques are automated in some way (as might be possible to do for some of them).

Terence Tao (Oct 12 2024 at 20:58):

Day 17

Shreyas Srinivas (Oct 12 2024 at 21:38):

Agreed with that last line. The last day or two (in particular, after adding the external checkers) has been the first time since the project began that I have not had to review PRs and put out fires

Shreyas Srinivas (Oct 12 2024 at 21:39):

One particular lesson I learnt was that maintenance is a full-time job

Shreyas Srinivas (Oct 12 2024 at 21:40):

Maintainers don't get to spend much time getting deeply involved in individual tasks.

Terence Tao (Oct 12 2024 at 21:42):

I imagine that any future collaborative project that was any larger than this one would have to be funded and have one or more paid project maintainers. This feels like close to the maximum size of an all-volunteer effort, unless the tooling to create a working collaboration platform somehow becomes much more automated than it is now.

Shreyas Srinivas (Oct 12 2024 at 21:49):

No tooling will ever be able to handle quality issues completely

Shreyas Srinivas (Oct 12 2024 at 21:53):

It is possible that in 10 years' time half as many people will be needed as now, but I'd never count out the need for humans. For one thing, someone who has the capacity to review the decisions of any automation has to be able to take responsibility for any issues that might arise and fix them. Further, the cost of training and running any tools, including AI tools, would mean that they get priced at a point that's a good fraction of the cost of hiring human maintainers

Shreyas Srinivas (Oct 12 2024 at 21:54):

Further, human maintainers of technical content will have to have a good grasp of at least some of the conceptual and technical matter

Shreyas Srinivas (Oct 12 2024 at 21:54):

For mathematics, this means people with sufficient mathematical knowledge of some part of the project's conceptual domain

Shreyas Srinivas (Oct 12 2024 at 21:56):

For example, I have studied model theory in the past, so I found it relatively easy to fix a completeness proof that was not going through at the beginning of the project. I wouldn't have felt confident about doing that without knowing what things like free algebras, completeness, and soundness mean.

Mario Carneiro (Oct 12 2024 at 22:09):

Terence Tao said:

I imagine that any future collaborative project that was any larger than this one would have to be funded and have one or more paid project maintainers. This feels like close to the maximum size of an all-volunteer effort, unless the tooling to create a working collaboration platform somehow becomes much more automated than it is now.

To be fair, mathlib is arguably a larger collaborative project which has worked for quite a while on volunteer effort, although it does get some funding these days, mainly for the CI machines

Shreyas Srinivas (Oct 12 2024 at 22:10):

Even in this project more maintainers might have ensured better PR reviews of the automated contributions. If projects get more parallel, it will be difficult to ensure that there are no large overlaps in what the independent groups of people are doing

Terence Tao (Oct 12 2024 at 22:10):

I wonder what allowed mathlib to scale up to the size it has. Perhaps the rate of increase was slower than for our project? And it decoupled more strongly into non-interacting pieces?

Shreyas Srinivas (Oct 12 2024 at 22:10):

Then we will need some maintainer discipline. For example requiring review approvals from two different maintainers before merging PRs

Mario Carneiro (Oct 12 2024 at 22:11):

I think there isn't really an obvious limit here. You just need to make sure that all roles grow proportionally

Shreyas Srinivas (Oct 12 2024 at 22:11):

Arguably mathlib maintenance resources are already stretched thin given the size of the review queue

Mario Carneiro (Oct 12 2024 at 22:11):

there are a lot of mathlib maintainers, but there are more contributors too from which to "hire" maintainers

Mario Carneiro (Oct 12 2024 at 22:12):

as long as these things stay in balance you can keep scaling

Mario Carneiro (Oct 12 2024 at 22:12):

this is of course not trivial

Shreyas Srinivas (Oct 12 2024 at 22:12):

@Mario Carneiro : one thing to note is mathlib grew over a longer time frame

Terence Tao (Oct 12 2024 at 22:13):

I see. So I withdraw my claim about hitting a hard upper bound. We could in principle grow larger if we had more maintainers and some more structure with the PR process.

Mario Carneiro (Oct 12 2024 at 22:13):

of course, growth doesn't happen overnight (unless you make a big splash in the right place and time)

Shreyas Srinivas (Oct 12 2024 at 22:13):

This project started two weeks ago from scratch with at first 3 and then 4 maintainers

Shreyas Srinivas (Oct 12 2024 at 22:13):

By day 3 I was poring over huge PRs

Mario Carneiro (Oct 12 2024 at 22:14):

@Terence Tao is good at making big splashes :)

Shreyas Srinivas (Oct 12 2024 at 22:14):

And figuring out the best design for automating the CI with Pietro (whose CI scripting has been critical to the project throughout)

Shreyas Srinivas (Oct 12 2024 at 22:15):

The point is mathlib had time to grow its maintainer size and its tools and processes

Mario Carneiro (Oct 12 2024 at 22:15):

right now I think maintenance of the repo is pretty good, although I'm not doing the day to day

Shreyas Srinivas (Oct 12 2024 at 22:16):

It is currently maintained by people who have been doing this for years now.

Shreyas Srinivas (Oct 12 2024 at 22:16):

And there is a lot of internal institutional knowledge

Mario Carneiro (Oct 12 2024 at 22:16):

seems like things are under control, although I don't think it would be if contributions doubled

Mario Carneiro (Oct 12 2024 at 22:16):

but in that case you just get more maintainers

Shreyas Srinivas (Oct 12 2024 at 22:16):

I already see many parallel PRs

Terence Tao (Oct 12 2024 at 22:17):

Interestingly the traffic stats at https://github.com/teorth/equational_theories/graphs/traffic (or https://github.com/teorth/equational_theories/graphs/code-frequency ) are not showing massive growth. Which is fine by me. I think we have about the right size now to finish off the project.

Shreyas Srinivas (Oct 12 2024 at 22:17):

Mario Carneiro said:

but in that case you just get more maintainers

Who started out as contributors and gained experience reviewing in mathlib's case.

Mario Carneiro (Oct 12 2024 at 22:17):

exactly, it's a virtuous cycle

Henrik Böving (Oct 12 2024 at 22:17):

Terence Tao said:

Interestingly the traffic stats at https://github.com/teorth/equational_theories/graphs/traffic (or https://github.com/teorth/equational_theories/graphs/code-frequency ) are not showing massive growth. Which is fine by me. I think we have about the right size now to finish off the project.

Note that the traffic page is only visible to contributors of your project

Shreyas Srinivas (Oct 12 2024 at 22:18):

500000 lines of code at its peak

Shreyas Srinivas (Oct 12 2024 at 22:18):

That's huge

Terence Tao (Oct 12 2024 at 22:18):

Here's the traffic:
image.png

Terence Tao (Oct 12 2024 at 22:19):

Commits:
image.png

Shreyas Srinivas (Oct 12 2024 at 22:19):

Mario Carneiro said:

exactly, it's a virtuous cycle

Virtuous cycles need time. When collaborating over a single project lasting a few weeks, it is a different story. You need to start out with good maintainers.

Terence Tao (Oct 12 2024 at 22:20):

Code frequency (though I am not 100% sure what this is measuring, this could be an artefact of the big data dumps)

image.png

Shreyas Srinivas (Oct 12 2024 at 22:20):

Terence Tao said:

Commits:
image.png

Important to note that these are PRs where the commits are squashed from dozens of actual commits

Shreyas Srinivas (Oct 12 2024 at 22:22):

So the initial starting point in the graph here is the lines of code in the blueprint template itself.

Terence Tao said:

Code frequency (though I am not 100% sure what this is measuring, this could be an artefact of the big data dumps)

image.png

Shreyas Srinivas (Oct 12 2024 at 22:22):

Further a lot of the ATP pull requests added tens of thousands of lines

Mario Carneiro (Oct 12 2024 at 22:24):

Shreyas Srinivas said:

Mario Carneiro said:

exactly, it's a virtuous cycle

Virtuous cycles need time. When collaborating over a single project lasting a few weeks, it is a different story. You need to start out with good maintainers.

Okay? There are lots of very good people contributing to the project right now

Mario Carneiro (Oct 12 2024 at 22:24):

it's not like we're aggregating the contributions of highschoolers here

Joe McCann (Oct 12 2024 at 23:10):

Not to throw in my two cents here, but my career background is in software engineering, and this really just seems to be what happens with experimental projects. In this first iteration there is no way of knowing if it will work out, so you can't preemptively over-engineer the backend infrastructure. As such a substantial amount of tech debt has been amassed, but is that really all that problematic here? This specific version of the problem is nearly (in terms of quantity, maybe not time :skull:) completed.

If you would want to have a similar collaborative effort in the future, then a lot of what is learned during this endeavor can help design a better framework for the next one. For example, imo if ATPs are generating tens of thousands of lines of code, then that shouldn't be committed to a repo. Rather, it should be held in a side database or something (similar to how code is maintained for ML models)

Terence Tao (Oct 13 2024 at 00:02):

Made a brief update on the project on my blog. Also from the comments on this blog I found this paper of Phillips and Vojtěchovský which accomplishes something similar (but smaller-scale) to this project for equational laws for Moufang loops that are not too complicated (in the end they have 14 laws up to equivalence to consider).

Shreyas Srinivas (Oct 13 2024 at 00:18):

I think the link to the project dashboard might be more informative than the link to the pull requests, because that is basically what we are doing differently this time compared to PFR

Vlad Tsyrklevich (Oct 13 2024 at 13:06):

Terence Tao said:

Made a brief update on the project on my blog.

minor correction: at one point you say "Equation 1681" but you (correctly) link to 1689.

Terence Tao (Oct 13 2024 at 17:43):

Day 18

Mario Carneiro (Oct 14 2024 at 15:18):

I haven't seen any positive results in a while. Are there any positive results that weren't found by the vampire pass?

Zoltan A. Kocsis (Z.A.K.) (Oct 14 2024 at 15:20):

Most people suspect there are none left.

Zoltan A. Kocsis (Z.A.K.) (Oct 14 2024 at 15:21):

It's been that way for a while, see e.g. this thread.

Mario Carneiro (Oct 14 2024 at 15:21):

that is to say, vampire has found all positive implications we know?

Mario Carneiro (Oct 14 2024 at 15:22):

or were there additional implications which were found using other techniques

Zoltan A. Kocsis (Z.A.K.) (Oct 14 2024 at 15:24):

I am fairly sure there were no positive results announced since the discussion on the thread (in particular my suspicion that there might be true 65 ->... implications was definitively refuted).

Mario Carneiro (Oct 14 2024 at 15:26):

I see. So actually the cactus plot I suggested isn't going to be very interesting

Terence Tao (Oct 14 2024 at 15:26):

There could be a selection bias because none of our techniques are capable of establishing positive results unless they have an "obvious" proof, in which case they would have been picked up by Vampire already. There are some unusually difficult positive implications such as 1689 => 2 that were known already in the literature (though I believe even this one fell to Vampire and some other ATPs eventually). The situation may clear up once we exhaust everything that can be done with current techniques and are faced with a new batch of holdouts.

Mario Carneiro (Oct 14 2024 at 15:27):

Selection bias seems unlikely when you are doing an exhaustive enumeration

Terence Tao (Oct 14 2024 at 15:28):

Well, if you exhaustively enumerate the unknown implications with a technique that can only disprove implications, but never prove them, you are going to get a very noticeable bias in the reported resolutions even if a significant fraction of the implications are in fact true.

Mario Carneiro (Oct 14 2024 at 15:29):

I would not be too surprised if the final situation at the end of the project looks much like it does now: All positive results are easy, most negative results are determined (by a variety of methods at varying levels of human and computational difficulty), and then some small set of holdouts that no one can determine

Mario Carneiro (Oct 14 2024 at 15:29):

I'm also quite skeptical that one of those holdouts will turn out to be positive and very hard

Mario Carneiro (Oct 14 2024 at 15:30):

but I can definitely believe that we will not solve all of them - there might be a secret FLT hiding in there

Mario Carneiro (Oct 14 2024 at 15:31):

(I say "we" but y'all are doing the real work, I'm just watching)

Terence Tao (Oct 14 2024 at 15:31):

That's the most likely scenario for me also, but I am holding out hope for discovering "mathematics from Mars": some nontrivial deep positive implication that conventional mathematics simply missed.

I'll set up a poll so that people can "place bets". One sec...

[For future reference: the current number of proven implications is 10,657 + 8,167,622 = 8,178,279.]

Terence Tao (Oct 14 2024 at 15:33):

/poll What do you think the final implication graph will look like?
All currently open implications will resolve to false
Most currently open implications will resolve to false, a few will be extremely hard and not resolved (within, say, a year of the project launch)
At least one open implication will be found to be positive
A significant fraction of currently open implications will remain unresolved, even after a year

Mario Carneiro (Oct 14 2024 at 15:34):

how many unknowns are there ATM?

Terence Tao (Oct 14 2024 at 15:35):

https://teorth.github.io/equational_theories/dashboard/ reports 540 unknowns, or 518 if one excludes conjectured results (implications announced here on the Zulip and recorded as conjectures).

Terence Tao (Oct 14 2024 at 15:36):

These numbers will tick down a bit soon, for instance once equational#569 lands [EDIT: it just got merged, so probably the numbers will soon be 538 and 516, because of duality]

Pietro Monticone (Oct 14 2024 at 15:37):

Extraordinarily (almost painfully) naive question since I know next to nothing about automated theorem proving: how much of the “space of available ATP tools” are we currently covering in this project?

Mario Carneiro (Oct 14 2024 at 15:37):

Are model checkers being used?

Terence Tao (Oct 14 2024 at 17:04):

Day 19. (The stats are already out of date, but I'm not timestamping these entries with any precision finer than 24 hours.)

Andrés Goens (Oct 14 2024 at 17:14):

Mario Carneiro said:

Are model checkers being used?

It's not obvious (to me at least) how you would use a model checker for this, can you elaborate?

Mario Carneiro (Oct 14 2024 at 18:10):

FWIW my estimate for "significant fraction" is 80 implications

Mario Carneiro (Oct 14 2024 at 18:10):

although there seems to be a lot of confidence that the existing methods are good enough to finish the job

Terence Tao (Oct 14 2024 at 18:15):

Well today could be a particularly optimistic day, as we've had a lot of progress - for instance, the number of open implications has just dropped to 364 (one of the highest-yield equations got its theory formalized). It's good to have at least one contrarian viewpoint though, we don't want to get stuck in groupthink :grinning_face_with_smiling_eyes:

Cody Roux (Oct 14 2024 at 18:15):

If Z3 counts as a model checker, then yes; if you mean, e.g., SPIN, then no

Cody Roux (Oct 14 2024 at 18:17):

Vampire is quite good at proving implications between equations; basically, unfailing completion is semi-complete and works quite well.

Of course, it can't find arbitrarily long proofs.

Mario Carneiro (Oct 14 2024 at 18:19):

Andrés Goens said:

Mario Carneiro said:

Are model checkers being used?

It's not obvious (to me at least) how you would use a model checker for this, can you elaborate?

Isn't this literally a task of finding countermodels for a non-theorem?

Mario Carneiro (Oct 14 2024 at 18:20):

I'm not sure but I don't think tools like Z3 are optimized for the non-theorem case

Cody Roux (Oct 14 2024 at 18:21):

Z3 is very optimized for SAT, which is what a non-theorem is!

Cody Roux (Oct 14 2024 at 18:21):

Of course it's more optimized for things that involve natural numbers, arrays, etc.
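
On the countermodel question, a minimal self-contained Lean sketch of how a finite countermodel (found by hand or by an external model searcher such as Z3) can be certified; the laws used here (commutativity vs. associativity) are purely illustrative, not equations from the project's numbering, and Mathlib is assumed to be available. The table is spelled out explicitly and decide checks both that the hypothesis law holds and that the conclusion law fails.

import Mathlib

-- A 2-element magma given by an explicit table (this is NOR on Fin 2).
def nor (a b : Fin 2) : Fin 2 := if a = 0 ∧ b = 0 then 1 else 0

-- The operation is commutative ...
example : ∀ a b : Fin 2, nor a b = nor b a := by decide

-- ... but not associative, so commutativity does not imply associativity,
-- and this 2-element table is a checked countermodel.
example : ¬ ∀ a b c : Fin 2, nor (nor a b) c = nor a (nor b c) := by decide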

Shreyas Srinivas (Oct 14 2024 at 23:52):

I added a section to the paper draft notes on the role of maintainers and the skills they need: https://github.com/teorth/equational_theories/wiki/Plan-of-paper#maintenance

Shreyas Srinivas (Oct 15 2024 at 00:02):

In particular, I want to emphasise that maintenance requires some level of mathematical, Lean, and scripting competence.

Zoltan A. Kocsis (Z.A.K.) (Oct 15 2024 at 15:06):

Starting today, I won't have much time to contribute to the project. This will likely be the case for the next 2-3 weeks, so given the current pace of progress, I don't expect to return to my former level of involvement until the article-writing phase begins in full. A quick thank you to everyone I've interacted with, including fellow mathematicians, repo maintainers and onlookers: it's been great collaborating with all of you.

I thought I'd write down a few short highlights, so I'll be able to remember if someone asks me how I felt working on this project in spring 2024. Hopefully, some of you will enjoy reading it.

Some things I enjoyed

  • Asterix: I was probably the first to attempt constructing countermodels for 65->..., and ended up getting crushed like the Romans. I mentioned a 65->... unknown to Daniel Weber as a possible source of counterexamples when he initially suggested that all remaining unknowns might resolve false. Watching Joseph Myers, Pace Nielsen, Will Sawin, Daniel Weber, and Terence Tao collaborate on it was impressive, and Tao eventually built a model. The techniques they developed (translation-invariant magmas, greedy constructions) are now essential to the current refutations. For instance, the magmas I used to refute the remaining implications from 1659, 1661, and 1701 are all modified translation-invariant magmas. Bernhard Reinke's countermodel that refuted the implications from 1648 is also a bona fide translation-invariant magma. Pace Nielsen's yet-to-be formalized proof is a greedy construction. I do have a bit of regret that I did not join in on the 65->... effort, but after working on it previously, I had enough, and I was also busy developing the Finite Magma Explorer at the time, so I guess I had a valid excuse.

  • Central groupoids: I added a Lean definition of Knuth's A2, a nonnatural central groupoid, and following Terence Tao's advice, used it to settle anti-implications of the form 168 -> .... Central groupoids turned out to be fascinating structures, and somewhat aligned with my own research interests, so I expect to revisit them later. Bhavik Mehta has since implemented natural central groupoids, and I'm eager to see them added to the repository!

  • All4x4Tables: Working with Nicholas Carlini, Joachim Breitner, and others on All4x4Tables was a highlight. The moment when outcomes.png finally turned dark red from bright pink, after re-integrating Joachim Breitner's transitive closure pruning with my planning script, became one of my favorite moments of this project.

Some things I look forward to

  • Will there be a clear countermodel for 1112->8? Besides the 65->... family, this was another that gave me a hard time, and I'd really like to see a straightforward countermodel this time.

  • Is there a true unknown in the current set? I agree with the others now that there probably isn't, but I'm still curious.

  • Does the full implication graph have nontrivial automorphisms apart from duality?

Some tools I loved

  • Vampire: I already knew Vampire was good, but it exceeded my expectations anyway. Not only did it handle the hardest implication proofs, but it also excelled at generating large finite counterexamples, far beyond what I expected.

  • Elm: Building the Finite Magma Explorer in Elm was a breeze. And I think the FME turned out great with it: the only bug reported so far turned out to be in the high-performance JavaScript worker.

  • Graphiti: I ended up using Graphiti more than any other tool in the repositories, including the ones I built. It really helped me focus on meaningful implications. Without Graphiti, I doubt I would have built the three infinite models that together reduced the unknown count by hundreds.

  • Lean 4: I'm not new to interactive theorem proving by any means: I formalized the main theorem of my PhD thesis, worked a lot with Isabelle/HOL at CSIRO, contributed to HOL4 and previously briefly used Lean3 as well. This was my first time using Lean 4, however, and it turned out to be a massive improvement over Lean 3. It still won't become my primary proof assistant, but I agree that it's a really good fit for this project, and I'm happy to see such progress from its earlier version.

Some things I still plan to do

  • Add linear magmas to the Finite Magma Explorer. I may also port the FME's filtering feature to the Equation Explorer.

  • Contribute various sections/paragraphs to the article.

  • If you want something done, and think I'm the best person for the job, by all means ping me here or on the Github issue. I won't have much time, but I'll have a bit, and this will help me prioritize.

Terence Tao (Oct 15 2024 at 15:10):

Thanks so much for your really valuable contributions, and for your final thoughts and impressions!

Terence Tao (Oct 15 2024 at 18:04):

Day 20

Jose Brox (Oct 16 2024 at 00:20):

Mario Carneiro said:

Are model checkers being used?

Does Mace4 count as a model checker? Also, if I understood correctly from this project, Vampire should have model checker subroutines.

Vlad Tsyrklevich (Oct 16 2024 at 07:55):

One thing that I think would be a nice outcome of this project would be a REPLICATING.md with steps on how to replicate the automated parts of this project to show future interested parties how to use the tooling that's been built. So if you were interested in studying some variant of this project, you already had a rough guide on what to run/modify to get your idea going. It would be nice to be able to get from 0 knowledge about the graph, to at least where Vampire left us (~8k unknowns with another ~20k conjectures if I remember correctly).

Joachim Breitner (Oct 16 2024 at 08:23):

I would even say it’s a somewhat embarrassing omission that we don't have a reproducible artifact (in the sense applied, for example, by many CS conferences these days), with not just instructions that likely won’t work if they are untested, but executable code that also documents and pins all the tools used (I'd favor a nix derivation, but a docker file is also common), and that’s tested on CI (maybe for the slow search tool the CI would test that it would work on small inputs).

But that’s a serious untertaking and maybe not compatible with the initial experimental speed of the project (and maybe also not as severe if we think of some of this work as one shot – we found the examples, we have the proofs, that’s ok.)

Shreyas Srinivas (Oct 16 2024 at 10:07):

Since we aren't collecting any statistical information or performance metrics, only proofs, I assumed the repo would be sufficient

Shreyas Srinivas (Oct 16 2024 at 10:19):

Here by "statistical" I mean data coming from a random source. Our stats are all collected on lean theorems which have or have not been proved

Shreyas Srinivas (Oct 16 2024 at 10:21):

One set of stats we could have collected reproducibly is the ATP stats. But when we set up the project, we didn't expect ATPs to basically dominate the project for a while.

Terence Tao (Oct 16 2024 at 19:20):

Day 21

Joe McCann (Oct 16 2024 at 19:45):

I actually kind of feel the opposite way, in that I think we are committing a large number of files that we don't need to be committing to the repo. For example, if there is a large quantity of proof files generated from a script, we could have those generated at build time. Any user who wants to verify can download the repo and build all the scripts themselves.

I know this is more of a mathematics project than a software project, but using some additional code management standards could make things easier in the long run.

Shreyas Srinivas (Oct 16 2024 at 19:52):

No, I disagree with it. Firstly, scripts don't generate perfect proofs; some human tinkering is needed. Secondly, I can trust that a Lean proof which has been shown correct will stay correct. I can't say the same about a Python script generating Lean proofs.

Joe McCann (Oct 16 2024 at 19:57):

I see what you're saying, and in the event that human tinkering is needed I agree: that's no longer a generated file. Still, a Python script generating proofs would be verifiable within the build pipeline, so faulty proofs wouldn't be pushed or generated.

As an alternative, perhaps proof files could be stored in an external database such as AWS S3. When I worked on serving Machine Learning models, it was inefficient to store them in the repo so we put them in S3 and pulled them onto the server at build time

Shreyas Srinivas (Oct 16 2024 at 20:06):

I don't see how one would verify a python script generating proofs. If we could do that we would also be able to verify lean tactics

Joe McCann (Oct 16 2024 at 20:19):

A Python file example.py outputs a file proof.lean that contains a proof we can verify is correct? Unless I am mistaken. I am considering files in the repo such as equational_theories/Generated/All4x4Tables/Refutation<x>.lean, where all the different <x> files are just copy-pastes of each other with different numbers.

Shreyas Srinivas (Oct 16 2024 at 20:22):

You can't be sure that output is reproducible

Joe McCann (Oct 16 2024 at 20:56):

As in, that the Python file will output the same output file every time?

Shreyas Srinivas (Oct 16 2024 at 20:57):

Yes. You would have to be sure that the underlying libraries don't change version, the ATPs are deterministic and time out at the exact same spot, the system libraries don't have subtle implementation bugs, etc. And even this wouldn't be "verification" in the sense that Lean offers.

Shreyas Srinivas (Oct 16 2024 at 20:58):

Lean's guarantees are much stronger

Joachim Breitner (Oct 16 2024 at 21:13):

Nobody wants to take Lean out of the equation here; you still get the same guarantees. And the Lean kernel doesn't care whether the proof term was produced from a Lean source file that got committed to the repo after being generated, or from one that was not committed.

Also, realistically, a Python script is more likely to still run with next year's Python than a Lean tactic with next year's Lean :-)

And of course we'd have to ensure reproducibility (e.g. using nix to pin the versions of all tools, recording seeds where randomness was involved). An extra concern to pay attention to, but absolutely doable.

And if we have data that was produced by a multi-hour search, then yes, that's a valid exception to the otherwise very sensible don't-commit-generated-files rule. But even then it seems more sensible to commit the raw generated data (JSON, CSV, a custom text format) than full Lean files.

But nothing here changes the guarantees we get out of the final extract_implications run.
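
As a purely illustrative version of "commit the raw data, generate the Lean at build time", the generator can be very small; in the Python sketch below, the CSV columns and the Lean template are hypothetical placeholders, not the repository's actual conventions. The Lean kernel then checks whatever files exist at build time, whether or not they were ever committed.

```python
# Hypothetical sketch: turn a committed CSV of raw refutation data into Lean files
# at build time. The column names and the theorem template are placeholders.
import csv
import pathlib

TEMPLATE = """-- Auto-generated from {src}; do not edit by hand.
-- Placeholder statement shape, not the repository's real format.
theorem Equation{lhs}_not_implies_Equation{rhs} : True := True.intro
"""

def generate(src="data/refutations.csv", out_dir="Generated"):
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    with open(src, newline="") as f:
        for row in csv.DictReader(f):  # expects columns: lhs, rhs
            name = f"Refutation{row['lhs']}_{row['rhs']}.lean"
            (out / name).write_text(TEMPLATE.format(src=src, **row))
```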

Vlad Tsyrklevich (Oct 16 2024 at 21:36):

Speaking of which, a bug in the closure computation would mean the graph is less complete than we believe. Formalizing what we think we know about the graph state would be cool, though I'm not experienced enough in Lean to know how difficult that would be to accomplish. The trivial 4694**2 check should not be too hard to generate, though I have no idea how difficult it would be to make use of equiv classes and duality to cut it down to something more reasonable to upstream.
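
For the trivial quadratic check mentioned above, a non-Lean sanity pass over the exported data is easy to write. The Python sketch below assumes the JSON from lake exe extract_implications --json has an "implications" list of {"lhs": "Equation...", "rhs": "Equation..."} entries (the shape used later in this thread); the name of the refutations field is a guess and is marked as such.

```python
# Sketch of the quadratic completeness check over the exported implication data.
# The "nonimplications" field name is an assumption, not the tool's documented output.
import itertools
import json

def unclassified_pairs(path, n=4694):
    data = json.load(open(path))
    num = lambda s: int(s.removeprefix("Equation"))
    proved = {(num(e["lhs"]), num(e["rhs"])) for e in data["implications"]}
    refuted = {(num(e["lhs"]), num(e["rhs"])) for e in data.get("nonimplications", [])}
    return [(a, b) for a, b in itertools.product(range(1, n + 1), repeat=2)
            if a != b and (a, b) not in proved and (a, b) not in refuted]

print(len(unclassified_pairs("/tmp/implications.json")))  # ideally 0 when complete
```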

Shreyas Srinivas (Oct 16 2024 at 21:38):

Btw, I wonder how many of the auto-generated proofs can be compressed using FactsSyntax if they are repetitive as you say

Shreyas Srinivas (Oct 16 2024 at 21:39):

Compressing them would help organise the files better

Vlad Tsyrklevich (Oct 16 2024 at 21:39):

Facts don't make sense for implications in the same way that they do for non-implications.

Vlad Tsyrklevich (Oct 16 2024 at 21:39):

And there are already very few facts for non-implications

Shreyas Srinivas (Oct 16 2024 at 21:39):

Why?

Vlad Tsyrklevich (Oct 16 2024 at 21:40):

If we cared about reducing theorems, we could make use of duality in the closure computation to reduce implications by another half or so I suppose.

Joachim Breitner (Oct 16 2024 at 21:40):

Yes, I am pondering this a bit as well: what would be the most convincing self-contained theorem statement that says “this is the correct full implication graph”, and then craft a proof that lean can actually handle in terms of size, which is an interesting engineering challenge. Ideally the theorem would not even refer to a long list of equations, but refer to a lean function that (hopefully obviously correctly) generates the equations.

Shreyas Srinivas (Oct 16 2024 at 21:42):

Another thing I noticed for counterexamples, which has also been discussed here is that depending on the equations, counterexamples have to satisfy certain properties. A trivial example is for 1102 and 511, where the counterexample cannot be commutative or associative. Do we have a way of organising these properties of counterexamples?

Vlad Tsyrklevich (Oct 16 2024 at 21:42):

Shreyas Srinivas said:

Why?

In the non-implications case, proving some magma satisfies Eqns A+B+C+... [length M] but not X+Y+Z+... [length N] lets you prove M*N refutations using M+N checks, but implications are different: you have to prove every one from a given hypothesis, so there is no speed-up. Though if you're talking just syntactically, we could definitely do some compression; some of the trivialbruteforce theorems are incredibly repetitive, but I'm not sure it matters much. The directory is <1MB, which is trivial compared to the size of e.g. mathlib.

Joachim Breitner (Oct 16 2024 at 21:42):

Vlad Tsyrklevich said:

If we cared about reducing theorems, we could make use of duality in the closure computation to reduce implications by another half or so I suppose.

Which I think we should. (Is there an emoji for “broken record” :-))
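
For readers following along, the duality in question is the syntactic one: replace every application x ◇ y by y ◇ x throughout a law, which amounts to passing to the opposite magma, so an implication holds if and only if its dual does. A tiny illustration (not the repository's implementation), with terms encoded as nested pairs:

```python
# Dualize a magma term: a term is a variable name or a pair (l, r) meaning l ◇ r.
def dualize(term):
    if isinstance(term, str):
        return term
    left, right = term
    return (dualize(right), dualize(left))

# The law x = y ◇ (x ◇ y) dualizes to x = (y ◇ x) ◇ y.
lhs, rhs = "x", ("y", ("x", "y"))
print(dualize(lhs), dualize(rhs))  # x (('y', 'x'), 'y')
```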

Shreyas Srinivas (Oct 16 2024 at 21:42):

Can we cut short the search for counterexamples by proving such theorems

Shreyas Srinivas said:

Another thing I noticed for counterexamples, which has also been discussed here is that depending on the equations, counterexamples have to satisfy certain properties. A trivial example is for 1102 and 511, where the counterexample cannot be commutative or associative. Do we have a way of organising these properties of counterexamples?

Shreyas Srinivas (Oct 16 2024 at 21:44):

I am not an ATP expert, whence my question.

Vlad Tsyrklevich (Oct 16 2024 at 21:54):

Joachim Breitner said:

Ideally the theorem would not even refer to a long list of equations, but refer to a lean function that (hopefully obviously correctly) generates the equations.

My thought was to have sub-theorems for every equivalence class (modulo duality), and then a primary theorem that checks that for every sub-theorem, its equivalence class members ⋃ implies ⋃ refutes covers every member 1-4694, and that every sub-theorem has 4694 implies+refutes in the range [1-4694]. That seems fairly explicit without being one massive disjunction. I'm not familiar enough with Lean/theorem-proving to know how one might express the idea otherwise in a way that is still 'believably equivalent' to the explicit statement.

Joachim Breitner (Oct 16 2024 at 22:01):

Hmm, that's not a bad idea, using the cardinality (and disjointness) of the implies and refutes sets to easily check locally that "this equation is fully understood with the following data". From that we can then have a smaller definition "equation n is fully understood" (basically hiding the data behind an existential). And then finally an "all equations are fully understood" theorem is very tractable.

Sounds like a modest amount of meta code to generate these proofs, and maybe some mild proof engineering (e.g. using a better data structure than linked lists with linear indexing). Fun task indeed!

Terence Tao (Oct 16 2024 at 22:04):

See also equational#343, which would allow verification in the sense of an interactive theorem prover: one inputs a pair of equations X and Y, and the interactive tool outputs a low-level proof of the relevant implication or anti-implication, hopefully with only minimal dependencies in Lean and some reasonably tractable way to export the proof to other proof assistant languages.

Shreyas Srinivas (Oct 16 2024 at 23:36):

We can do one thing to improve reproducibility of the results although it isn't foolproof

Shreyas Srinivas (Oct 16 2024 at 23:36):

We already have a pinned python version and venv setup

Shreyas Srinivas (Oct 16 2024 at 23:36):

We can further pin the versions of the python dependencies and use nix to pin versions of the ATPs

Shreyas Srinivas (Oct 16 2024 at 23:39):

Each "generated" sub-folder could add its own nix config so the task is distributed

Shreyas Srinivas (Oct 16 2024 at 23:39):

(deleted)

Amir Livne Bar-on (Oct 17 2024 at 06:12):

One thought about generating Lean proofs from raw outputs (generated by e.g. non-deterministic programs) is that it could be a tactic

Amir Livne Bar-on (Oct 17 2024 at 06:23):

And another way to approach proving the full graph is to verify the "extract" script in stages:

  1. Prove the correctness of the SCC reduction algorithm (or switch to an already verified one if there is one in mathlib)
  2. Prove that paths in the graph correspond to correct mathematical statements
  3. Verify that the output is generated correctly (everything up to println _.compact)

Then we can take the output file as a trusted source in further proofs.

Amir Livne Bar-on (Oct 17 2024 at 07:02):

Something I don't understand though is how each "generated" directory can be run on its own. Pruning to reduce the number of theorems works across directories and proof methods.

Michael Bucko (Oct 17 2024 at 07:32):

Amir Livne Bar-on said:

that it could be a tactic

I already do it like this to some extent. I try to bootstrap the algorithm using a (very) long (sometimes article-sized) prompt (a prompt can be seen as a natural language-based program; it's mostly about the quality of one's thoughts), then continuously improve / re-evaluate.

One could essentially build Cursor for math. Or simply have the vscode-lean4 plugin re-use the LLM key from Cursor settings.

Btw. since this is a multi-step process, I usually think of this approach in terms of agentic math.

Also, does there exist like a global e-graph? If yes, I'd love to explore it (or help create it).

Notification Bot (Oct 17 2024 at 08:24):

A message was moved from this topic to #Equational > A final end-to-end theorem in Lean by Joachim Breitner.

Eric Taucher (Oct 17 2024 at 12:44):

I would like to provide some feedback and suggestions regarding the content found in

https://github.com/teorth/equational_theories/blob/main/README.md

  1. The link to Logic Programming in F# refers to the F# rewrite of the OCaml code from John Harrison's "Handbook of Practical Logic and Automated Reasoning." However, it may give the impression that the original code from the book is available in the mentioned GitHub repository in F#. The actual code for the book can be found at this link, and it is written in OCaml.

  2. I noted the book in a previous response because it contains validated code for the Knuth-Bendix algorithm, which can be found in the following file: completion.ml.

  3. It is also important to note that the "Handbook of Practical Logic and Automated Reasoning" utilizes this code to explain and demonstrate a wide range of theorem-proving techniques from the ground up, based on first-order and higher-order logic rather than a type system. The code is an excellent introduction for understanding HOL Light.

Cody Roux (Oct 17 2024 at 16:02):

Sadly Mathlib doesn't even have directed graphs AFAICT, so I'm pessimistic about SCCs.

Shreyas Srinivas (Oct 17 2024 at 16:02):

Mathlib has digraphs

Shreyas Srinivas (Oct 17 2024 at 16:02):

It's new and not API rich. But it's there

Shreyas Srinivas (Oct 17 2024 at 16:03):

docs#Digraph

Cody Roux (Oct 17 2024 at 19:38):

Oh cool! Feel like I looked just a few months ago

Terence Tao (Oct 17 2024 at 21:15):

Day 22

Vlad Tsyrklevich (Oct 17 2024 at 21:32):

The comment about reasonable compilation time reminded me that while poking around the bbchallenge docs I ran across these lines:

Skelet1.v takes about 10min to do some computation.
BB52Theorem.v takes about 12h (and about 4GB memory usage)

Mario Carneiro (Oct 18 2024 at 00:50):

IIRC mathlib has an SCC implementation inside tfae

Terence Tao (Oct 18 2024 at 22:33):

Day 23

Eric Taucher (Oct 19 2024 at 13:58):

Ran into a problem that is common when projects are developed across different operating systems such as Windows, Linux, macOS, etc. One of the scripts created a command line that was longer than the maximum command-line length on Windows, so the command failed. It is noted in this topic, with the specific problem being solved with the help of @Richard Copley starting with this reply.

Terence Tao (Oct 19 2024 at 22:30):

Day 24

Terence Tao (Oct 20 2024 at 22:12):

Day 25

Kevin Buzzard (Oct 20 2024 at 22:48):

"[Anand Rao formalized] more of the theory of lifting magma families" -- maybe a missing link there?

Terence Tao (Oct 20 2024 at 23:04):

Corrected, thanks!

Terence Tao (Oct 21 2024 at 19:06):

Day 26

Terence Tao (Oct 23 2024 at 04:06):

Day 27

Mario Carneiro (Oct 23 2024 at 13:44):

Mario Carneiro said:

FWIW my estimate for "significant fraction" is 80 implications

My estimate has now been beaten. I'm still betting against total resolution, if only to keep the fire on those doing the actual work, but I revise my estimate down to 5 implications.

Terence Tao (Oct 23 2024 at 17:23):

Day 28

Terence Tao (Oct 23 2024 at 17:25):

Actually, given duality, and the general (though not absolutely universal) trend that when we make a breakthrough on an equation, we resolve all of its implications, the final count of open implications is likely to be some subsum of 38+10+4+2+2.

Kevin Buzzard (Oct 23 2024 at 17:27):

Mario Carneiro said:

Mario Carneiro said:

FWIW my estimate for "significant fraction" is 80 implications

My estimate has now been beaten. I'm still betting against total resolution, if only to keep the fire on those doing the actual work, but I revise my estimate down to 5 implications.

This depends on how you count, right? There could be one key implication which is conjecturally false but its falsification could resolve 80 other implications.

I would love for you to be right. My guess is still "everything will get done" but it's very interesting that what looks to me like new techniques are being developed to resolve these last few questions and some are still holding out. And if the ultimate conclusion is a few stinkers that everyone is stuck on then that's a genuinely interesting conclusion. My guess is just informed by the fact that there are too many smart people thinking about the remaining problems right now :-)

Mario Carneiro (Oct 23 2024 at 17:28):

My count is by total number of unresolved implications. I fully expect the number to go down in large chunks, as one proof causes a whole bunch of implications to be proved

Terence Tao (Oct 25 2024 at 00:06):

Day 29

Terence Tao (Oct 25 2024 at 23:15):

Day 30

Terence Tao (Oct 26 2024 at 17:07):

Day 31

Terence Tao (Oct 27 2024 at 18:47):

Day 32

Mario Carneiro (Oct 27 2024 at 19:09):

Where are positive implications stored? Is there a tool where I can give it 854 -> 378 and it spits out a closure path?

Shreyas Srinivas (Oct 27 2024 at 19:10):

So currently we have the equation explorer, but I don't think it performs the search you are asking

Mario Carneiro (Oct 27 2024 at 19:11):

what is the mechanism by which it is marked as proved there?

Shreyas Srinivas (Oct 27 2024 at 19:11):

There is a script in ./scripts for handling transitive implications. Iirc @Vlad Tsyrklevich is the author.

Mario Carneiro (Oct 27 2024 at 19:12):

ah, seems this particular proof is directly in VampireProven.Proofs12

Shreyas Srinivas (Oct 27 2024 at 19:12):

I just don't think it is used on the explorer tool unless that changed this week

Terence Tao (Oct 27 2024 at 19:15):

We also have equational#741 that @Amir Livne Bar-on contributed. I guess we should have promoted it a bit more, the script here isn't integrated with anything yet.

Amir Livne Bar-on (Oct 27 2024 at 19:19):

If you're using it, take note that it tries to avoid duality inferences, which sometimes results in very long inference chains. You can change the cost from 1000 to 1 if you're not interested in this.
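
For context, finding a closure path like the one Mario asked about amounts to a weighted shortest-path search over the known implications. The sketch below is not Amir's script; it is one plausible Python reading of the cost idea described above, where directly proved implications cost 1 and implications known only by dualizing a proved one carry a large penalty (the "1000").

```python
# Dijkstra over implication edges. `direct` and `dual_only` map an equation to the
# equations it is known to imply, directly or only via duality (input format assumed).
import heapq

def cheapest_chain(direct, dual_only, start, goal, duality_cost=1000):
    dist, prev = {start: 0}, {}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            chain = [u]
            while u != start:
                u = prev[u]
                chain.append(u)
            return chain[::-1]
        if d > dist[u]:
            continue
        nbrs = [(v, 1) for v in direct.get(u, ())] + \
               [(v, duality_cost) for v in dual_only.get(u, ())]
        for v, w in nbrs:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return None
```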

Terence Tao (Oct 28 2024 at 17:20):

Day 33

Terence Tao (Oct 29 2024 at 19:23):

Day 34

Julian Berman (Oct 29 2024 at 20:12):

Is 1485 still seeming like it takes the "final boss" honors, or has that become a closer race, given that the notes for today seem to mention there's just one implication left for it?

Matthew Bolan (Oct 29 2024 at 20:13):

It's a hard implication!

Kevin Buzzard (Oct 29 2024 at 20:56):

Let's temporarily conjecture that all remaining unknown implications are false. Is it now easy to explicitly state a small number (three?) of anti-implications which, if proved, would resolve all the remaining questions? It's very hard to follow this channel and extract this information as a casual observer because it's extremely active.

Matthew Bolan (Oct 29 2024 at 20:59):

Three is too small still. 6 of the implications would be resolved by showing $1485 \not\to 151$. (Actually, I could have chosen any of the 6 there.)

Terence Tao (Oct 29 2024 at 21:06):

OK, here is the current state of affairs. The 20 remaining implications fall into five clusters:

  • 1485 cluster (Weak central groupoids): There are six open implications here (previously there were 38!), but we know them to be all equivalent, so resolving one (e.g., 1485 => 151) would resolve them all.
  • 854 cluster: The number of open implications here has been whittled down to six (previously 16, then 10), with three equivalence classes of dual pairs: 854 => 413, 854 => 1055, and 854 => 1045. (But 1055 implies 1045, so in principle just two questions need to be resolved here.)
  • 1729 (Ramanujan) cluster: There are four open implications, with two equivalence classes: 1729 => 917 "Hardy", and 1729 => 817. But 917 implies 817, so in principle just one question needs to be resolved.
  • 1323 cluster: A "mirror" version of Ramanujan, but it just has two open implications: 1323 => 2744 and its dual.
  • 1516 cluster: Again, just two open implications (down from four): 1516 => 255 and its dual.

So the number of questions we have to resolve is somewhere between six and eight. In terms of raw numbers of questions to resolve, perhaps 854 has now overtaken 1485 in amount of work remaining.

To continue the gaming analogy, perhaps 1485 is a "final boss" that we were making some significant headway defeating, but then it transformed into an even more difficult form to protect its one final open implication (up to equivalence). We're not yet stuck on any of the five, but the remaining questions are proving highly resistant to all our previous techniques.

Terence Tao (Oct 31 2024 at 19:22):

Day 36

Terence Tao (Nov 01 2024 at 13:05):

Day 37. (Already out of date, but no time to update today.)

Terence Tao (Nov 01 2024 at 15:00):

I think we need more digits of precision on the dashboard.
image.png

Terence Tao (Nov 02 2024 at 17:17):

Day 38

Terence Tao (Nov 02 2024 at 17:32):

This is the Graphiti of the entire remaining graph of unresolved implications.

Terence Tao (Nov 03 2024 at 19:18):

Day 39

David Michael Roberts (Nov 04 2024 at 07:28):

Just out of curiosity, how many equivalence classes of equations are there? That is, how many elements does the partial order that comes from collapsing the preorder on the raw set of equations have?

David Michael Roberts (Nov 04 2024 at 07:29):

I can't recall if this was mentioned before and even if it was, I don't know how I could find it

Vlad Tsyrklevich (Nov 04 2024 at 07:30):

About 1.4k

Vlad Tsyrklevich (Nov 04 2024 at 07:31):

It’d be nice to have a page on graph stats, could throw whatever facts people think of in there, I’ve got some scripts locally to compute some facts like that that I could contribute

Zoltan A. Kocsis (Z.A.K.) (Nov 04 2024 at 07:31):

@David Michael Roberts We have 1415 equations up to equivalence (out of the original 4694, i.e. not counting the "sporadic" others).

David Michael Roberts (Nov 04 2024 at 07:34):

Thanks, both!

Vlad Tsyrklevich (Nov 04 2024 at 08:09):

Vlad Tsyrklevich said:

It’d be nice to have a page on graph stats, could throw whatever facts people think of in there, I’ve got some scripts locally to compute some facts like that that I could contribute

Found the script and the facts I was previously computing, showing the # of implication edges on main (10657), condensed edges (5027), condensed+reduced (4824), reduced alone (9821), and the minimum possible (8373).

Implications edges: 10657
Condensed vertices: 1415
Condensed edges: 5027
Reduced condensed edges: 4824
Reduced uncondensed vertices: 4694
Reduced uncondensed edges: 9821
Minimum equivalent graph edges: 8373

David Michael Roberts (Nov 04 2024 at 08:40):

What's the distinction between condensed/reduced?

Vlad Tsyrklevich (Nov 04 2024 at 09:00):

Condensed means the equivalence classes are collapsed, and reduced means transitive reduction is additionally applied (so there are 5027 − 4824 = 203 redundant edges in the condensed graph on main).
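
For anyone who wants to reproduce counts like these from an edge list, both operations have off-the-shelf equivalents; the sketch below uses networkx rather than the project's own scripts.

```python
# "Condensed" = one node per strongly connected component (equivalence class);
# "reduced"   = transitive reduction of the resulting DAG.
import networkx as nx

def condensed_and_reduced(edge_list):
    g = nx.DiGraph(edge_list)
    condensed = nx.condensation(g)
    reduced = nx.transitive_reduction(condensed)
    return g.number_of_edges(), condensed.number_of_edges(), reduced.number_of_edges()

# Toy example: 1 and 2 are equivalent, and both imply 3.
print(condensed_and_reduced([(1, 2), (2, 1), (1, 3), (2, 3)]))  # (4, 1, 1)
```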

Terence Tao (Nov 04 2024 at 22:19):

Day 40

Terence Tao (Nov 05 2024 at 22:07):

Day 41

Shreyas Srinivas (Nov 05 2024 at 22:09):

The issues we are dealing with today are not related to the incorrect setup issues we solved two days ago

Shreyas Srinivas (Nov 05 2024 at 22:10):

They have to do with the fact that we updated the lean toolchain (and consequently batteries and mathlib) from v4.13.0 to v4.14.0-rc1. There have been changes in the libraries upstream which can cause breakage in our project

Shreyas Srinivas (Nov 05 2024 at 22:12):

This is not unusual, and to some extent even expected, when lean toolchain upgrades happen.

Terence Tao (Nov 05 2024 at 22:14):

Ah, thanks for clarifying, I have edited the entry.

Shreyas Srinivas (Nov 05 2024 at 22:14):

The changes are usually of the following nature: renaming of definitions or theorems, changes to their location in the library, changes in the type signatures of definitions, changes in default tactic configuration, etc.

Vlad Tsyrklevich (Nov 05 2024 at 23:27):

Unfortunately as I noted in that thread, finite implications are not yet complete due to a bug in my calculations. It'll be another day or two before I can confirm everything well enough to verify the current state.

Pietro Monticone (Nov 06 2024 at 00:16):

Terence Tao said:

Day 41

The fixed bump has been merged.

Maxime Rinoldo (Nov 06 2024 at 23:44):

As a developer who's interested in math in general but doesn't have the higher education or background to understand the details of this particular topic, I've found watching over this project for the last month or so surprisingly addictive, I suspect because it has such a quantifiable way of assessing progress towards a precise end goal that seems reachable.

Here's a visualization of the remaining claims, based on Terence Tao's daily snapshots of the dashboard:

remaining_claims.png

I'm looking forward to you guys getting these numbers down to zero!

Terence Tao (Nov 08 2024 at 03:09):

Day 43. I am planning to update less frequently, so may miss some developments. Corrections and additions welcome as always.

Will Sawin (Nov 08 2024 at 19:59):

At what time of the project was the last positive implication proved? From the thoughts and impressions thread it looks like the last informal proof was on day 9 with formalizing the positive implications of the Vampire run. But when was the Vampire run completed?

Vlad Tsyrklevich (Nov 08 2024 at 20:31):

The relevant commit is from Oct 5.

Terence Tao (Nov 09 2024 at 05:47):

In the case of implications for finite magmas rather than all magmas, the most recent positive implication was proven informally on Nov 4, though not yet formalized.

Terence Tao (Nov 09 2024 at 20:17):

Day 45

Mario Carneiro (Nov 09 2024 at 20:20):

Mario Carneiro said:

Mario Carneiro said:

FWIW my estimate for "significant fraction" is 80 implications

My estimate has now been beaten. I'm still betting against total resolution, if only to keep the fire on those doing the actual work, but I revise my estimate down to 5 implications.

Beaten again!

Terence Tao (Nov 11 2024 at 16:10):

Day 47

Terence Tao (Nov 13 2024 at 15:56):

Day 49

Shreyas Srinivas (Nov 13 2024 at 16:06):

Terence Tao said:

Day 49

I'd call my approach a stateful heuristic instead of data-structure-based, because the former conveys the idea a bit better than the latter. More accurately, I am trying to construct more complex operations from simpler ones by adding some tracking data to the operands.

Terence Tao (Nov 13 2024 at 16:08):

For equations such as 1729, I am finding that implementing different rules for squares and non-squares seems to help a lot in this regard.

Terence Tao (Nov 13 2024 at 16:09):

Except that at some point one may have to start with a base model where everything is a square...

Shreyas Srinivas (Nov 13 2024 at 16:12):

So far the bigger issue on my end has been handling uncomfortable inputs to the tracking parameters while still satisfying 1729. If I can come up with a junk-value construction that takes care of them, by falling back to a safe operation that always satisfies 1729, then we should have a refutation already. In the Mealy machine analogy, this is equivalent to having a reject state that is safe to enter and loop in forever.

Shreyas Srinivas (Nov 13 2024 at 16:16):

I'll try to explain with an example in the 1729 thread in a short while (update : it will have to be tomorrow).

Terence Tao (Nov 15 2024 at 17:10):

Day 51

Vlad Tsyrklevich (Nov 15 2024 at 17:18):

Is there interest in adding the finite graph stats to the dashboard? I have lost track of some of the numbers myself and figure an easy-to-access place might be nice

Terence Tao (Nov 15 2024 at 17:39):

I think this would be a great idea, given how many passive participants there are following developments via the dashboard and elsewhere. I opened equational#838 for this.

Bruno Le Floch (Nov 15 2024 at 21:05):

Terence Tao said:

Day 51

Strictly speaking, "It is known that any such law must have order" at least 8.

Terence Tao (Nov 17 2024 at 18:00):

Day 53

David Michael Roberts (Nov 17 2024 at 20:31):

@Terence Tao typo: "some of them can be proven to be are group division axioms"

Vlad Tsyrklevich (Nov 17 2024 at 20:47):

Dashboard stats are live for the finite graph.

Jose Brox (Nov 17 2024 at 21:02):

Vlad Tsyrklevich said:

Dashboard stats are live for the finite graph.

It could be interesting to also add the difference between finite implications and all implications (raw number and percentage).

Terence Tao (Nov 19 2024 at 18:45):

Day 55

Shreyas Srinivas (Nov 19 2024 at 18:48):

It seems apt that 1729 vs 817 should be one of the hardest problems yet.

Vlad Tsyrklevich (Nov 19 2024 at 21:26):

Just a thought I had: in addition to collecting the small/smallest finite magma that satisfies a given law, it would also be interesting to collect a small/smallest magma that satisfies only that law and all its implications, but no other laws. This seems fairly obvious, so perhaps this has already been proposed as an idea? I suppose there is also an infinite-model analog of this as well, with a vaguer idea of what qualifies as simple/simplest for laws whose implications vary between the general and finite graphs.

Terence Tao (Nov 19 2024 at 21:30):

If one really wants no other laws (of any length), then you are precisely asking for the free magma for that law (on one generator, I guess).

Matthew Bolan (Nov 19 2024 at 21:49):

One generator seems wrong: for example, the free magma on one generator for $x = x \diamond x$ is trivial, but this law itself admits nontrivial models. I think in general one needs the free magma on countably many generators (though for some magmas finitely many will do; for example, I think groups only need 2).

Mario Carneiro (Nov 19 2024 at 22:15):

the proof in the repo about the universality of FreeMagmaWithLaws can be strengthened to work over finitely many generators, where the number of generators required is the number of variables appearing in the law to be refuted

Terence Tao (Nov 21 2024 at 18:32):

Day 57

Bruno Le Floch (Nov 21 2024 at 22:37):

Is there somewhere a list of non-formalized implications whose informal proof involves an intricate greedy algorithm that could be usefully simplified before formalization? (E.g., 1729->817 I suppose.)

Maxime Rinoldo (Nov 22 2024 at 02:42):

Congrats everyone :partying_face:

remaining_claims.png

Do you think there's some importance to why these last few implications proved to be so challenging, meaning, this exhaustive search allowed you to find objects with a particularly interesting mathematical structure that may be worth studying even beyond this project, or was it just a given that if you consider such a large graph with millions of implications, some of them were bound to happen to be the most resistant ones?

If it's the former, is there any reason these laws weren't discovered and studied before, unlike, say, the central groupoid law, or is it simply that they are complicated enough that it wouldn't be obvious from looking at them that they would yield anything especially interesting, hence the need for a comprehensive search?

Terence Tao (Nov 22 2024 at 04:52):

Here is a summary of the equations with unformalized implications, and the proof techniques used in the informal proofs (in some cases we have two approaches):

  • 1076, 73, 63, 3308, 1722 (translation-invariant greedy OR ordinary greedy)
  • 713, 1289 (large greedy OR ordinary greedy)
  • 1447 (large greedy OR explicit construction)
  • 1692* (translation-invariant greedy involving an infinite tree)
  • 879* (translation-invariant greedy)
  • 1323, 1516, 1729* (partially explicit, partially greedy "squares first" approach)

(*): claimed

I'd say it is the last three (1323, 1516, 1729) that could benefit the most from finding a simpler proof. (1692 and 879 could also be simplified, but the formalization of their existing proofs is already in progress.)

Terence Tao (Nov 22 2024 at 04:59):

@Maxime Rinoldo I don't think we have good answers yet as to why some equations were particularly hard. We do have some implications that had "immunities" of various types - immune to translation-invariant constructions, to linear models, or to finite models, for instance. These immunities seem to not be strongly correlated to each other, though we haven't systematically worked this out. The last few equations 1323, 1516, 1729 had all three immunities (translation-invariant, linear, finite) but this could just be a random alignment. On the other hand we were stuck for a long time on 1485 (which fell to a finite model) and 854 (for which finite models could resolve some implications), so immunities don't tell the whole story.

I still find 854 fascinating - it is in an unusual "liquid" state between "solid" laws that have very rigid, symmetric models and "gaseous" laws that have very free-form, unstructured models. 854 magmas obey a large number of other interesting laws and come with other structures, such as a directed graph and a partial ordering, that indicate structure; on the other hand one can create lots of finite models that have a lot of flexibility. But while they have some internally interesting structure, they don't seem to be connected to more mainstream mathematical structures, which is perhaps why they have not been explored before.

Zoltan A. Kocsis (Z.A.K.) (Nov 22 2024 at 08:01):

Terence Tao said:

These immunities seem to not be strongly correlated to each other, though we haven't systematically worked this out.

Well, at least finite immunity implies linear immunity by a model-theoretic argument, right?

Matthew Bolan (Nov 22 2024 at 08:28):

I think it implies finite-dimensional linear immunity. It can't imply infinite-dimensional linear immunity: for example, 1117 does not imply 2441, by an infinite-dimensional linear counterexample, but I believe 1117->2441 is known for finite magmas.

Zoltan A. Kocsis (Z.A.K.) (Nov 22 2024 at 08:31):

Right, that makes sense.

Pace Nielsen (Nov 22 2024 at 14:38):

The infinite tree argument for 1692 almost certainly can be simplified. If it would be useful to the person formalizing it, I can take another look. (It was written back when we were first exploring translation invariant arguments, and we have more experience under our belts now.)

Terence Tao (Nov 23 2024 at 18:39):

Day 59

Mario Carneiro (Nov 23 2024 at 18:41):

I'm going to mention again something that worried me before: there has been some lack of emphasis on the formalization part, which makes me think that it might not actually get done, or that the people who came up with the conjectures will get bored and move on before the details are all written down. The project isn't done yet!

Shreyas Srinivas (Nov 23 2024 at 18:41):

Yes please.

Shreyas Srinivas (Nov 23 2024 at 18:42):

This formalisation also doesn't appear to be trivial for the greedy search heuristics

Shreyas Srinivas (Nov 23 2024 at 18:42):

Can we revisit the core idea and modularise + formalise the heuristic search itself?

Shreyas Srinivas (Nov 23 2024 at 18:43):

(at least the parts which aren't heuristic)

Mario Carneiro (Nov 23 2024 at 18:44):

I think it's fine to have those heuristic methods if they delivered results, I don't think it's necessary to redo those unless the proofs are crazy

Terence Tao (Nov 23 2024 at 18:47):

@Bernhard Reinke perhaps you can weigh in on this, since you are formalizing 3308 using the "ordinary greedy" approach. Would it benefit you (or other formalizers) to have a more precise blueprint for the other equations 1076, 73, 63, 1722, 713, 1289 that can also be treated by this approach (maybe also 879, though that one seems on track to be formalized by the translation-invariant greedy method), or is the current description sufficient? Would it make sense to try to automate this process more, or would the ad hoc approach of adapting each Lean proof for one equation to the next suffice?

Shreyas Srinivas (Nov 23 2024 at 18:48):

Maybe we could make a tactic of it?

Mario Carneiro (Nov 23 2024 at 18:49):

probably a theorem is sufficient. The hard part is finding the appropriate common part

Mario Carneiro (Nov 23 2024 at 18:50):

Having 10 similar looking proofs in front of you is an easy way to figure out what the common part is

Terence Tao (Nov 23 2024 at 19:01):

The three large proofs (for 1323, 1516, 1729) each have blueprint chapters that provide a full human-readable proof, but it is not broken down into atomic pieces the way it would be for a more "conventional" formalization project. In practice, individual formalizers have taken on entire implications at once rather than having a team of formalizers work on small modular components of an implication. We could switch to that model if it turns out that the three large proofs are too daunting for any one formalizer to attempt on their own.

Mario Carneiro (Nov 23 2024 at 19:04):

I'm most worried about the ones that don't have "human" blueprint chapters but are instead the result of an SMT solver or exhaustive search.

Terence Tao (Nov 23 2024 at 19:05):

All of the outstanding "large greedy" implications that were obtained via a SAT solver now also have human-readable proofs using an alternate "ordinary greedy" method.

Mario Carneiro (Nov 23 2024 at 19:05):

how many 'conjecture' proofs are not yet in the blueprint?

Terence Tao (Nov 23 2024 at 19:09):

1722, 1323, 1516, 1729 are in the official blueprint. The treatments of the other nine equations are currently PDF files or Zulip messages, linked to from the issues associated with each equation. I could migrate them to the blueprint if there is sufficient interest. 1076, 73, 63, 1722, 713, 1289, 3308 all have rather similar proofs, so I only put one representative argument (1722) in the blueprint currently. I have also not prioritized 1692 and 879, as they are already being formalized and there are currently no requests for clarification.

Mario Carneiro (Nov 23 2024 at 19:11):

Is there a way to do the 7 similar proofs in the blueprint (i.e. taking advantage of the similarity to do all of them in one go)?

Mario Carneiro (Nov 23 2024 at 19:11):

It is probably easier to do the refactoring there in the latex before doing it in lean

Matthew Bolan (Nov 23 2024 at 19:20):

They are all greedy algorithms of the type used to resolve 854, but I believe none of them are quite as complicated as 854 was. Some of them share laws in common, such as right/left injectivity, idempotence, or no idempotents, but for most of them you are arguing that some custom greedy algorithm preserves some set of laws pretty specific to that equation. Links to the arguments for all outstanding conjectures can also be found in one place in Conjectures.lean. Note that several have been proven multiple times via different techniques.

Matthew Bolan (Nov 23 2024 at 19:27):

I also list seeds for the 7 remaining vanilla greedy algorithms here. I suppose these technically count as having been the result of an SMT solver/exhaustive search, but I think the formalizer only needs to have the tables.

Johan Commelin (Nov 23 2024 at 19:36):

Is there a dashboard for the remaining proofs that need to be formalized?

Shreyas Srinivas (Nov 23 2024 at 20:07):

In the dashboard: https://teorth.github.io/equational_theories/implications/

Shreyas Srinivas (Nov 23 2024 at 20:07):

All the unknowns are conjectured.

Shreyas Srinivas (Nov 23 2024 at 20:09):

That being said, I think many of these are indirectly implied. I believe the last but one log has a list of anti-implications that need to be proved.

Shreyas Srinivas (Nov 23 2024 at 20:10):

But I can't check on my phone right now

Matthew Bolan (Nov 23 2024 at 20:17):

I believe Conjectures.lean contains a minimal set of anti-implications which need to be formalized, as well as links to their associated constructions. Terry described the remaining constructions to be formalized earlier in this thread. Of these, I believe 879, 1692, 3308, and 1447 are currently being worked on.

Andrés Goens (Nov 23 2024 at 21:48):

Of these, I believe 879, 1692, 3308, and 1447 are currently being worked on.

1447 is done :) (equational#892)

Bernhard Reinke (Nov 24 2024 at 07:17):

Terence Tao said:

Bernhard Reinke perhaps you can weigh in on this, since you are formalizing 3308 using the "ordinary greedy" approach. Would it benefit you (or other formalizers) to have a more precise blueprint for the other equations 1076, 73, 63, 1722, 713, 1289 that can also be treated by this approach (maybe also 879, though that one seems on track to be formalized by the translation-invariant greedy method), or is the current description sufficient? Would it make sense to try to automate this process more, or would the ad hoc approach of adapting each Lean proof for one equation to the next suffice?

I think the current status is fine, but having them in the blueprint would be slightly more convenient (though that shouldn't be a big priority). As for the automation, I can say more once I have finished 3308. I think there is some boilerplate that could benefit from a proper Lean tactic: for the completion steps in the algorithm, I start with a partial solution defined over ℕ and define a new partial solution over ℕ ⊕ Fresh, where Fresh is an enumerative type. This makes it easier (in particular in case 1) to verify that the new partial solution satisfies the laws again. Then I have to push down the new partial solution on ℕ ⊕ Fresh to a solution on ℕ along a bijection between ℕ and ℕ ⊕ Fresh. The fact that I can move partial solutions along equivalences of types is reminiscent of the transfer tactic in Lean 3. Maybe this can be made into a proper Lean 4 tactic again.
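
As a very rough illustration of the "move a partial solution along an equivalence of types" step, here is a Lean sketch; it is not code from the repository, and the representation of partial solutions is made up for the example.

```lean
import Mathlib.Logic.Equiv.Defs

/-- A partially defined magma operation; an illustrative representation only. -/
abbrev PartialOp (α : Type) := α → α → Option α

/-- Transport a partial operation on `β` back to one on `α` along `e : α ≃ β`;
    this is the kind of step a transfer-style tactic could automate. -/
def PartialOp.transport {α β : Type} (e : α ≃ β) (f : PartialOp β) : PartialOp α :=
  fun x y => (f (e x) (e y)).map fun b => e.symm b
```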

Terence Tao (Nov 24 2024 at 07:32):

I don't know if this is any better, but one could perhaps have a partial solution defined over ℕ × Fresh and, at each stage, locate an n that is larger than any first coordinate of the partial solution and glue in {n} × Fresh.

Terence Tao (Nov 26 2024 at 15:59):

Day 62

Notification Bot (Nov 27 2024 at 14:31):

A message was moved here from #Equational > Formalising Counterexamples from the Greedy Method by Shreyas Srinivas.

Terence Tao (Nov 28 2024 at 16:44):

Day 64

Matthew Bolan (Nov 28 2024 at 16:46):

I don't deserve credit for helping to streamline the code; I don't know any Lean, so it is not something I can currently even attempt. I only spoke in that channel due to the kerfuffle with my seeds.

Terence Tao (Nov 28 2024 at 16:49):

OK, I edited the text accordingly.

Terence Tao (Nov 28 2024 at 16:51):

By the way, it really is a strength of the structure of this project that one can make really useful contributions without having to be an expert in all of (a) universal algebra, (b) Lean, (c) programming, (d) Github etc..

Matthew Bolan (Nov 28 2024 at 17:39):

Yes, I quite like that. Personally I had to wait quite a while before I felt comfortable talking; it was only that interesting group theory problem that drew me out of hiding, since it was pure math and didn't need any Lean knowledge. I'm glad I did – I've picked up a lot of skills from everyone here, and it's definitely already made me a better programmer/mathematician!

Matthew Bolan (Nov 28 2024 at 17:39):

And now I'm quite tempted to learn lean...

Terence Tao (Dec 01 2024 at 16:36):

Day 67

Jose Brox (Dec 01 2024 at 19:41):

Terence Tao said:

Day 68

Technically, 10h is the maximum time allowed for the Prover9 run for each candidate. When it fails to find a proof, that candidate possibly runs for 10h (unless there is some memory or similar exception). I already have 17 candidates like that (a run of several days). But since for this candidate a proof was found, the search lasted less than 10h (I didn't track how much). A new search for the proof, with parameters better suited to this particular candidate, lasted 970s (so roughly 16 minutes). It may also be interesting that this is a proof in two steps: A -> B, A & B -> C turns out to be much better than A -> C (at least with the same parameters).

Terence Tao (Dec 04 2024 at 16:23):

Day 70

Vlad Tsyrklevich (Dec 05 2024 at 07:14):

Vlad Tsyrklevich said:

It’d be nice to have a page on graph stats, could throw whatever facts people think of in there

I got around to this and some new info about the graph is now visible on the dashboard, as well as exports for a number of lake exe extract_implications runs.

Terence Tao (Dec 07 2024 at 16:42):

Day 73

Jose Brox (Dec 10 2024 at 15:49):

Let Graphiti export its outcome: I have used Graphiti for the first time (very nice!), as a filter to find some particular identities. In this sense it would be nice to have the option of seeing/downloading the list of equations/id numbers and their relations (in text format) to export it to other programs.

Vlad Tsyrklevich (Dec 10 2024 at 16:12):

You can extract it from the DOT file with a tiny bit of text processing, see the download link in the top bar

Vlad Tsyrklevich (Dec 10 2024 at 16:13):

The DOT file is the input I give to graphviz to generate that graph, so it has everything you see in the graph
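
For anyone else who wants the edge list without clicking around, the "tiny bit of text processing" can look like the Python sketch below; the exact node-label format inside the DOT file is an assumption here.

```python
# Pull "A -> B" edges out of a Graphviz DOT file and print them as CSV.
import re
import sys

edge = re.compile(r'"?(\w+)"?\s*->\s*"?(\w+)"?')
with open(sys.argv[1]) as f:
    for line in f:
        m = edge.search(line)
        if m:
            print(",".join(m.groups()))
```

Usage would be something like python dot_to_csv.py graphiti.dot > edges.csv (file names hypothetical).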

Terence Tao (Dec 10 2024 at 16:15):

Day 76

Jose Brox (Dec 10 2024 at 16:18):

Jose Brox said:

Let Graphiti export its outcome: I have used Graphiti for the first time (very nice!), as a filter to find some particular identities. In this sense it would be nice to have the option of seeing/downloading the list of equations/id numbers and their relations (in text format) to export it to other programs.

Oh, my problem has been that I didn't see the top bar at all! (My sight is bad and I tried looking for the option at the starting page, not at the results one). Maybe the colors of the letters and/or background in the bar could be changed to improve readability? (EDITED: or increase the default letter size in the top bar).

Vlad Tsyrklevich (Dec 10 2024 at 19:34):

increased font size in equational#1017

Terence Tao (Dec 13 2024 at 16:30):

Day 79

Harald Husum (Dec 14 2024 at 21:08):

I want to share some thoughts from the perspective of someone who is not a professional mathematician. Although I have some undergrad level mathematics education, being an engineer, I'm really just following and participating in this project as an enthusiast. I find mathematics research fascinating from an outside-looking-in perspective and am the kind of person that is really grateful for e.g. Quanta Magazine for making topics in pure math research accessible to the lay person (being one myself).

For me, the most beautiful thing about this project is its openness. Reading articles about mathematics research paints a picture of how it is done, but leaves a lot to the imagination. It is clear that a large fraction of cooperation between mathematicians happens in relatively closed fora, be it email conversations or in-person meetings.

Contrast this with the collaborative work we see going on here, in threads like #Equational > FINITE: 677 -> 255. Seeing the details of how mathematical arguments are shared and iterated on in real time, with false starts, dead ends, negative results, and all, is enlightening for me. It makes me understand a lot more about how the frontier of mathematics is pushed forward.

This is interesting to me as an enthusiast, but I think it is providing much broader value beyond satisfying my intellectual curiosity. To young aspiring mathematicians, I believe the communication here is leading by example in terms of how to conduct research in general and collaborative research specifically. It paints an honest picture of the hard work one often needs to put in before one arrives at a positive result. This can be lacking in a more traditional workflow where all you often see is the final published paper proving some result. It also illustrates quite effectively how much can be achieved when passionate researchers share ideas and build upon each other's work in a relatively selfless manner, and it gives a concrete example of what that can look like.

Furthermore, this Zulip channel makes explicit immense amounts of knowledge that would remain tacit if the only artifact this project produced was a paper and some lean code. I have a feeling that this is democratizing a lot of expert knowledge in a relatively effective way. I imagine this is particularly true for a lot of the negative results that might be hard to publish by traditional means.

Finally I want to commend the professional mathematicians participating so publicly in this project. For some I imagine it can be quite a change from what one is used to. Working in public means you are showcasing not only your best work but also your mistakes and flaws. This can require a certain amount of confidence, I am sure. But, following my reasoning above, I think you are all doing a great service to the field. Thank you!

Terence Tao (Dec 16 2024 at 16:21):

Day 82

Bruno Le Floch (Dec 16 2024 at 17:17):

254 should be 255 there.

Terence Tao (Dec 22 2024 at 04:37):

Day 88

Bruno Le Floch (Dec 22 2024 at 09:01):

4692 should be 4694. "that this problem was completely solved"→"solved this problem completely"

Terence Tao (Dec 27 2024 at 07:18):

Day 93

Terence Tao (Jan 03 2025 at 23:03):

Day 100 - a rather quiet milestone

Terence Tao (Jan 11 2025 at 20:29):

Day 108

Bruno Le Floch (Jan 12 2025 at 23:09):

A tooling question: How can I determine the list of all non-trivial equivalence classes among our 4694 equations (under usual implication, the main topic of the project, nothing fancy)?

Motivation: I want to test my hunch that when there is more than one equation with the lowest order in an equivalence class, the equivalence class defines an interesting structure.

Full list of order ≤2 and some higher-order ones:

Vlad Tsyrklevich (Jan 13 2025 at 04:59):

I exported the implications and processed it using src/graph.rb. Here is the equation with the lowest number in every non-trivial equivalence class.

Non-trivial equivalence classes

Bruno Le Floch (Jan 13 2025 at 08:53):

Thanks for the list! After installing Ruby, and after gem install pry, I got ruby graph.rb to run in a terminal. This gives an interactive mode with prompts starting with [1] pry(main)>, [2] pry(main)>, etc., prompting me for commands. Clearly it knows about a Graph class with some methods (Graph.initialize complains that it is a private method, while Graph.from_csv wants a file argument). Where is the data, and how do I get to your list using graph.rb, please?

Vlad Tsyrklevich (Jan 13 2025 at 09:00):

Yes, sorry for not sharing that originally. I wasn't sure if you just wanted the data. First you can run lake exe extract_implications --json > /tmp/implications.json or download this JSON from the dashboard (here) if you don't have a working lean build env set-up. Then cat /tmp/implications.json| ruby -rjson -e 'JSON.parse($stdin.read)["implications"].each { |s| puts s["lhs"][8,10] + "," + s["rhs"][8,10] }' | sort -u > /tmp/implications.csv will turn it into a CSV. Then you can use in pry:

graph = Graph.from_csv("/tmp/implications.csv");nil
result = []
graph.scc.each { |scc| next if scc.length == 1; result.push scc.sort[0] } ;nil
puts result.sort.inspect

Bruno Le Floch (Jan 13 2025 at 10:29):

Thank you, that was very helpful. I needed these details because I was not interested in all non-trivial equivalence classes, but only in those that have multiple lowest-order equations, because it is too common to have higher-order equations equivalent to lower-order ones. I consider my hunch mostly wrong: some cases have a good description, but others not so much.

Non-trivial equivalence classes of eqs of minimal order ≤3

Ibrahim Tencer (Jan 14 2025 at 07:44):

Bruno Le Floch said:

  • [14, 29] The Putnam law Eq14 [x = y ◇ (x ◇ y)], equivalent to its dual Eq29 [x = (y ◇ x) ◇ y], has shown up a lot in this project; it's related to Steiner triples, although I don't know how to think of these magmas well.

What is the relation to Steiner triples? This is a bit of a tangent but I was also wondering about what these magmas look like. A few partial results:

Terence Tao (Jan 14 2025 at 17:46):

Ibrahim Tencer said:

Bruno Le Floch said:

    • [14, 29] Putnam law x=y*(x*y) has shown up a lot in this project, it's related to Steiner triples, although I don't know how to think of these magmas well.

What is the relation to Steiner triples?

If one lets R(x,y,z) denote the relation x*y = z, then the Putnam law is saying that this relation is cyclic: R(x,y,z) ⟹ R(y,z,x). Thus, we have a collection of directed 3-cycles x → y → z → x with the property that every directed edge x → y belongs to exactly one 3-cycle in the collection. For comparison, a Steiner triple system is a collection of undirected triangles x-y-z-x such that every undirected edge x-y belongs to exactly one triangle in the collection. So I think the Steiner triple systems precisely correspond to the abelian Putnam magmas.

From a quick glance at https://ajc.maths.uq.edu.au/pdf/71/ajc_v71_p485.pdf it appears that Putnam magmas are also known as "Mendelsohn directed triple systems", if I have parsed the definition correctly.
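As a quick sanity check of the cyclicity observation, here is a minimal Lean sketch; the names op, R, and putnam are ad hoc illustrations and are not taken from the project's codebase:

-- a magma operation on a carrier type G (ad hoc names, plain Lean 4)
variable {G : Type} (op : G → G → G)

-- R x y z records the relation x ◇ y = z
def R (x y z : G) : Prop := op x y = z

-- the Putnam law x = y ◇ (x ◇ y) makes R cyclic: R x y z → R y z x
example (putnam : ∀ x y : G, x = op y (op x y))
    (x y z : G) (h : R op x y z) : R op y z x := by
  unfold R at h ⊢
  rw [← h]                 -- goal becomes op y (op x y) = x
  exact (putnam x y).symm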

Matthew Bolan (Jan 14 2025 at 18:03):

I think one also needs to ask that an abelian Putnam magma be idempotent for it to correspond to a Steiner triple system (and the idempotent Putnam magmas are Mendelsohn triple systems).

Bruno Le Floch (Jan 14 2025 at 19:16):

Matthew is right that you need idempotence to really get triple systems. In general I think we have to specify the squaring map without 2-cycle (namely with (x^2)^2 = x iff x^2 = x) and then we get triple systems with omission of all edges (x, x◇x) and (x◇x, x).

Eric Taucher (Jan 17 2025 at 21:24):

Looking for ideas on how to find patterns in the solution arrays from tools like MiniZinc, etc. While it is easy to check for properties such as idempotence, commutativity, etc., what are ways to search for any pattern in the data that would hold across all of the solutions in one solution set? I did ask a few LLMs, which noted:
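For the "easy to check" part, a small Python sketch along these lines (the helper names and the 3-element example table are my own illustration, not project code):

# check simple magma properties on an n x n operation table,
# where table[x][y] encodes x ◇ y (e.g. a finite model found by MiniZinc)

def is_idempotent(table):
    # x ◇ x = x for all x
    return all(table[x][x] == x for x in range(len(table)))

def is_commutative(table):
    # x ◇ y = y ◇ x for all x, y
    n = len(table)
    return all(table[x][y] == table[y][x] for x in range(n) for y in range(n))

def satisfies_eq14(table):
    # Putnam law (Equation 14): x = y ◇ (x ◇ y) for all x, y
    n = len(table)
    return all(table[y][table[x][y]] == x for x in range(n) for y in range(n))

# example: x ◇ y = -(x + y) mod 3 passes all three checks
example = [[(-(x + y)) % 3 for y in range(3)] for x in range(3)]
print(is_idempotent(example), is_commutative(example), satisfies_eq14(example))

This of course only covers a fixed list of properties; the harder part of the question, finding patterns that are not specified in advance, is left open here.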

Terence Tao (Jan 19 2025 at 05:11):

Day 115

Andrés Goens (Jan 27 2025 at 07:47):

In #Equational > GREEDY-GREEDY HYBRID: Refute 1516->255, @Pietro Monticone, @Lorenzo Luccioli and @Marco Petracci have found some issues with the refutation 1516->255. This seems quite interesting with regard to one of the core goals of the project, namely how Lean (or, more generally, a proof assistant) enables large-scale collaboration and trust. I did not follow this refutation closely, but it seems from #Equational > 1516 -> 255 that several people were involved, so this was a collaborative effort to arrive at the proof.

My questions are:

  • Have these problems been caught because of Lean, and not otherwise? In other words, if this had been a pen-and-paper-only proof, would you have been happy with the level of detail that was previously in the blueprint, or
  • have these problems arisen because of the use of Lean in the background, i.e. knowing that they would still need to be formalized, were you happy with a more high-level proof, without the same level of scrutiny?
  • Are these problems still very much in the realm of "clearly fixable", where any mathematician with enough background/familiarity would have immediately known how to fix them, or was there something where it was not so clear?

This is of course not just about this refutation (I think some issues have been found and fixed a few times before, e.g. with refutations from 1722), so this question is aimed more generally at the times this has happened. Sorry for asking people to talk about their mistakes :flushed: This is absolutely not meant to focus on where the problems came from (we know we all make mistakes); it's more about understanding the role of Lean (proof assistant software) in this project.

Terence Tao (Jan 27 2025 at 18:05):

These are good questions. In a hypothetical alternate universe where we were trying to complete the Equational Theories Project without a formal proof assistant language such as Lean, I would like to believe that when the time came to write the (now much lengthier) paper (which would roughly speaking correspond to the current blueprint), there would be someone checking the proofs and catching the issues (some of which were simply typos or bad communication, but some of which did require non-trivial fixes to the structure of the argument). But I think that, because we knew the proofs would eventually be formalized in Lean, this sort of careful human checking was not really done for this particular implication, especially since it was one of the last implications to be resolved, and so understanding the proof in fine detail in order to apply it to remaining implications was not as important as with earlier proofs in the project (where, for instance, deconstructing the early greedy algorithm arguments was very helpful in deploying versions of that argument to other implications).

This implication is one of the most complex implications obtained in the whole project (only the 1729->817 one is more complex, I think). There are certainly many simpler implications where a proposed proof was given and then retracted after someone else in the project pointed out a difficulty, well before any attempt at formalization, but this one I think reaches a threshold of complexity where casual "peer review" by the other participants was too challenging. Part of that was my fault when writing up the blueprint; I did not break things up into atomic lemmas in a manner suitable for modular inspection of the argument, but instead described a rather complicated greedy algorithm as a large sequence of adjustments to a simpler algorithm, which was closer to how I arrived at the algorithm in the first place, but not ideal for verification.

Shreyas Srinivas (Jan 27 2025 at 18:07):

This process of describing algorithms by incremental tweaking is common in STOC, FOCS, and SODA papers.

Matthew Bolan (Jan 27 2025 at 18:20):

When there were around 10 equations with implications left to be formalized, I made a point of going through the proofs of each carefully; this is when I found the last few errors in 1722, for example. However, I did not check 1729 or 1516 for the reasons Terry suggests, and I did not check 1323 because it seemed that Bruno had already done so somewhat carefully.

Lorenzo Luccioli (Jan 28 2025 at 08:28):

The problems mentioned in this message all came up during the formalization process. I actually only noticed these issues when the formalization was almost done, partly because they happened to be in the parts of the proof we tackled last, and partly because I was hoping that formalizing the rest would give me some new insight to fix them on my own.

We did review the proofs before starting the formalization, but it was mostly to understand the general strategy and figure out how to approach the formalization, so we weren’t as thorough as we would’ve been if we weren’t planning to go over every detail when translating it into Lean anyway.

I’m sure we could’ve caught issue 3 with a more careful review, but I’m less confident about 2. There were a lot of cases to check (in the last implementation, there were at least 8, since the 3 hypotheses introduced structures that could each take 2 forms). On top of that, some of those cases needed to be split even further. Out of all of them, only one case was explicitly mentioned in the informal proof (and that one worked fine), and just 2 cases ended up being problematic. For the rest, it wasn’t too hard to show that either the hypotheses became contradictory or the result was true anyway. So, I think without the formal framework, it would’ve been easy for this mistake to slip through a review.

As for how tricky the problems were: apart from issue 1, which was easy to fix, the other two weren’t straightforward—at least for us. That might be because we didn’t have much experience with similar proofs, as we did not contribute to other greedy proofs, and as Terry pointed out earlier, this proof was already pretty complicated. So it’s also possible we were just a bit overwhelmed by the complexity of both the proof and the greedy structure implementation.

Terence Tao (Jan 30 2025 at 17:21):

Day 127

Terence Tao (Feb 06 2025 at 02:27):

Day 133

Bruno Le Floch (Feb 12 2025 at 22:20):

What is the rough timeline for finishing the paper? Same question for the blueprint? I have some constraints until March 1st that make it difficult to finish writing up my blueprint chapter on the spectrum question, and the chapter on Higman-Neumann characterizations of groups with Jose Brox, before early March.

Terence Tao (Feb 12 2025 at 22:31):

The blueprint is somewhat secondary to the project, and I think it will be fine for it to be in a somewhat disorganized state as long as the Lean codebase and the paper are in good shape. On the Lean side, there are still two difficult anti-implications to be formalized, and I imagine that that process might also run into March; even after that, there might be some refactoring to get a satisfactory "end to end" theorem in Lean that really does completely formalize the description of the implication graph without using any human-proved but unformalized results (e.g., duality). But that would definitely be the endgame stage, at which point we would like to wrap up the paper writing while our memories of the project are still relatively fresh.

Terence Tao (Feb 25 2025 at 17:40):

Day 153

Terence Tao (Mar 22 2025 at 02:35):

Day 177

Terence Tao (Apr 14 2025 at 16:01):

Day 201. Quite a long journey! But we still have to write the paper.


Last updated: May 02 2025 at 03:31 UTC