Zulip Chat Archive

Stream: Seymour

Topic: Comments from peer review


Martin Dvořák (Nov 14 2025 at 13:10):

Unfortunately, we got rejected from CPP. Below are the comments from the peer review.

Martin Dvořák (Nov 14 2025 at 13:10):

Paper summary

The Paper present a formalization in Lean4/Mathlib of composition direction of Seymour's Theorem for regular matroids. Recall that matroids are a combinatorial abstraction of the notion of independence (as in linear independence in vector spaces, algebraic independence over a field or acyclicity in graph). Seymour’s Regular Matroid Decomposition Theorem says that each regular matroid can be obtained from graphic matroids, their duals, and copies of R10 by taking 1-, 2-, and 3-sums. They only proove the composition direction stating that matroid constructed as above are regular.

The main difficulty to formalize such broad abstractions as matroids is that they are several equivalent definition which may rely or not on some finiteness assumption. Hence the authors start the paper by making a case for their choice of approach: They choose to follow a less general one that avoid using too much of not yet formalized graph theory. In return it requires proof to go across three levels of abstraction (matrices, standard representation matrices, and binary matroids in general).

Comments for authors

The paper is well written and present a substantial formalization effort a large part of which deserve to be integrated in Mathlib. The technical choices are clearly discussed. Moreover the authors take care of giving some feedback and advice about Lean Mathlib and the aesop tactic (in appendix).

However, aside the choice of the mathematical approach, translating the definition into formal one seems to be relatively straightforward, with few genuinely novel or challenging aspects beyond the usual dependent type issues (e.g., handling matrix dimensions in block matrices). Also it is far from being the first formalization in matroid theory. As such, the paper offers only incremental progress rather than a significant new contribution. Taken together, the work does not reach the level of originality or depth expected for acceptance.

We also have a few suggestions concerning the presentation: it is quite technical with a lot of repetition in the presented code that could be abbreviated (see eg: Set.drop?, Matrix.toMatrixDropUnionDrop, standardReprSum3 or the three identical Matroid.IsSum?of.isRegular theorems). On the other hand the reader not specialist of matroids, may have difficulty to follow the logical progression of the paper. I would strongly recommend the following changes:

  • The authors seem to assume that the reader is familiar with the notion of 1-, 2-, 3-sums. They should give a broad intuitive idea of what they are right at the beginning of the introduction.
  • The author should insert couple of paragraph in the introduction explaining the structure of the paper as well as his logical progression
  • at the beginning of each section, the author should add a little paragraph recalling the reader where he is in this logical progression.

Martin Dvořák (Nov 14 2025 at 13:11):

Paper summary

This paper presents a formalization of the composition (forward) direction of Seymour's decomposition theorem for regular matroids in Lean 4. This direction proves the structural property that any 1-, 2-, and 3-sum of two regular matroids is a regular matroid.

This new library establishes formal definitions and proves properties for a range of necessary concepts, including totally unimodular matrices (some of which were contributed to Mathlib), vector matroids and their standard representations, the definition of regular matroids via totally unimodular representations, and the 1-, 2-, and 3-sum operations.

Using this, the authors provide a machine-checked proof of the composition direction. An interesting aspect of this formalization is its generality: the proof holds for matroids that have finite rank but may possess infinite ground sets. In addition to proving the composition direction, the authors also provide a formal statement of the corresponding decomposition direction (that all regular matroids can be decomposed into graphic, cographic, and R10R10 matroids) to complete the formal statement of Seymour's theorem.

Comments for authors

The paper is an interesting read, and describes the key definitions and main statements for the theorem in question. The "trusted code" is presented clearly. The pictures of matrices are nice and helpful also! For me, there are some balances which I think were not struck quite right, however.

  • The definition of regular matroid presented here is not the typical one given, which I was somewhat surprised to see. More surprising is that this distinction and alternate definition was not explicitly mentioned, and I would have liked to see comments on why this particular definition was chosen, as opposed to the usual "representable over any field" version: it's a nontrivial theorem that they're the same! Consulting Truemper, I see that perhaps this was simply due to following the source very closely, and arguably therefore this is covered around line 235, but a difference in definition for a word which is mentioned in the title is a big enough difference to deserve some more detail.
  • The paper spends much time on definitions, and much less time on the difficulties (or lack thereof) of formalising the theorem itself. It is nice to display these definitions, but I would have liked to see quite a bit more detail in Section 11. For instance, line 1134 writes that the proof for 1-sums is nearly identical to the source, but a (sketch) proof of what that is could have added to the presentation here. And a sketch proof of the titular theorem also seems worth mentioning. For 3-sums, a high level description of the differences between the informal and formalized proof is given, but it is tricky to contextualise this without referencing (or already knowing) the informal proof; and a slightly less high level description would also be interesting to read. More generally, I'd like to know if there are any insights gained from this formalization that could be applicable elsewhere.
  • Relatedly, it seems there is little discussion of the design choices made when writing some of the definitions. For instance, matrixSum1 is a function, while Matroid.IsSum1of is a predicate, but why was standardReprSum1 chosen to be a (partial) function too? As it's partial, it seems more natural to use a predicate, which checks the precondition too. Lots of time is taken to detail the definitions, so I would like to read the authors' thoughts on what they learnt (and any wrong steps) taken towards getting these ones.

Minor comments:

  • Giving a summary of the structure of the paper near the end of the introduction would be valuable for signposting.
  • Around line 190, explicitly mentioning Oxley and Truemper would be nice here, to avoid the reader jumping to the references.
  • Around line 290, there's a slight ambiguity in the informal descriptions: where exactly is B2 quantified in (iii)? A little thought or foreknowledge reveals the answer, but I think a nicer way to present this would be to declare the word base, then define independence, then write the base axioms with reference to the notion of independence (closer to how Mathlib does it: using indep to define the maximality axiom)
  • When defining independence, there's also a slight ambiguity: does "any base" mean for all bases, or for some base? Again, an experienced reader will know the answer, but some readers may be encountering matroids for the first time here!
  • The choice of variable names is unfortunate in this section, the informal version sometimes uses different letters from the formal version, meaning we're left with many, many different letters around: M, E, B, X, I, J, alpha, P, Y. It may be worth looking for some overlap if it can be found here.
  • Around line 360, it may be worth explicitly mentioning that m,n are arbitrary (finite) types, not necessarily natural numbers: some readers expect matrices to be indexed by naturals! This is referenced again around line 808. This may also give you the opportunity to recall what Fin is, and why it's relevant to your formalisation (you mention Fin 2 and Fin 3 fairly often later)
  • Certain variables are declared repeatedly, most notably {𝛼 : Type*} and [DecidableEq 𝛼]. Consider omitting these, or mentioning them only once near the start, to save you space and to save the reader the overhead (a kind of LaTeX-variable!). You may also consider ignoring DecidableEq assumptions altogether in the presentation, especially since at no point is the computability aspect mentioned in your discussion.
  • Line 432: how does this differ from Subtype.toSum?
  • Line 510: The source [12] only mentions the standard representation once, so giving an informal description of what it is here would help a lot: you write "standard representation matrix B", but this isn't a well-known concept to most readers of papers in this journal! More generally, [12] is a big reference and it's used often, so for points like this which are specific, writing something like [12, p32] or [12, Sec 3.2] is quite a bit more useful.
  • Line 591, "as it would be hard to verify that total unimodularity is preserved in our prior approach." I'd very much like to know why this would be hard! In particular, I'm more interested in this than getting another reminder that {𝛼 R : Type*} [DecidableEq 𝛼].
  • Line 860: mentioning the dimension of D0 explicitly would be useful.
  • Section 10 overall gives me quite a lot of code, some of which is quite repetitive. I'm not sure that the contents of the definition of Matrix.toMatrixDropUnionDrop, for instance, is all that valuable: perhaps a link to the full definition together with the informal description and a summary could have sufficed. Or, if there's a particular point you'd like to stress which is made by this definition, you should explain it more explicitly.

Martin Dvořák (Nov 14 2025 at 13:11):

Paper summary

This paper formalizes the statement of Seymour's matroid decomposition theorem in Lean and proves the composition direction (that 1-, 2-, and 3-sums of regular matroids are regular). The authors have developed supporting formalizations around totally unimodular matrices, vector and regular matroids, standard representations, and the 1-, 2-, and 3-sums of vector matroids. The formalization builds upon the Matroid library in Mathlib.

It presents proof-engineering ideas that helped simplify the formalization compared to existing proofs, avoiding the combinatorics of ΔΔ-sums and re-signings by introducing a "canonical" signing.

The supplementary materials appear to be accurate formalizations of the presented material. It is structured and appears to be reusable for future work toward proving the full theorem.

Comments for authors

This formalization work is a notable milestone and appears to be valuable for future developments toward a full proof of Seymour's decomposition theorem. At the same time, the paper currently reads more like a technical companion to the formalization rather than an articulation of the mathematical or formal-methods contributions. Much of the exposition describes what was implemented, with comparatively little attention to why the result is interesting, what conceptual or engineering challenges arose, or what was learned that could transfer to other formalizations.

Below are some suggestions to help strengthen the paper in a revision:

  • State the main theorem precisely. The theorem is given informally, but the paper does not present a clear mathematical formulation of the composition directed being formalized. An exact statement would help clarify the scope of the formalization. An example of the ambiguity is that the full matroid decomposition appears to be, Matroid.RankFinite.isRegular_iff_isGood, and the composition direction of it is Matroid.IsGood.isRegular, which not only shows the regularity of 1-, 2-, and 3-sums of regular matroids (as shown in the paper), but the regularity of graphic matroids (proved in the code), and of cographic matroids (stubbed out with sorry). A clear statement of the theorem in the paper would help prevent confusion.
  • A stronger narrative of the mathematics of the theorem would help motivate why formalizing the theorem is difficult or interesting. The paper mentions some applications of the theorem, but it would help to understand what is important about regular matroids. Furthermore, a description of what structural phenomena that the various sums capture would be elucidating.
  • A discussion of the decomposition direction, even if only qualitative. The paper hints that this is the natural next step, but a section outlining what remains to be formalized, what obstacles are expected, and how the current formalization positions that work would be beneficial to the work.
  • The narrative around "compromises" can be a discussion of reasonable scoping decisions, rather than weaknesses. This can also help with the clarity of the rest of the presentation; for example, the paper could add the assumption that matroids have finite ground sets, and whatever generalizes to the infinite matroids of Bruhn et al. (the Matroid in Mathlib) can be briefly mentioned in a short section.
  • Relatedly, reflect more on design choices and trade-offs. Section 2 briefly compares Oxley vs Truemper as foundations but does not analyze the consequences to future developments. It explains the immediate convenience, but not the appropriateness, or the potential design tradeoffs. The brief sentence about experiments in proving equivalences being hard could be expanded upon to paint a picture of what was hard.
  • Differentiate routine formalizations from novel ideas. Sections 4-10 contain many code listings, but it's not clear what required new mathematical ideas, formalization ideas, or design choices. For example, in section 7 it is unclear whether totally unimodular signings are a new idea that simplified the formalization or if they are a direct formalization from the literature.
  • Make novelty more explicit relative to prior work. The related work section is detailed but largely descriptive. It leaves it to the reader to determine the novelty of the formalization.
  • Trim low-level implementation details. The Decidable class appears frequently in excerpts, but without explanation. If these are only required to interface with Mathlib, a brief comment suffices and could be omitted, since they distract from the conceptual presentation. If these are required to enable proof by reflection, that is worth explaining as well.
  • I noticed in the code that R10's regularity uses a proof by reflection. That is potentially interesting for CPP, yet it does not appear in the paper. If it is omitted because it does not appear in the paper's formulation of the composition direction, consider still highlighting it as it may interest readers.

Small comments:

  • 345 and elsewhere: "infinite matroid" vs "infinite ground set" is somewhat confusing. My understanding is that infinite matroid is a specific axiomatization of matroids from Bruhn et al., and it might be misunderstood by readers.
  • 361: Citing Schönfinkel (1924) for currying feels out of place. Multi-argument functions are standard in Lean and not specific to matrices.
  • 407 and elsewhere: Avoiding the set->type coercion and using Set.Elem directly is unusual and might merit explanation.
  • 226: "graph theory which is not supported in Mathlib" could be phrased more carefully, since Mathlib has graph theory, but perhaps not the specific graph theory needed for [9]/[12].
  • 1229: Typo, Matroid.IsRegular.isGood_of_rankFinite instead of Matroid.IsRegular.isGood.

Evgenia Karunus (Nov 14 2025 at 13:33):

Interesting comments, thanks for posting, good to know what's generally expected!

Martin Dvořák (Nov 15 2025 at 07:28):

I am confused by the fact that the first reviewer wrote

"Also it is far from being the first formalization in matroid theory. As such, the paper offers only incremental progress rather than a significant new contribution. Taken together, the work does not reach the level of originality or depth expected for acceptance."

but then the same reviewer added a comment:

"The consensus among the referees is that the paper’s rejection stemmed not from its (quite good and substantial) formalization, but from shortcomings in its presentation. Too few research outcomes are presented here, as opposed to merely describing the implementation. One would like to see reflection on what was difficult, novel, or what conceptual insights were gained. We all believe that with significant restructuring, it could become a strong paper. Therefore we encourage the author to resubmit, maybe with work toward the full theorem (which is admittedly more difficult)."

Evgenia Karunus (Nov 15 2025 at 07:48):

I think they implied that the conceptually interesting stuff should be described verbally in a paper, because it's hard/impossible to determine what the novelty is by skimming the code.

Evgenia Karunus (Nov 15 2025 at 07:48):

Meaning - the easy-to-find novelty could be "this is a first formalization of matroid theory", which it isn't.
Another easy-to-find novelty could be "translating the definitions into formal definitions", which also doesn't work because that part seemed straightforward.

So they need a verbal description of what was conceptually interesting and why, because their immediate guesses didn't work.

Alex Nelson (Nov 20 2025 at 16:01):

In addition, after carefully reading the reviewers's comments yesterday several times, I feel they are also implicitly asking the questions, "Who is the audience for this paper? Who are the readers you are writing for?"

Martin Dvořák (Nov 20 2025 at 16:16):

Oh yeah, we were keeping it implicit. In my Ph.D. thesis, which is more long-winded, I extensively talk about who the intended audience is, but in the paper, we kept it implicit. We did, however, have an internal document answering this question; unfortunately, I cannot find it now. From what I remember, we all agreed on writing for a reader who knows Lean but doesn't know what a matroid is. I think the only exception is the appendix, in which we write for the authors of the tools we were using.

Alex Nelson (Nov 21 2025 at 02:07):

How well do you think the article spoke to that audience? And do you think it was a suitable audience for the CPP?

Martin Dvořák (Nov 21 2025 at 08:55):

I don't know. Do you have a suggestion?

Alex Nelson (Nov 25 2025 at 16:02):

Well, I will be frank: when I joined the project, I thought the paper was written for matroid theorists who were already pretty competent at using some proof assistant (not necessarily Lean, and not necessarily based on CoC/dependent-types).

Alex Nelson (Nov 25 2025 at 16:03):

But looking at the paper again (with the benefit of some time passing since I last looked at it), I really can't shake the feeling that there are several different audiences being spoken to, consequently none of them are adequately addressed.

Martin Dvořák (Nov 26 2025 at 16:38):

After taking a step back and rereading the comments, I understand the criticism as follows...

There is huge disconnect between what I wanted to write and what the reviewers wanted to read (and, of course, there is also some disconnect between what I wanted to write and what we wrote in the end, but this gap seems much less important).

From my point of view, I saw our contributions divided into two levels, with the primary level being infinitely more important than the secondary level.

Primary contributions:

  • the definitions
  • the theorem statements
  • there exists a sorry-free proof (the only thing about the proof that matters is what #print axioms shows)

Secondary contributions:

  • what the proof looks like
  • what we learned from it
  • what we could have done differently

My plan was to cover the primary contributions as perfectly as possible, and bring up the secondary contributions only if some space remains.

However, the reviewers had completely different perspective! They would say something like:
"OK, there are some definitions, good, there are some theorems, good. However, all of it is in the code. Give me something that isn't written in the code!"

From the reviewers' perspective, the goal of writing the paper was not to summarize the project for somebody who doesn't open the repository, but to provide some surplus value to somebody who opens the repository and has the paper next to it. And we didn't provide enough of those "extras".

Martin Dvořák (Nov 26 2025 at 17:01):

Put differently, we were answering this question:
"What is objectively true about our project?"
While doing so, we neglected this question:
"What is interesting about our project?"

With the new perspective, our new manuscript should go much further into "what is interesting" and "what was challenging".

That said, I am probably the worst person from the entire team to elaborate on "what is interesting" and "what was challenging".
There wasn't a single moment in the project when I would be approaching it as "let's learn what is easy in Lean and what is surprisingly hard to formalize".
Every single time, I was approaching the project as "let me create a sorry-free proof, no matter the cost".
Of course, I could reflect on what was hard for me, but every time something was harder than I expected it to be, I don't know if it was harder than I expected because it was inherently hard, because Lean made it hard, or because I chose a wrong approach.
For example, I can confidently say that proving standardReprSum3_hasTuSigning was hard for me, but I cannot say why it was hard (and I definitely cannot say whether it would be similarly hard for a different person).

Hopefully, other contributors can reflect on "what was hard" better and provide a better insight into "what is interesting".


Last updated: Dec 20 2025 at 21:32 UTC