Zulip Chat Archive
Stream: mathlib4
Topic: Dropping hyphens
Harald Husum (Aug 26 2025 at 11:03):
I want to shorten "non-zero" to "nonzero" and "non-empty" to "nonempty" across mathlib. I am motivated by The Chicago Manual of Style that encourages doing so: https://www.chicagomanualofstyle.org/qanda/data/faq/topics/HyphensEnDashesEmDashes/faq0079.html
The idea is that hyphens should be dropped if they do not serve a purpose. Among purposes that can be served by a hyphen is:
- Avoiding doubling letters: Some authors prefer "non-negative" over "nonnegative" because of the double "n".
- Respecting proper names: E.g. "non-Euclidean"
- Avoiding ambiguity in larger compounds: E.g "non-zero-divisor".
But in the cases of "non-zero" and "non-empty", the hyphen is just wasted space, as far as I can see.
However, my PRs (#28949 and #28951) have raised a bit of discussion, and I wanted to bring this discussion to the attention of the community to see whether I can get consent.
The opposing viewpoint seems to be that this should be considered a matter of taste and left up to individual authors.
Any takes on this?
Monica Omar (Aug 26 2025 at 11:50):
I personally prefer non-zero and non-empty as they sometimes feel more readable to me. Since it's not incorrect to add hyphens here, I think it shouldn't matter all that much.
Martin Dvořák (Aug 26 2025 at 12:11):
I like to talk about nonzero-sum games.
Kevin Buzzard (Aug 26 2025 at 12:18):
In some sense any change here will be futile unless you also write a linter stopping people from adding the versions you don't like in later PRs.
Kim Morrison (Aug 26 2025 at 12:24):
Individual taste, please. We're in the weeds now in terms of improving readability, and the bother of implementing the rules exceeds the payoff.
Harald Husum (Aug 26 2025 at 14:17):
To be clear, I don't desire to implement this as a rule, @Kim Morrison, and I do not think it is wise to lint against this, @Kevin Buzzard. Either would add friction to the PR process, which clearly outweighs the benefit.
I don't mind the fact that "non-zero"s and "non-emty"s will be slowly accumulating again after the PRs are merged. In my mind, this kind of grammar issue is best dealt with by way of infrequent bulk PRs, done by those who care to, like me. In that way, no additional workload is placed on contributors or reviewers.
Hence, what I am asking for is nothing at all beyond getting these two PRs merged. Just want to make sure my intentions aren't misunderstood.
Jireh Loreaux (Aug 26 2025 at 15:20):
It is additional workload on reviewers though. Every minute I spend reviewing one of these PRs is a minute I can't spend reviewing a PR about mathematics. And note the time devoted to this discussion too. So either we should choose to care and lint against it, or we should choose to not care and not bother. As Kim suggests, I think we should go for the latter.
Note: this is not leveled as a criticism of you. I'm only trying to inform about the reality of the situation.
Jireh Loreaux (Aug 26 2025 at 15:21):
I do appreciate that you are only trying to help out. I don't construe any malintent.
Kyle Miller (Aug 26 2025 at 15:43):
I like this general principle for large collaborative projects:
Contributors should feel free to express themselves using their own local style, so long as (1) the style guide has nothing to say on it, (2) the local style is understandable, and (3) it's not an idiosyncratic style. Furthermore, contributors are not free to make changes to other people's stylistic expression unless it's incidental to a substantive contribution or it is a PR to fix style guide violations.
I tend to be more on the side of "let people do what they want within established conventions, so long as the expression is clear and the style doesn't obscure intention" (even for code formatting).
So, unless The Chicago Manual of Style is adopted as part of the mathlib styleguide, I would suggest closing the PRs. The fact even a single person has spoken in favor of hyphens, paired with the fact I've seen that style before, is enough for me at least to not feel comfortable merging.
(Should we adopt more formally some version of the "don't touch other peoples' style unless it's a style guide violation" principle?)
Monica Omar (Aug 26 2025 at 16:05):
(Should we adopt more formally some version of the "don't touch other peoples' style unless it's a style guide violation" principle?)
Or if it's to expand on the docs or make something clearer; so not just pure style edits
Kyle Miller (Aug 26 2025 at 16:06):
That's meant to fall under "substantitive contribution". Clearly it needs to be clearer :-)
Michael Rothgang (Aug 26 2025 at 16:30):
I would make an exception for clear fixes: a big collection of typo fixes is welcome, in my opinion. Even if these are rules that are often ignored (hausdorff to Hausdorff, for example, or correcting the spelling of person's names).
Michael Rothgang (Aug 26 2025 at 16:31):
Otherwise, I agree: once a change is controversial (and this one clearly is), it's not worth bikeshedding over.
Harald Husum (Aug 26 2025 at 16:41):
Jireh Loreaux said:
It is additional workload on reviewers though. Every minute I spend reviewing one of these PRs is a minute I can't spend reviewing a PR about mathematics.
I am not asking you to review the PR unless you care about writing style. I just had hope that some reviewer cares enough, that is all. If there is such a reviewer, they would presumably not find it to be a burden.
Jireh Loreaux said:
And note the time devoted to this discussion too.
I don't see that I am forcing you to engage with me. I hope you do so out of your own volition. And I really don't think I should feel guilty for raising this discussion, even if you don't share my opinion.
Kyle Miller said:
The fact even a single person has spoken in favor of hyphens, paired with the fact I've seen that style before, is enough for me at least to not feel comfortable merging.
That is a viewpoint I can understand and respect, even if I don't agree with it. I'd much prefer if something like The Chicago Manual of Style was included in the mathlib style guidelines, but I also understand that this might be a controversial opinion.
Kyle Miller (Aug 26 2025 at 16:41):
@Michael Rothgang Typos aren't style — I don't see a reason to make an exception for fixing mistakes. It's worth clarifying the distinction though: mistakes are wrong, style differences are not.
Kyle Miller (Aug 26 2025 at 16:43):
Michael Rothgang said:
Otherwise, I agree: once a change is controversial (and this one clearly is), it's not worth bikeshedding over.
To go a bit further, I'm saying that we can decide ahead of time that changes made only for style are controversial, as a matter of policy. (Style not set out in the style guide.)
Kyle Miller (Aug 26 2025 at 17:09):
Harald Husum said:
And I really don't think I should feel guilty for raising this discussion, even if you don't share my opinion.
Please don't interpret Jireh as saying that you should feel guilty.
You have to understand that every PR is a burden. Every PR has the capacity to occupy any particular reviewer's attention, even if just to decide whether to look more closely — contributors are implicitly asking all reviewers to take a look. The longer the PR sits in the review queue, waiting for reviewers to come to some decision, the more time is spent on it. We accept the burden, so don't feel guilty, but burden is an appropriate word. (Regarding your point about volition, as a mathlib maintainer Jireh has some responsibility to make sure PRs are addressed.)
For the record, I don't believe Jireh doesn't care about writing style, so we can't ask him not to think about this PR. His saying "it should either be linted or we shouldn't care" is a review in itself.
(I think I should say that this thread isn't necessarily about your PRs per se, so I hope you don't take it personally. Your PRs are a good spark for a broader discussion about style and what should be an acceptable contribution. The way the project is organized is that it can be difficult to make decisions to close PRs, and so it would help to have some guidelines. The size of the mathlib PR queue can be quite daunting.)
Jireh Loreaux (Aug 26 2025 at 18:28):
Harald Husum said:
And I really don't think I should feel guilty for raising this discussion
I'm sorry if my message was interpreted that way, it was not my intent, as I tried to make clear that it is not a criticism of you. And I'm not saying this Zulip discussion isn't worthwhile, only that it consumes a nonzero amount of time.
Kyle's responses accurately summarize the situation (i.e., every PR consitutes some burden, and as a maintainer I have a responsibility to review PRs I'm capable to review.)
Harald Husum (Aug 26 2025 at 21:51):
Summarizing this discussion, I see that there isn't consent to be had. Fair enough, I agree that it isn't worth bike-shedding over, so I've closed the PRs in question.
There's a few things I want to comment on, though, now that I have time.
Jireh Loreaux said:
So either we should choose to care and lint against it, or we should choose to not care and not bother.
This strikes me as a false dichotomy. There are quite a few examples of things we do care about and do bother with, but don't lint against. Implementing the naming convention is one example. Fixing typos and grammar errors is another. Is your opinion that there is something unsatisfactory with these examples inhabiting this in-between state?
Kyle Miller said:
Should we adopt more formally some version of the "don't touch other peoples' style unless it's a style guide violation" principle?
I mean I don't disagree in principle. In matters that are purely about style, I don't feel it is appropriate to override other people's work. The problem is that there's a fairly broad grey-zone between purely stylistic issues and outright grammatical errors. I feel like the PRs i proposed in this thread are nice examples of that.
I proposed the PRs and raised this discussion because, in my mind, the use of "non-zero" and "non-empty" lie closer to "grammatical error" territory than to "stylistic plurality" territory (as opposed to nonnegative vs. non-negative, as discussed in my OP, which truly comes down to style). Having witnessed the resulting discussion here, the takeaway is that the community still deems this an expression of allowable stylistic plurality. This is totally fine; the point I'm making is that this conclusion was, to me, by no means given (and even surprising).
If I'm going to continue improving the documentation of mathlib, which I hope to do, I will, repeatedly, encounter this broad grey-zone in some way or another. If I risk being understood as controversial or provoking every time I end up touching this zone, that would seem off-putting and demotivating to me. I might then err on the side of caution and just not contribute.
In summary, if you want to discourage attempts at purely stylistic changes, I think you should also take on the burden of making absolutely clear what is considered style and what is not. That is, you need to find a way to draw a clear line somewhere in this grey-zone. Can you do that?
Jireh Loreaux said:
And I'm not saying this Zulip discussion isn't worthwhile, only that it consumes a nonzero amount of time.
Thank you for clarifying. I read your initial comment as expressing frustration that this had to be a discussion to begin with, but I now understand that was not your intent.
Kim Morrison (Aug 27 2025 at 00:19):
Harald Husum said:
If I'm going to continue improving the documentation of mathlib, which I hope to do, I will, repeatedly, encounter this broad grey-zone in some way or another.
I think the general advice here should be to aim for substantive documentation: why is this implemented the way it is, how do I use this API, show examples of setting something up, etc. These are sorely sorely lacking all across the library!
Kyle Miller (Aug 27 2025 at 01:47):
Harald Husum said:
if you want to discourage attempts at purely stylistic changes, I think you should also take on the burden of making absolutely clear what is considered style and what is not. That is, you need to find a way to draw a clear line somewhere in this grey-zone. Can you do that?
Ok: it's a matter of style if the variations appear in multiple usages in quality published sources.
Your examples of "non-zero" and "non-empty" might be grammatical errors to some people, but I can easily find plenty of usages in textbooks. Even Wikipedia has an article called "Non-zero", and in Null vector it's got both "non-zero" and "nonzero" in successive paragraphs! Wikipedia is not proof of anything, but Mathlib is a math formalization project, not a project researching correct English usage, so if Wikipedia editors don't see "non-zero" as a mistake, I feel comfortable saying the question is unsettled.
Notice that this makes it hard or nearly impossible to prove that something isn't a matter of style. The burden on an author of a mistake-only PR is to prove that they are grammar mistakes, or at least assume the risk that their PR will be closed. If neither is possible, or if it takes too much reviewer energy to evaluate, I'd avoid it.
Substantive improvements are welcome, like Kim said. There are many many more directions with better yield (in terms of benefit-per-(author + reviewer time)) to pursue at the moment. If that's the thrust of your contributions, I would not worry about gray zones.
Being semiserious, if "non-zero" is bothersome, and you think it's worth the energy, you could conduct a campaign to submit PRs that improve the paragraphs that contain "non-zero", and incidentally use "nonzero" instead. If you hide it across multiple PRs that each stand on their own, no one would be the wiser. :slight_smile: More seriously, this is explicitly allowed in the principle I was suggesting. If you rewrite something, you don't have to preserve the style in which it was written.
Harald Husum (Aug 27 2025 at 08:44):
Kim Morrison said:
I think the general advice here should be to aim for substantive documentation: why is this implemented the way it is, how do I use this API, show examples of setting something up, etc. These are sorely sorely lacking all across the library!
I'm sorry to say I don't think I have the knowledge set to do so effectively. I am not a mathematician and my experience with Lean does not reach beyond completing the NNG, so the best I can really hope for, at least in the near future, is to improve upon preexisting documentation.
You could argue that this is reason for me to refrain from contributing, but my sense is that the sum total of my contributions are for the benefit of mathlib, even if it does imply sometimes proposing changes that will probe the boundary of usefulness.
You could also argue that I should have raised the discussion before creating the PRs, but in this case, where the effort needed to create the PRs is low, I feel it substantially helps the discussion to have the PRs ready to go. Perhaps I could have marked them RFC or WIP, but I didn't really expect pushback on this before I got it.
Kyle Miller said:
Ok: it's a matter of style if the variations appear in multiple usages in quality published sources.
I'm a layman, I don't have access to the vast majority of quality published sources in mathematics.
Kyle Miller said:
Your examples of "non-zero" and "non-empty" might be grammatical errors to some people, but I can easily find plenty of usages in textbooks. Even Wikipedia has an article called "Non-zero", and in Null vector it's got both "non-zero" and "nonzero" in successive paragraphs! Wikipedia is not proof of anything, but Mathlib is a math formalization project, not a project researching correct English usage, so if Wikipedia editors don't see "non-zero" as a mistake, I feel comfortable saying the question is unsettled.
I think this is the better argument. I wasn't aware of the Wikipedia article for instance. My queries on the matter led me to the CMOS and their clear recommendation, and every LLM i queried recommended that I go for "nonzero" and "nonempty" in the context of formal mathematics writing, so I assumed I was on safe ground. Obviously I could have done even more research.
Kyle Miller said:
Notice that this makes it hard or nearly impossible to prove that something isn't a matter of style. The burden on an author of a mistake-only PR is to prove that they are grammar mistakes, or at least assume the risk that their PR will be closed. If neither is possible, or if it takes too much reviewer energy to evaluate, I'd avoid it.
I don't really mind closing PRs, like I did in this case. I want there to be a low bar for making PRs, not necessarily merging PRs. It isn't problematic to me if the only review I get is "this seems like a change that is too stylistic in nature, won't review unless a stronger case is made". I just think it is culturally unhealthy if the act of making the PR in the first place is seen as bothersome. I mean, I won't be making PRs that I myself find to be purely stylistic, but eventually I will propose changes that I think are uncontroversial that do in fact turn out to be controversial. I want that recognized as a cost of doing business, so to speak.
Kyle Miller said:
Being semiserious, if "non-zero" is bothersome, and you think it's worth the energy, you could conduct a campaign to submit PRs that improve the paragraphs that contain "non-zero", and incidentally use "nonzero" instead. If you hide it across multiple PRs that each stand on their own, no one would be the wiser. :slight_smile: More seriously, this is explicitly allowed in the principle I was suggesting. If you rewrite something, you don't have to preserve the style in which it was written.
My ulterior motive for these borderline stylistic changes was actually to clean up the diff of potential PRs doing just this. I'm employing LLMs to clean up mathlib documentation, but LLMs tend to want to fix both grey-zone and non-grey-zone mistakes. My thinking was that if I make preemptive PRs cleaning up all the common grey-zone stuff, I can eventually get diffs that cleanly focus in on more substantive changes, some of which might require careful review. My thinking was that the reviewer burden would be smaller in total if these grey-zone changes were done isolated and in bulk (and only a single time for each class of mistake), rather than intermittently "polluting" more substantive changes in a wider set of PRs.
I take it that you might actually prefer the latter, and that I can totally respect. My intuition just wasn't calibrated for that.
Harald Husum (Aug 28 2025 at 07:36):
I might as well ask for a "pre-review" on the following PRs then. All of these are examples of PRs that can hypothetically be construed as too stylistic, depending on where the line is drawn.
-#28798: doc: prefer "cannot" over "can not"- #28945: doc: hyphenate compound adjectives ending in -dimensional
- #28910: doc: hyphenate compound adjectives ending in -theoretic
Are these PRs I should aim to keep free of merge-conflicts, because they eventually will be reviewed? Or are they considered controversial and distracting, and hence should be closed ASAP?
Any takes are welcome.
Last updated: Dec 20 2025 at 21:32 UTC