Zulip Chat Archive
Stream: general
Topic: Copyright header
Yaël Dillies (Sep 17 2021 at 16:15):
What should I do about the copyright header of a file that I rewrote entirely? It's still about the same thing, but there's virtually nothing in common. Here it is: https://github.com/leanprover-community/mathlib/pull/7987/files#diff-59e4f73ee14640e1411e17ea4b5c962c39ee4b462e1055f0e298755b27629fee
Johan Commelin (Sep 17 2021 at 16:24):
Personally, I would just add my name to the author list.
Chris B (Sep 17 2021 at 16:26):
The guidance in the docs is "the people listed there should be the ones we would reach out to if we had questions about the design or development of the Lean code". Legally speaking, this is very low-stakes.
Yaël Dillies (Sep 17 2021 at 16:28):
Yeah... and in that case Scott won't be more knowledgeable than anybody else because virtually nothing of what he did will be left.
Johan Commelin (Sep 17 2021 at 16:31):
I had the impression that Author lists were somewhat monotonous
Johan Commelin (Sep 17 2021 at 16:31):
But I don't have strong feelings about that myself
Scott Morrison (Sep 17 2021 at 20:37):
Personally I have no objection to being removed from author lists when a substantial rewrite has happened, but to avoid treading on toes I think we should err on the side of monotonicity.
Yaël Dillies (Sep 17 2021 at 20:38):
I'll let you decide on that one, Scott :smile:
Filippo A. E. Nuccio (Sep 20 2021 at 07:03):
IMHO removing someone either from the author list or from the copyright header is a bit harsh. I would prefer to only have new authors added to both lists.
Eric Wieser (Sep 20 2021 at 07:41):
Sometimes its the obvious choice when splitting a file in two, and the new piece is the only part of the original that was written by the author in question
Yaël Dillies (Apr 27 2022 at 22:18):
Is this considered acceptable? https://github.com/leanprover-community/mathlib/pull/13315/files/56c3894f03edba04ef988e7922b148bcc1eeefda..77defe2449f65cc7a4ff5cbe375cc135284401c6#r860286198
Eric Wieser (Apr 27 2022 at 22:29):
AFAIK, Copyright dates should usually just be the date of earliest copyright, the extension to the present is usually implied. Certainly we don't want to have to be keeping track of the second date in these files
Yaël Dillies (Apr 27 2022 at 22:30):
And should the authors be in chronological order as well?
Eric Wieser (Apr 27 2022 at 22:31):
Copyright owners or authors?
Yaël Dillies (Apr 27 2022 at 22:31):
Sorry, copyright owners
Eric Wieser (Apr 27 2022 at 22:32):
Who knows, I don't really think individual copyright makes much sense for mathlib at a file level anyway
Eric Wieser (Apr 27 2022 at 22:32):
Stuff moves and changes too often
Eric Wieser (Apr 27 2022 at 22:37):
"Copyright Mathlib contributors" would be a lot simpler, although making such a change would possibly require contacting every current contributor and asking them to relinquish whatever claim they think the header gives them
Jireh Loreaux (Apr 27 2022 at 23:31):
That list is still relatively small. It might be worth making that kind of change now as opposed to later when it may be much more difficult.
Arthur Paulino (Apr 27 2022 at 23:45):
I think some people were just following the standard without much attachment to such claims. To me, git blame is what really counts when it comes down to the idea of ownership and responsibility over a piece of code
Kyle Miller (Apr 28 2022 at 01:02):
My understanding of copyright notices in the United States (and all Berne Convention countries) is that they are optional since authors get automatic copyright for a work. It seems that the only reason to include them still is that copyright notices can apparently be useful in infringement cases since the defendant can't claim they weren't aware of the copyright status. If you do place a notice, you're supposed to use the date of first publication, but each additional edition gets its own copyright date, and these can all be listed. I don't know how that applies to mathlib, since each merged PR could potentially be considered a new edition for the entire work.
Kyle Miller (Apr 28 2022 at 01:02):
There's a concept of joint authorship, where all contributing authors are co-owners of a single copyright over the entire work. Presumably mathlib would fall into this category, since everyone who contributes knows their code will eventually be refactored beyond recognition.
Kyle Miller (Apr 28 2022 at 01:03):
If mathlib is indeed a joint authorship work, maybe what makes sense is to do what @Eric Wieser suggested and say "Copyright mathlib contributors" per file and have a top-level file with the names of everyone who's ever contributed, to define what "mathlib contributors" means. That information might all be in the commit history, but it's not necessarily with full names.
Kyle Miller (Apr 28 2022 at 01:16):
Eric Wieser said:
possibly require contacting every current contributor and asking them to relinquish whatever claim they think the header gives them
Given that copyright is automatic (so the header doesn't give any additional rights) and that the risk of litigation has to be very small, this doesn't seem like it's necessary, though if anyone thinks it would be a nice gesture to contact everyone who's ever contributed, then why not. (The standard disclaimer: I'm not a lawyer.)
Vincent Beffara (Apr 28 2022 at 07:50):
The easiest would be to completely remove copyright headers from individual files, and put a global mention in a readme or similar for the whole project (perhaps mentioning the list of contributors). Copyright headers make (some) sense if there is a possibility that one file gets copy-pasted into some other project, but given the inter-dependencies in mathlib there is little chance that this would occur.
To add to the whole issue, in some countries (e.g. France) it is just not possible to relinquish one's moral rights on a work ...
Kalle Kytölä (Apr 28 2022 at 08:23):
I agree! (Without any expertise or insight, however. And I am not actively advocating for any change, just sharing my feelings about it.)
Listing the authors of a file seems to make sense for various reasons. But for the copyright, it seems natural to have the more global view that it applies to the whole mathlib and is held by the mathlib contributors. The license determining how the content can be used is anyway what it is, and sufficiently permissive.
(The only hypothetical downside I see in attributing the copyright to all contributors is if there ever would be some situation in which a unanimous agreement of all copyright holders would be needed, in which case it would not be feasible to obtain.)
Jireh has a point that if there ever is a time when we might want to contact all copyright holders to ask, now is strictly better than any time later. Later there will be more names to contact, and some of their contributions will be further in the past.
Kyle Miller (Apr 29 2022 at 01:17):
Here's a concrete proposal. For all new files in mathlib, we start using this copyright header :
/-
Copyright (c) 2017-2022 Mathlib contributors. All rights reserved.
Released under Apache 2.0 license as described in the file LICENSE.
Authors: David Hilbert, Emmy Noether
-/
The Apache license defines a "contributor" as anyone from whom a "contribution" has been received. If we want, we can have a NOTICE file (which licensees are required to redistribute when copies or derivative works are made) that includes a list of everyone who has contributed. (Edit: using the NOTICE file like this is probably not a good idea since it's meant to include only legally required notices. A CONTRIBUTORS file would be better.)
Whenever a file is updated, we can try to remember to bump up the copyright year, but as far as I can tell it doesn't really matter legally. I figure we should leave the 2017 constant across all files since that's the first edition of mathlib. (As I understand it, the copyright holders of Lean were involved when mathlib was split off from core Lean, so they were able to create a new derivative work with a new copyright -- normally Apache licensees would need to include the original copyright notice.)
Kyle Miller (Apr 29 2022 at 01:18):
One function of this copyright header is that it makes it clearer to contributors that its copyright is owned under "joint authorship" (which is recognized, but not exactly defined, by the Berne Convention). In the US, someone is considered to be a joint author of a work if (1) they made a copyrightable contribution and (2) all contributors intend for their contributions to be merged into a unitary whole. Note that this is not something that you declare ahead of time: if it comes to it, joint authorship is something established in court. Apparently joint authors can, without permission of the other authors, individually give out licenses to use the work. But, if we ever wanted to transfer copyright ownership of mathlib to some organization, like a mathlib foundation, we would need unanimous agreement.
Kyle Miller (Apr 29 2022 at 01:27):
The Apache license is very silent about how exactly the copyright for mathlib is owned. Certainly everyone who has contributed anything to mathlib has some copyright ownership of at least part of mathlib due to copyright law -- and this is true no matter what the copyright headers may or may not say now or in the future.
For pre-existing copyright headers, we can switch them out with this new header as we get around to it during refactorings. If we want to be careful, we can check with everyone whose name is listed in a copyright header that they intended their work to be part of a unitary whole and put their names into the CONTRIBUTORS file. It seems implausible to me, though, that there's anyone who somehow contributed code to mathlib and got through the PR process without getting the impression that the point of the library is to be a unitary whole.
Chris B (Apr 29 2022 at 02:39):
Kyle Miller said:
One function of this copyright header is that it makes it clear to contributors that its copyright is owned under "joint authorship" (which is recognized, but not exactly defined, by the Berne Convention). In the US, someone is considered to be a joint author of a work if (1) they made a copyrightable contribution and (2) all contributors intend for their contributions to be merged into a unitary whole. Note that this is not something that you declare ahead of time: if it comes to it, joint authorship is something established in court. Apparently joint authors can, without permission of the other authors, individually give out licenses to use the work. But, if we ever wanted to transfer copyright ownership of mathlib to some organization, like a mathlib foundation, we would need unanimous agreement.
I think you're making the (very reasonable) mistake of putting too much faith in a plain-language interpretation of the law; in my opinion mathlib is more likely to constitute a collective work than a work of joint authorship. The case law around joint authorship is pretty complex, and to my knowledge there haven't been any landmark cases about joint authorship for large open source software projects. Aalmuhammed v. Lee is still one of the big ones for joint authorship, and I'm pretty confident most mathlib contributors would not satisfy the standard they set out for a finding of joint authorship. The point about not being able to establish joint authorship ahead of time is not correct though, courts have made it pretty clear that a contract will suffice as proof of joint authorship unless there's a finding that the party contributed literally nothing, so if you want to try and force joint authorship you can collect CLAs to that effect, but you probably don't want drive-by contributors to be joint authors.
IMO if the maintainers are interested in actually securing the copyrights in the US, they should assign them to a single entity and let the entity register the work with the copyright office when there's a major version bump. Under Title 17, you can only pursue a claim if the work has been registered, and if you don't register within three months of publication you lose the ability to recoup statutory damages and attorneys fees.
Last updated: Dec 20 2023 at 11:08 UTC