Zulip Chat Archive

Stream: general

Topic: Authors:


Scott Morrison (Mar 18 2021 at 06:46):

I noticed lots of variations in the Authors: line:

  • sometimes Authors: , sometimes Author:
  • sometimes and instead of ,
  • sometimes a trailing .
  • sometimes variable spacing

I could easy normalize all these with a few greps. Waste of time or useful cleanup? First to answer gets to decide.

Damiano Testa (Mar 18 2021 at 07:26):

I like Author: [1]. if it is only one. Otherwise, my preference would be Authors: [1], [2],..., [n-1] and [n].!

Damiano Testa (Mar 18 2021 at 07:27):

(I was quite tense, while answering, wondering if someone else would beat me to set the golden standard for the Authors tab.)

Johan Commelin (Mar 18 2021 at 07:29):

I think we should standardize and lint against it. The format should preferably be easy to parse by a script.

Johan Commelin (Mar 18 2021 at 07:30):

Damiano's proposal is the best from a natural language point of view, but of course it means that scripts will need to consider some edge cases

Damiano Testa (Mar 18 2021 at 07:31):

I would also be happy removing the and in the multiple authors case. I still feel that the singular/plural distinction is nice, though.

Johan Commelin (Mar 18 2021 at 07:31):

Author(s): :see_no_evil:

Damiano Testa (Mar 18 2021 at 07:31):

still, unless we are envisaging people's names including a comma, that may not be hard to regex, right?

Damiano Testa (Mar 18 2021 at 07:32):

Author(s): :see_no_evil:

You did well covering your ears... :stuck_out_tongue_wink:

Damiano Testa (Mar 18 2021 at 07:34):

An alternative would be to use by [author(s) name(s)].

Scott Morrison (Mar 18 2021 at 07:40):

My preference is:

Authors: Name1, Name2, Name3

That is, always Authors: , only connector , , no trailing punctuation.

Scott Morrison (Mar 18 2021 at 07:40):

Trivial to check with a linter, trivial to scrape from. :-)

Scott Morrison (Mar 18 2021 at 07:41):

Let me argue against allowing an Authors: vs Author: distinction: it might make someone feel disinvited to edit an existing file, because it is labelled as "belonging to" a single person.

Damiano Testa (Mar 18 2021 at 07:41):

The linter would check for no punctuation at the end as well? Should names end in a character?

Scott Morrison (Mar 18 2021 at 07:42):

I would further explicitly disallow and in the line.

Damiano Testa (Mar 18 2021 at 07:42):

For instance, if I write a period (or a comma, [shudder]) at the end of the author's list, would the linter complain?

Scott Morrison (Mar 18 2021 at 07:43):

Yes :-) Otherwise when we scrape authors we're going to have to have a line that normalizes Damiano Testa. and Damiano Testa to the same person.

Damiano Testa (Mar 18 2021 at 07:43):

I like the inclusive point of view for the Authors field, so you are gaining me, slightly, on that one! :wink:

Damiano Testa (Mar 18 2021 at 07:45):

Even though I never added myself to an existing author(s)'s list.

Eric Wieser (Mar 18 2021 at 07:45):

Should we be scraping authors with doc-gen, or are we deliberately omitting them from the web docs?

Johan Commelin (Mar 18 2021 at 07:47):

If we are bikeshedding on this format, can we replace Authors by something that communicates better that it isn't just the list of people that have made (trivial) edits to the file, but the set of people that are somehow "responsible" for the file.

Damiano Testa (Mar 18 2021 at 07:47):

Designers?

Damiano Testa (Mar 18 2021 at 07:47):

Maintainers?

Johan Commelin (Mar 18 2021 at 07:48):

Maybe Principal authors?

Johan Commelin (Mar 18 2021 at 07:48):

Maintainers sounds too much like it has to be a subset of the mathlib maintainers, maybe

Kevin Buzzard (Mar 18 2021 at 07:48):

I thought that the idea was that the first list was the people to contact if things went wrong and the second list was people who contributed

Kevin Buzzard (Mar 18 2021 at 07:49):

One is authors, one is copyright or something

Damiano Testa (Mar 18 2021 at 07:50):

Ah, Kevin, I had not thought about the possibility of having different names in the two places, but I like it!

So, one field could be Contributors.

Kevin Buzzard (Mar 18 2021 at 07:51):

My impression is that currently the copyright is just the first group of people who made the file

Damiano Testa (Mar 18 2021 at 07:52):

Yes, I have certainly edited existing files, but I have never added myself to any named list, since I do not consider myself an author. I would have considered myself a contributor, though.

Johan Commelin (Mar 18 2021 at 07:52):

https://leanprover-community.github.io/contribute/style.html#header-and-imports

Johan Commelin (Mar 18 2021 at 07:52):

Regarding the list of authors: we don't have strict rules on what contributions qualify for inclusion there. The general idea is that the people listed there should be the ones we would reach out to if we had questions about the design or development of the Lean code.

Johan Commelin (Mar 18 2021 at 07:53):

In principal we have the git log to track who contributed what.

Johan Commelin (Mar 18 2021 at 07:53):

Although splitting files into pieces can destroy some of that info

Johan Commelin (Mar 18 2021 at 07:54):

But splitting files also means you don't know which 2 lines Bobby Alice contributed to the 2500 lines file that you are splitting.

Johan Commelin (Mar 18 2021 at 07:54):

So you still don't know to which new file they should be added to the contributor list

Damiano Testa (Mar 18 2021 at 07:54):

The linked guidelines, do not seem to allow different lists of people, right?

Well, splitting files would also mean having to understand how to split/duplicate contributors.

Johan Commelin (Mar 18 2021 at 07:55):

Right, whereas with a short list of designers/maintainers/principal authors this will typically be straightforward

Damiano Testa (Mar 18 2021 at 07:56):

I think that, when splitting, I would be happy with every name being duplicated in the copies.

Scott Morrison (Mar 18 2021 at 07:58):

I think we should mostly ignore the copyright line, and keep that as simple as possible. Perhaps just always a single person, whoever makes the PR.

Damiano Testa (Mar 18 2021 at 08:00):

I support this, also since, once you split the file, it is not clear to me how the copyright would work/propagate/change.

Scott Morrison (Mar 18 2021 at 08:01):

If we do want to change the Authors: prefix, so far my favourite suggestion is Principal authors: , but I would prefer to leave it as is, and just deal with having to explain the meaning --- this doesn't seem to come up much?

Damiano Testa (Mar 18 2021 at 08:02):

For me, it came up once, and Johan explained to me that those were the people who would be contacted in case of serious issues/doubts with the file.

Damiano Testa (Mar 18 2021 at 08:02):

Hence, I never added myself to an existing authors' list!

Mario Carneiro (Mar 18 2021 at 08:52):

The rule I have followed for the authors field is that it lists the people who have made nontrivial additions to the file. This does not necessarily imply maintainership over that material, although they would probably be good people to talk to about design decisions in the file

Mario Carneiro (Mar 18 2021 at 08:52):

The copyright notice contains the (chronologically) first author to work on the file, who is also listed as the first author in the list. Authors are never removed even if their material is edited to unrecognizability

Eric Wieser (Mar 18 2021 at 09:00):

I've removed authors when splitting sometimes, when it's clear that the entire part split out was written by just one author

Mario Carneiro (Mar 18 2021 at 09:01):

I would say "use your best judgement" for splitting, there aren't any hard rules. I would say that you should at least ensure that the union of the authors of the split files is a superset of the old file

Gabriel Ebner (Mar 18 2021 at 09:06):

Personally, I don't think the author information in the copyright header is useful in any way. On the contrary, it is even actively misleading: it's not the people who wrote the file, just the original author. It's not the people who you should contact when changing the file, because the original author has been long gone. It's also not the people who are the copyright holders.
I would support a mathlib-wide change to remove individual author names from files:

/-
Copyright (c) 2021 The mathlib community. All rights reserved.
Released under Apache 2.0 license as described in the file LICENSE.
Authors: The mathlib community
-/

Of course this is easy to say for me, as my name is only on 14 files. There is something to be said for keeping the authors field to reward contributors.

Damiano Testa (Mar 18 2021 at 09:07):

Gabriel, your suggestion is in line with the Bourbaki approach!

Eric Wieser (Mar 18 2021 at 09:08):

I'd be in favor of dropping the author names from the copyright, although I feel like that type of change requires contacting every existing author and asking them to relinquish copyright on those files

Johan Commelin (Mar 18 2021 at 09:09):

I am very strongly in favour of Gabriel's proposal

Damiano Testa (Mar 18 2021 at 09:10):

Maybe the names that appear now in the various files could be collected to a single "Contributors" file, and any form of contribution would be recorded as a name (without duplication) in that file?

Damiano Testa (Mar 18 2021 at 09:12):

My personal experience has been that, when I contributed a new file, I have received a lot of help here on Zulip and I felt that I would have wanted to acknowledge that help. If I knew that the names of the people who helped me already appear in a list of contributors and that it would not be my sole name would be associated with the specific file in question, that would seem to reflect better the situation.

Eric Wieser (Mar 18 2021 at 09:13):

@Damiano Testa, at that point there's little point having the file at all, since it can be easily constructed from the git history

Damiano Testa (Mar 18 2021 at 09:15):

Eric, you are correct, but I am not sure how to do that, and it is kind of nice to have an explicit list of people's names appearing somewhere.

There could also be a situation of a "non-corresponding author", where a team works on a project, but only one person commits to mathlib. It would be nice to be able to include the "ghost-coders"!

Eric Wieser (Mar 18 2021 at 09:16):

Damiano Testa said:

Eric, you are correct, but I am not sure how to do that, and it is kind of nice to have an explicit list of people's names appearing somewhere.

https://leanprover-community.github.io/mathlib_stats.html I think is already such a place

There could also be a situation of a "non-corresponding author", where a team works on a project, but only one person commits to mathlib. It would be nice to be able to include the "ghost-coders"

Ideally those people would be included via a "Co-authored-by" line in the commit message.

Damiano Testa (Mar 18 2021 at 09:16):

Btw, this is also similar to the Stacks Project approach:

https://stacks.math.columbia.edu/contributors

Eric Wieser (Mar 18 2021 at 09:17):

If one of the goals of the "authors" section is to help users find appropriate reviewers, one option would be to extract all the current "author" comments into a github CODEOWNERS file, which automatically requests those users to review the files when they're edited

Scott Morrison (Mar 18 2021 at 09:57):

Much as I like Gabriel's suggestion, I'm not sure it's a good idea to go and rewrite all the copyright lines in old files.

Scott Morrison (Mar 18 2021 at 09:57):

I wish it were.

Scott Morrison (Mar 18 2021 at 10:07):

Excellent radical suggestions to end copyright as we know it notwithstanding:

#6749 for the normalization
#6750 for the corresponding lint update

Julian Berman (Mar 18 2021 at 13:07):

Eric Wieser said:

I'd be in favor of dropping the author names from the copyright, although I feel like that type of change requires contacting every existing author and asking them to relinquish copyright on those files

you don't need the header for copyright purposes, and removing it doesn't relinquish copyright (IANAL etc.)

Julian Berman (Mar 18 2021 at 13:13):

Unless that meant you explicitly want folks to relinquish copyright. Otherwise putting Copyright foo is an artifact from pre-Berne convention times where there were countries you needed to explicitly assert copyright to get the rights. Now essentially every country is party to either it or a similar agreement -- you get copyright implicitly by writing the code, and each person who writes some sublines of code in each file has copyright over their portions, you don't need to assert it explicitly.

Johan Commelin (Mar 18 2021 at 13:27):

The dutch distinguish between "author rights" and "copy right". I've always thought this is very natural.
I don't want people to claim that they wrote stuff that I wrote. I maintain author rights.
But I really can't be bothered by copy rights. Just do what the heck you want with stuff that I wrote.

Julian Berman (Mar 18 2021 at 14:13):

That seems likely similar. What other folks (besides the copyright holder) can do with the thing is governed by the license(s) granted by the copyright holders, so for mathlib that's an Apache 2.0 license for other people, which is permissive.

Andrew Ashworth (Mar 18 2021 at 16:19):

In Linux, they have a CREDITS file for people who contribute but are not maintainers. The only name in a file is the maintainer - whoever is responsible if there is an issue.


Last updated: Dec 20 2023 at 11:08 UTC