Zulip Chat Archive

Stream: general

Topic: Mathlib and copyright ownership


Karl Palmskog (Jul 01 2020 at 18:50):

Given the following tweet, I would be interested in how copyright is handled for Mathlib: https://twitter.com/XenaProject/status/1276985859552825346

What happens usually in open source is that copyright is retained by each author, so the definition in Lean of perfectoid spaces is owned by its authors. For something like Lean code to "never be owned by anyone", the only way I know would be to set up an organization to which all copyright is assigned, like the FSF does with GNU project contributions. Is this what is done currently?

Johan Commelin (Jul 01 2020 at 18:52):

I am far from an expert, but I think Kevin meant that this code can not be put behind a paywall.

Jalex Stark (Jul 01 2020 at 18:52):

"own" is a weird word

Johan Commelin (Jul 01 2020 at 18:53):

Personally, I don't care about copyrights... people can do whatever they want with the code I write.

Simon Hudon (Jul 01 2020 at 18:53):

I believe the authors still retain the ownership in the case of mathlib

Johan Commelin (Jul 01 2020 at 18:53):

If they want to claim that they wrote the code... I don't care.

Jalex Stark (Jul 01 2020 at 18:53):

does ownership mean that they can later unpublish it?

Johan Commelin (Jul 01 2020 at 18:54):

But you are asking about mathlib, not about my opinion. Currently I think all mathlib files (with nontrivial content) have a copyright header that mentions the Apache License.

Karl Palmskog (Jul 01 2020 at 18:54):

Johan Commelin said:

I am far from an expert, but I think Kevin meant that this code can not be put behind a paywall.

The Apache-2.0 license absolutely permits putting copies of code behind a paywall. But I guess you mean something like: "the Lean community will endeavor to always provide this code for download for free"

Karl Palmskog (Jul 01 2020 at 18:55):

Jalex Stark said:

does ownership mean that they can later unpublish it?

Most license dedications can't be revoked. However, they could also license it under other licenses, such as proprietary ones.

Johan Commelin (Jul 01 2020 at 18:55):

Well, our repo is not behind a paywall, and organisations cannot force us to take it down. So the perfectoid repo should in principal always stay around. Free to read for everyone with an internet connection.

Jalex Stark (Jul 01 2020 at 18:56):

so i guess there's a question of whether we can make the legal reality match up better with our interpretation

Simon Hudon (Jul 01 2020 at 18:56):

The other variable @Johan Commelin is for us to not lose interest and abandon the website or delete the account

Jalex Stark (Jul 01 2020 at 18:57):

and karl is suggesting this social technology of forming an org with a mandate to keep providing this code for free

Johan Commelin (Jul 01 2020 at 18:57):

Yup, agreed. Hence the "in principal" (@Simon Hudon)

Karl Palmskog (Jul 01 2020 at 18:57):

Johan Commelin said:

Well, our repo is not behind a paywall, and organisations cannot force us to take it down. So the perfectoid repo should in principal always stay around. Free to read for everyone with an internet connection.

We have discussed this issue in coq-community. GitHub can absolutely shut repos/websites down (whether by policy or accident). Hence, many of our project repos are mirrored on multiple open platforms.

Simon Hudon (Jul 01 2020 at 18:58):

That's a good idea

Johan Commelin (Jul 01 2020 at 18:58):

We have local clones, so we can host it in another place as soon as github acts up

Simon Hudon (Jul 01 2020 at 18:58):

Where do you mirror them?

Johan Commelin (Jul 01 2020 at 18:58):

I guess gitlab and sourcehut are obvious candidates?

Karl Palmskog (Jul 01 2020 at 18:58):

GitLab for one

Simon Hudon (Jul 01 2020 at 19:00):

Is there a mechanism to push everything from the github repos to gitlab or others on a regular basis?

Jalex Stark (Jul 01 2020 at 19:01):

could it be implemented as a GitHub Action?

Bhavik Mehta (Jul 01 2020 at 19:01):

GitLab lets you continuously mirror from github automatically, I think

Jalex Stark (Jul 01 2020 at 19:02):

this mirroring can be done unilaterally, then

Karl Palmskog (Jul 01 2020 at 19:02):

Here's our discussion on backup etc. so far: https://github.com/coq-community/manifesto/issues/76 --- We have also considered platforms like Zenodo: https://github.com/coq-community/manifesto/issues/75

In any case, for the "free availability" goal to hold, I think these issues have to be considered.

Johan Commelin (Jul 01 2020 at 19:03):

The bigger problem is all the "extras" that github offers: issues and PRs. If those get taken down, it's harder to recover the info, because it's not stored in the repo.

Karl Palmskog (Jul 01 2020 at 19:03):

Johan Commelin said:

The bigger problem is all the "extras" that github offers: issues and PRs. If those get taken down, it's harder to recover the info, because it's not stored in the repo.

they can be exported (regularly) via GitHub's API

Johan Commelin (Jul 01 2020 at 19:03):

But is gitlab compatible with it?

Johan Commelin (Jul 01 2020 at 19:04):

Maybe we only want to backup the stuff, and not mirror things like issues. I guess that would quickly become confusing anyway.

Karl Palmskog (Jul 01 2020 at 19:04):

if you have the JSON or whatever, you could always convert it (we did this for imported Bitbucket repos)

Karl Palmskog (Jul 01 2020 at 19:05):

Jalex Stark said:

and karl is suggesting this social technology of forming an org with a mandate to keep providing this code for free

I wasn't suggesting to go the FSF route and force reassignment of copyright for contributions, I'm just saying this is what one has to do to if the goal is to make sure nobody owns the code. In coq-community at least, we don't mind if everybody owns their code themselves, even though it becomes very problematic if one wants to change license (e.g., from GPL2 -> Apache-2.0)

Jalex Stark (Jul 01 2020 at 19:07):

i think it's not a crazy goal; I would guess it is not worth the effort right now, but would not be surprised if it were worthwhile two years from now.

Simon Hudon (Jul 01 2020 at 19:10):

What would we gain?

Karl Palmskog (Jul 01 2020 at 19:11):

I think FSF does it mostly so they can take legal action against violators of the GPL. But Apache-2.0 is such a permissive license that it's hard to violate...

Mario Carneiro (Jul 01 2020 at 19:12):

I would like to move everything to public domain, but I think that's probably not possible because we have too much MSR code in mathlib

Karl Palmskog (Jul 01 2020 at 19:13):

public domain doesn't work in all jurisdictions, the closest we have used in coq-community is the Unlicense: https://unlicense.org (which has an MIT-like fallback)

Jalex Stark (Jul 01 2020 at 19:13):

If you increase the long-term availability of the code, you gain some probability mass on "the work we're doing on mathlib now is appreciated by people 40 years from now." But right now if you want to buy probability on that event, the most efficient thing to do is increase the number of good researchers contributing to mathlib.
("is my work still recognized 40 years from now" is a thing career mathematicians optimize somewhat hard for, I think)

Mario Carneiro (Jul 01 2020 at 19:13):

I mean by PD multilicensed PD / CC0 / unlicense / WTFPL of course :)

Karl Palmskog (Jul 01 2020 at 19:15):

well, that kind of PD at least renders the "ownership" issue moot (anyone can claim they own Mathlib)

Mario Carneiro (Jul 01 2020 at 19:15):

that works for me

Mario Carneiro (Jul 01 2020 at 19:15):

we even tried to sidestep that question pretty hard in the mathlib paper

Mario Carneiro (Jul 01 2020 at 19:17):

I think there are more than enough social institutions for apportioning credit in OSS projects without any need for copyright claims

Simon Hudon (Jul 01 2020 at 19:17):

Jalex Stark said:

If you increase the long-term availability of the code, you gain some probability mass on "the work we're doing on mathlib now is appreciated by people 40 years from now." But right now if you want to buy probability on that event, the most efficient thing to do is increase the number of good researchers contributing to mathlib.
("is my work still recognized 40 years from now" is a thing career mathematicians optimize somewhat hard for, I think)

I don't think that's a major stumbling block. Yes, mathlib is a mathematical artifact but it's also a software artifact. It's hard to find a piece of code that remains relevant for 40 years. The theorems proved today will still be true in 40 years but it is encoded in Lean. Will Lean still be around in 40 years? Maybe the language will be completely different maybe there will be something better

Mario Carneiro (Jul 01 2020 at 19:19):

I think this is a serious concern though, that is not really being currently addressed and will become a bigger problem as mathlib grows

Mario Carneiro (Jul 01 2020 at 19:19):

Lean is absolutely not "40-year proof"

Mario Carneiro (Jul 01 2020 at 19:20):

that doesn't mean it's impossible to have something that is

Jalex Stark (Jul 01 2020 at 19:20):

we'll learn a lot from porting mathlib to lean 4

Mario Carneiro (Jul 01 2020 at 19:20):

lean is notoriously volatile even among contemporary proof assistants

Karl Palmskog (Jul 01 2020 at 19:22):

I have worked on updating 20+ year old Coq code quite a lot, it's hard but usually works out in the end (due to Coq maintainers' compatibility policies)

Johan Commelin (Jul 01 2020 at 19:22):

One thing that I want, is that I can edit mathlib files without having the original author having legal rights about what the file should look like. But I guess that is not a problem with our current setup.

Karl Palmskog (Jul 01 2020 at 19:23):

the main value from 20+ year old code are definitions and theorem statements... if a development has been partitioned properly so that there aren't gigantic proof scripts for a single lemma, it usually works out. Paulson makes similar points about Isabelle here (when talking about porting from HOL Light)

Jalex Stark (Jul 01 2020 at 19:24):

i imagine that when lean stops being the best choice for developing math libraries, a fork of mathlib will be translated into a better language

Mario Carneiro (Jul 01 2020 at 19:25):

Metamath was definitely built to be "archival quality"; the spec has remained essentially unchanged from the 1991 version although it has grown in several backwards compatible ways

Mario Carneiro (Jul 01 2020 at 19:27):

The idea that your proofs will outlive you is part of the ethos

Jalex Stark (Jul 01 2020 at 19:27):

maybe all we need for lean proofs to survive is to be able to compile them down to metamath

Simon Hudon (Jul 01 2020 at 19:27):

I don't suppose we can expect the proofs to outlive the community.

Mario Carneiro (Jul 01 2020 at 19:27):

why not?

Jalex Stark (Jul 01 2020 at 19:28):

say everything in mathlib was compiled to metamath, and then everyone forgot about the existence of lean

Mario Carneiro (Jul 01 2020 at 19:28):

metamath proofs may not live on as metamath proofs necessarily, but the idea is that they will be forward ported in perpetuity

Simon Hudon (Jul 01 2020 at 19:28):

Especially with Lean, we've kept the code alive but adapting it through the myriad of breaking changes

Jalex Stark (Jul 01 2020 at 19:29):

then in the development of the next general mathematics library, you make heavy use of a mm_library_search

Jalex Stark (Jul 01 2020 at 19:30):

this is like relying on details checked by elders in ordinary maths

Mario Carneiro (Jul 01 2020 at 19:31):

@Simon Hudon The problem with this method is that it is very labour intensive and as soon as people decide that porting is no longer worth the effort, it dies

Mario Carneiro (Jul 01 2020 at 19:32):

having a stable "fallback" language is good for offloading developments where you don't care to upkeep them at that level

Simon Hudon (Jul 01 2020 at 19:33):

That's true but not many software project survive the lack of maintainers. We might want to get to a point where the breaking changes are less common or we track master less closely but it will have to keep being maintained

Mario Carneiro (Jul 01 2020 at 19:34):

I think you are highlighting the importance of a Project Gutenberg for formal proof

Jalex Stark (Jul 01 2020 at 19:36):

is a "Project Gutenberg for formal proof" an archivist who keeps the code runnable without developing it?

Mario Carneiro (Jul 01 2020 at 19:37):

right

Jalex Stark (Jul 01 2020 at 19:40):

do you prefer keeping code runnable to translating the code-base into metamath and then letting it bitrot?

Mario Carneiro (Jul 01 2020 at 19:41):

I imagine that the latter is less labor intensive because keeping a complicated program running on newer hardware can be challenging

Jalex Stark (Jul 01 2020 at 19:43):

for some appropriate definition of metamath C, is "reimplement the prover in metamath C" enough to translate all of the proofs to metamath?

Mario Carneiro (Jul 01 2020 at 19:46):

The main claim for reliability of metamath on future hardware is that it is really simple to write from scratch and this has been done over 20 times in almost every major language

Mario Carneiro (Jul 01 2020 at 19:46):

Metamath C is a bit of vaporware of my own devising, not to be confused with metamath :)

Jalex Stark (Jul 01 2020 at 19:47):

yes, I did mean to ask about the vaporware. the question is "do you think it's possible to write a language with the specified property?"

Mario Carneiro (Jul 01 2020 at 19:48):

yes, the goal was to outline what I think I can feasibly get done in ~1 year when my dissertation comes due

Mario Carneiro (Jul 01 2020 at 19:50):

(but that's a bit off topic for this thread, feel free to continue at #Program verification)


Last updated: Dec 20 2023 at 11:08 UTC