Zulip Chat Archive

Stream: new members

Topic: Starting to contribute to mathlib

Chris M (Aug 13 2020 at 02:04):

I've been doing Lean tutorials in my spare time for the past couple of weeks. I’d like to start taking steps to contribute to mathlib, and I’ll probably have about 1 hour per day to spare for this for now.

Firstly, my understanding is that the Lean community is interested in formalizing mathematics, and not so interested in formalizing theoretical computer science. Is my impression correct? (though I’ve also seen computability theory as a topic mathlib).

Regarding math, there are topics that I’m interested in and topics I’m not that interested in: I’m interested in both applied and pure math, but within pure math I’m mostly interested in topics that are foundational for parts of computer science, machine learning, and parts of physics. Also, if it’s more “fundamental”, I’ll be more motivated to work on it. But it doesn’t have to be exotic, and in fact I’d prefer to do quite basic things in Lean.

Looking at the Missing undergraduate mathematics in mathlib page, some topics that I’d broadly be interested in working on within Mathlib are:
Linear algebra
Topology
Measure theory
Multivariable calculus and differential geometry
Group theory
Topics that are not on that list but that I’d still be interested in working on in Lean, are dynamical systems, theory of computation, high dimensional geometry, and parts of mathematical physics.
I’m not really currently interested in topics such as: number theory, complex analysis (or mostly complex numbers generally), ring theory, representation theory.

By the way, I’ve started an earlier topic a number of weeks ago, when I hadn’t gone through the Lean tutorials yet (mostly #mil and #TPIL), and this message basically supersedes that one.

It seems to me that it would be good to start working within Lean on broadly one area so I build knowledge of how that area is formalized in Lean. So before picking my first problem it seems good to think about which area would be the best fit for me given my interests and given how doable it is to work on within Lean, etc.

So it seems like it would be good to think about on the one hand, which area I should start with and on the other hand, which initial problem(s) to try. Based on the topics that are written in Missing undergraduate mathematics in mathlib, my initial guess is that linear algebra or group theory would be the best fit, but people on Zulip chat seemed to have strong opinions about what are and aren’t good topics to start with, so I’m curious about people’s thoughts.

Mario Carneiro (Aug 13 2020 at 02:17):

We're currently underrepresented in CS. I don't think the lack of CS formalizations is a deliberate choice

Alex J. Best (Aug 13 2020 at 02:22):

I agree with Mario, I think many people here would be interested in formalizing CS, and there are many people interested in program verification of course (have you seen the program verification stream? I think its an optional one you have to subscribe to separately). I think linear algebra is a great area to start in, for instance we only got the Cayley-hamilton theorem recently, so it seems there are a lot of things in that area that can be done but haven't yet.

Scott Morrison (Aug 13 2020 at 05:38):

Yes! Perhaps a good step is to pick a theorem that you'd guess is within reach, and ask for advice here. People will probably pile in giving suggestions about how to go about it (and possibly pointing out pitfalls!).

Scott Morrison (Aug 13 2020 at 05:39):

Generally, when you're getting started it's best to pick something that needs more theorems, not more definitions.

Scott Morrison (Aug 13 2020 at 05:39):

Definitions (perhaps counterintuitively) are hard.

Scott Morrison (Aug 13 2020 at 05:40):

(So while eigenstuff is a gaping hole in the linear algebra library, for someone just getting started there it would be very important to get help / review of even the basic definitions before going too far.)

Kevin Buzzard (Aug 13 2020 at 07:14):

Yeah lemme echo Scott. Your first job is to isolate a precise mathematical statement or proof which you want to work on rather than just thinking about a general area. Then (esp if it involves new definitions) you should ask for advice on how to formalise any relevant intermediate statements

Chris M (Aug 14 2020 at 03:30):

Kevin Buzzard said:

Yeah lemme echo Scott. Your first job is to isolate a precise mathematical statement or proof which you want to work on rather than just thinking about a general area. Then (esp if it involves new definitions) you should ask for advice on how to formalise any relevant intermediate statements

Yes it was definitely my intention to isolate a precise problem to work on. Regarding this, it seems that the (missing undergraduate math in mathlib)[https://leanprover-community.github.io/undergrad_todo.html] list is quite broad. i.e. it doesn't list pre-stated theorems that need proving, but mostly definitions and general topics. E.g. if I look at Linear algebra, there doesn't seem to be listed a clear problem, except perhaps Schur's lemma, though this seems too advanced.

Can we find a simple problem in linear algebra (preferably not related to matrices)?

Heather Macbeth (Aug 14 2020 at 03:47):

Did you read through all the suggestions on your previous thread? For example, what about @Sebastien Gouezel 's suggestion of proving that partial derivatives commute?

Chris M (Aug 14 2020 at 03:58):

Heather Macbeth said:

Did you read through all the suggestions on your previous thread? For example, what about Sebastien Gouezel 's suggestion of proving that partial derivatives commute?

I did, but I didn't explain the topics that I'm interested in in that post. I think proving that partial derivatives commute (or another suggestion by Gouezel, that a C^k function is approximated by its k-th Taylor series up to o(norm(h) ^ k) ) is a possibility, but it's currently not the most interesting topic to me. But yes it's a possibility. Another topic from that thread: "Commuting operators are simultatneously diagonalizable" is a possibility, though it seems complicated (I'd prefer to not deal with matrices, i.e. stay at a more abstract level, but it's a possibility).

Chris M (Aug 14 2020 at 04:04):

By the way, would it make sense for me to just grab a linear algebra textbook and look for theorems that way? Though I can imagine that a lot of it will already be formalized.

Heather Macbeth (Aug 14 2020 at 04:14):

I suspect that you would be better off picking a subject that's more obscure than linear algebra. The low-hanging fruit tends to get done faster in fields (like linear algebra) which a lot of people know well.

Chris M (Aug 14 2020 at 04:16):

Alex J. Best said:

I think linear algebra is a great area to start in, for instance we only got the Cayley-hamilton theorem recently, so it seems there are a lot of things in that area that can be done but haven't yet.

Oh, the reason I thought Linear algebra might be good is that Alex J. Best said that it is a good area to start. I'm not wedded to it though.

Heather Macbeth (Aug 14 2020 at 04:17):

As a random example from the "Missing undergrad math" list (chosen on the idea that if you know measure theory then you might know what a Hilbert space is), how about the Hilbert projection theorem?

Chris M (Aug 14 2020 at 04:21):

Yes I know what a Hilbert space is, and this theorem is interesting to me. I can't judge whether it'd be easy for me to formalize in Lean, but the informal math of it seems within reach.

Heather Macbeth (Aug 14 2020 at 04:21):

Alex was talking about the eigenvalues/spectral theory side of linear algebra, which is kind of a special case. This is indeed missing from mathlib, but Scott already mentioned the reasons this might not be ideal for a beginner:

Scott Morrison said:

(So while eigenstuff is a gaping hole in the linear algebra library, for someone just getting started there it would be very important to get help / review of even the basic definitions before going too far.)

Heather Macbeth (Aug 14 2020 at 04:23):

Chris M said:

Yes I know what a Hilbert space is, and this theorem is interesting to me. I can't judge whether it'd be easy for me to formalize in Lean, but the informal math of it seems within reach.

If you're interested, I think it's a reasonable thing to try. The point is that it satisfies the guidelines outlined by Scott and Kevin: the definitions are already in mathlib, so you could get straight to the theorem itself.

Chris M (Aug 14 2020 at 04:24):

Ok, I'm happy to just start with this one. :). I'd also be interested in collecting a list of problems to work on after this btw. :)

Heather Macbeth (Aug 14 2020 at 04:26):

Well, you could work straight down the Hilbert spaces list from there: orthogonal projection, Riesz representation, ... :)

Junyan Xu (Aug 14 2020 at 16:05):

I want to use a generalization of https://github.com/leanprover-community/mathlib/blob/0e5f44b078b085c0bb3a999b6adbe7b8997877ba/src/algebra/archimedean.lean#L43 that replaces 1<x by 1≤x, which amounts to change not_lt_of_gt by not_le_of_gt in the proof. Should I submit a PR?

Kevin Buzzard (Aug 14 2020 at 16:08):

Good catch! Make a branch and change the lemma and see what breaks; fix everything and I'm sure the PR will be welcome.

Junyan Xu (Aug 14 2020 at 16:11):

I tested it locally and this works 图片.png

Kevin Buzzard (Aug 14 2020 at 16:19):

But the question is how much more of mathlib did you break with this change :-)

Kevin Buzzard (Aug 14 2020 at 16:20):

If you can fix it, you should PR it. Oh! Is this lemma not used anywhere? You're in luck :-)

Junyan Xu (Aug 14 2020 at 16:26):

Kevin Buzzard said:

Oh! Is this lemma not used anywhere? You're in luck :-)

Seems so! searched the repo and didn't find any usage, only its definition. It's also not labeled [simp] or something that can be used implicitly. I don't know yet how to verify/(compile?) the whole mathlib, but it's possible I need to do that in the future...

Kevin Buzzard (Aug 14 2020 at 16:27):

I would just make the PR, I think you're good to go

Reid Barton (Aug 14 2020 at 16:30):

What is nea? Is it supposed to be near?

Junyan Xu (Aug 14 2020 at 16:36):

Reid Barton said:

What is nea? Is it supposed to be near?

because I imported the original file I temporarily changed the name to avoid conflict.

Junyan Xu (Aug 14 2020 at 16:58):

Another problem I came across is that
https://github.com/leanprover-community/mathlib/blob/c25280031da754d1731b31e81426ab2f8fe218d0/src/algebra/ring/basic.lean#L584
and https://github.com/leanprover-community/mathlib/blob/c25280031da754d1731b31e81426ab2f8fe218d0/src/algebra/ring/basic.lean#L624
have exactly the same statement. How do we decide which name and proof are better and de-duplicate them? (Both are used, one right below, the other in algebra/group_power.)

Mario Carneiro (Aug 14 2020 at 16:59):

We should have a lint for this

Mario Carneiro (Aug 14 2020 at 17:00):

I would keep mul_self_sub_mul_self (both name and proof)

Kevin Buzzard (Aug 14 2020 at 17:07):

We ran into one of these yesterday at Xena: something like mul_left_cancel_iff = mul_right_inj (with the same proof) in algebra.groups.defs (lines 132, 140)

Mario Carneiro (Aug 14 2020 at 17:32):

I think that there are some deliberate aliases along those lines though

Eric Wieser (Aug 14 2020 at 19:39):

Can the linter require the deliberate aliases be written with abbreviation?

Mario Carneiro (Aug 14 2020 at 19:51):

they should be written using alias

Chris Wong (Aug 15 2020 at 02:16):

@Junyan Xu GitHub Actions will try to build your branch as soon as you push. That will tell you if it works or not :smiling_face:️

Jalex Stark (Aug 16 2020 at 22:03):

Mario Carneiro said:

We should have a lint for this

I guess the most obvious way to implement is to try to prove every statement with library_search, and complain if it finds a proof by exact foo?
@Scott Morrison
Is the less-powerful mode of library search cheap enough for this? (would it require an even weaker mode to get the behavior we want?)
I guess this only helps if one of the files with the duplicate declaration imports the other. I expect that's true in a large fraction of cases?

Mario Carneiro (Aug 16 2020 at 23:14):

That's way too expensive. The simple option is to throw everything in an expr_map and check for duplicates, but I think that will have false negatives

Jalex Stark (Aug 16 2020 at 23:45):

~~is there a notion of bounded definitional equality that's cheap enough to compute and gets rid of most of the false negatives?~~
I think i don't have a model for what your false negative looks like.

Do you want the linter to catch two instances of the same lemma, but with the Prop arguments in a different order?

Notification Bot (Aug 17 2020 at 02:05):

This topic was moved by Jalex Stark to #general > a dedup linter?

Last updated: May 02 2025 at 03:31 UTC