Zulip Chat Archive

Stream: new members

Topic: Correct workflow to work with Mathlib


Nicolò Cavalleri (Jun 26 2020 at 10:42):

Suppose I create a new Lean project with the command leanproject following the tutorial online (https://leanprover-community.github.io/install/project.html). If I understand correctly, this creates a local Git repository on my computer that is not linked to any online GitHub repo. Suppose further that the files in this repository are meant to be PRed on Mathlib. I am wondering what the correct workflow is. I suppose that, since this repo is not linked to Mathlib at the beginning, a good idea is to upload it on a personal repo online on my GitHub account (that I am not really sure how to do since I always create the GitHub repo first but if this is the case I will figure out by myself). Suppose that to run my code I need to fill up holes in the existing files of Mathlib. Should I edit directly the files in the src folder in this repo? What are then the next step to PR on Mathlib?

Shing Tak Lam (Jun 26 2020 at 10:58):

https://leanprover-community.github.io/contribute/index.html might help?

Johan Commelin (Jun 26 2020 at 11:09):

@Nicolò Cavalleri The best approach would be to do leanproject get mathlib, and then work on a branch inside that

Johan Commelin (Jun 26 2020 at 11:09):

(I don't know how familiar you are with git)

Johan Commelin (Jun 26 2020 at 11:10):

If you give us your github username, we can give you push access, so that you can push your branch to github, and create PRs

Nicolò Cavalleri (Jun 26 2020 at 11:21):

Ok thanks a lot! My github username is Nicknamen

Patrick Massot (Jun 26 2020 at 11:21):

Nicolò, could you tell us if https://leanprover-community.github.io/contribute/index.html is hard to understand?

Patrick Massot (Jun 26 2020 at 11:22):

You should have received a GitHub invitation.

Nicolò Cavalleri (Jun 26 2020 at 11:24):

Johan Commelin said:

(I don't know how familiar you are with git)

I used it in the past but I never used it in shared projects so I am not that familiar

Nicolò Cavalleri (Jun 26 2020 at 11:27):

Patrick Massot said:

Nicolò, could you tell us if https://leanprover-community.github.io/contribute/index.html is hard to understand?

What I do not understand is if this process has anything to do with the command leanproject new my_project or if this latter command is meant to create projects that are not meant to be uploaded on Mathlib

Kevin Buzzard (Jun 26 2020 at 11:28):

Making a new project is usually for projects which are not for mathlib but rather use mathlib as a dependency

Nicolò Cavalleri (Jun 26 2020 at 11:36):

Kevin Buzzard said:

Making a new project is usually for projects which are not for mathlib but rather use mathlib as a dependency

Ok thanks, this was precisely my doubt! So I guess that to work on Mathlib I am supposed to use plain git commands and the mathilb tools do not play a major role

Patrick Massot (Jun 26 2020 at 11:37):

No, they play a major role.

Nicolò Cavalleri (Jun 26 2020 at 11:38):

Patrick Massot said:

No, they play a major role.

To compile right?

Nicolò Cavalleri (Jun 26 2020 at 11:40):

I mean for example I use pull to update the library and not leanproject up

Nicolò Cavalleri (Jun 26 2020 at 11:41):

I guess I will have a try and try to understand also with practice

Patrick Massot (Jun 26 2020 at 11:42):

Say you want to share a shiny new lemmas about sets. The workflow is:

  • leanproject get -b project_name:shiny_lemma
  • edit data/set/basic.lean
  • If you are anxious, leanproject build to check you didn't break anything. This will be long because you edit a fundamental file, imported by pretty much everything else.
  • git commit -a
  • git push origin
  • Visit https://github.com/leanprover/mathlib to see an invitation to open a PR based on what you just did.
  • Wait for continuous integration to build your branch if you didn't do it locally
  • leanproject get-cache will download what was built by CI
  • etc.

Nicolò Cavalleri (Jun 26 2020 at 11:42):

Patrick Massot said:

Say you want to share a shiny new lemmas about sets. The workflow is:

  • leanproject get -b project_name:shiny_lemma
  • edit data/set/basic.lean
  • If you are anxious, leanproject build to check you didn't break anything. This will be long because you edit a fundamental file, imported by pretty much everything else.
  • git commit -a
  • git push origin
  • Visit https://github.com/leanprover/mathlib to see an invitation to open a PR based on what you just did.
  • Wait for continuous integration to build your branch if you didn't do it locally
  • leanproject get-cache will download what was built by CI
  • etc.

Ok thanks this is very useful!!

Patrick Massot (Jun 26 2020 at 11:45):

Oops, I failed to replace "project_name" by "mathlib".

Johan Commelin (Jun 26 2020 at 11:59):

@Nicolò Cavalleri Yes, I use leanproject up and leanproject get-cache all the time when working on mathlib.

Paul van Wamelen (Jun 26 2020 at 13:16):

How does all of this change if I'm working off a forked mathlib? In particular, how do I update my fork from the upstream changes? I merged the remote-tracking branch 'upstream/master' into my forked branch and pushed it. How do I update my local folder now? I tried leanproject up but it says "Couldn't pull from a relevant git remote". My toml says 'leanprover-community/lean:3.16.5' though. Shouldn't that be pointing to my forked repo? If I wait for the oleans to be built on GitHub can I then do leanproject get-cache?

Patrick Massot (Jun 26 2020 at 13:33):

To be honest, we don't really thought carefully about that case. Basically every mathlib contributor is given write access to non-master branch of the community repo, and nobody works on forks of mathlib.

Patrick Massot (Jun 26 2020 at 13:34):

So you only need to tell us you want to contribute and give us you GitHub login.

Paul van Wamelen (Jun 26 2020 at 13:43):

Then we should DEFINITELY remove the "Mathlib principally uses the fork-and-branch workflow." comment at the top of https://leanprover-community.github.io/contribute/index.html!

Paul van Wamelen (Jun 26 2020 at 13:44):

My GitHub login is paulvanwamelen. Thanks!

Patrick Massot (Jun 26 2020 at 13:44):

True, this is outdated.

Patrick Massot (Jun 26 2020 at 13:45):

You can now use your invitation to open a PR fixing that page.

Nicolò Cavalleri (Jun 26 2020 at 14:12):

Paul van Wamelen said:

Then we should DEFINITELY remove the "Mathlib principally uses the fork-and-branch workflow." comment at the top of https://leanprover-community.github.io/contribute/index.html!

I admit I was also a bit confused by this sentence as the workflow seems to be slightly different from what I have always been doing with Git in the past, but since I have never used it in shared projects there was also the option that I have not been using it the whole time as it is supposed to be used without no one ever noticing

Nicolò Cavalleri (Jun 29 2020 at 17:16):

I am trying to follow the workflow delineated by Patrick, but I am having problems to update Mathlib.
I branched Mathlib with leanproject get -b mathlib:whatever and then I edited a couple of files. Before trying to compile I ran leanproject up but I get

Couldn't pull from a relevant git remote. You may try to git pull manually and then run `leanproject get-cache`

If I run git pull origin master and then leanproject get-cache I get

Looking for local mathlib oleans
Looking for remote mathlib oleans
Trying to download
https://oleanstorage.azureedge.net/mathlib/c5313d5ede5d33bc6ece62ecff1aa7d84691f692.tar.xz to C:\Users\nicoc\.mathlib\c5313d5ede5d33bc6ece62ecff1aa7d84691f692.tar.xz
Trying to download https://oleanstorage.azureedge.net/mathlib/c5313d5ede5d33bc6ece62ecff1aa7d84691f692.tar.gz to C:\Users\nicoc\.mathlib\c5313d5ede5d33bc6ece62ecff1aa7d84691f692.tar.gz
Trying to download https://oleanstorage.azureedge.net/mathlib/c5313d5ede5d33bc6ece62ecff1aa7d84691f692.tar.bz2 to C:\Users\nicoc\.mathlib\c5313d5ede5d33bc6ece62ecff1aa7d84691f692.tar.bz2
Looking for GitHub mathlib oleans
Info: No github section found in 'git config', we will use GitHub with no authentication
Failed to fetch mathlib oleans

I am nor really sure what the GitHub section is. I do have my GitHub username in git config and I have no problem pulling and pushing to personal repos on my account.

Johan Commelin (Jun 29 2020 at 17:17):

@Nicolò Cavalleri Did you commit your changes?

Johan Commelin (Jun 29 2020 at 17:18):

If so, then github will not have olean files for your repo

Nicolò Cavalleri (Jun 29 2020 at 17:18):

Yes I did!

Johan Commelin (Jun 29 2020 at 17:18):

Ok, that explains why it's failing

Johan Commelin (Jun 29 2020 at 17:18):

I suggest

git checkout master
leanproject get-cache
git checkout your-branch

Bryan Gin-ge Chen (Jun 29 2020 at 17:19):

leanproject get-cache attempts to download olean files that have been built "in the cloud". It won't be able to download olean files reflecting changes that have only been made on your computer.

Johan Commelin (Jun 29 2020 at 17:19):

If you push your branch, then CI will start creating oleans for you. You can fetch them later with leanproject, but it will take some time. (2 hours? approx.)

Nicolò Cavalleri (Jun 29 2020 at 17:19):

But leanproject wanted me to do it:

leanproject get-cache
The repository is dirty, please commit changes before fetching cache, or run this command with option --force.

Johan Commelin (Jun 29 2020 at 17:20):

Sure... I guess this is confusing.

Nicolò Cavalleri (Jun 29 2020 at 17:20):

Am I supposed to run it with --force then?

Johan Commelin (Jun 29 2020 at 17:20):

If you run git log you will see a bunch of long hexadecimal numbers. Those are git commit hashes.

Johan Commelin (Jun 29 2020 at 17:21):

leanproject looks at the git commit hash of the current commit, and tries to fetch oleans from the cloud for that commit.

Johan Commelin (Jun 29 2020 at 17:21):

But it the commit doesn't exist upstairs... it can't help.

Nicolò Cavalleri (Jun 29 2020 at 17:21):

Oh ok that makes sense!

Johan Commelin (Jun 29 2020 at 17:22):

@Nicolò Cavalleri No, I think just commiting is fine. And then move to a commit (e.g. master) that does have oleans. Fetch the oleans, and go back to your branch.

Johan Commelin (Jun 29 2020 at 17:22):

Then you will have oleans that are quite close to what you want, but you still have to compile because of your local changes.

Nicolò Cavalleri (Jun 29 2020 at 17:22):

Johan Commelin said:

Then you will have oleans that are quite close to what you want, but you still have to compile because of your local changes.

You mean with leanproject buid?

Johan Commelin (Jun 29 2020 at 17:23):

If you already pushed your branch (say yesterday) then you can do

git checkout origin/your-branch

and fetch oleans there. Those might be better than the oleans from master.

Johan Commelin (Jun 29 2020 at 17:23):

Depends on when you last merged master into your branch

Nicolò Cavalleri (Jun 29 2020 at 17:23):

Johan Commelin said:

Depends on when you last merged master into your branch

I never merged anything yet! I did all of this like 2 hours ago

Johan Commelin (Jun 29 2020 at 17:25):

Johan Commelin said:

I suggest

git checkout master
leanproject get-cache
git checkout your-branch

Then do this sequence :up:

Nicolò Cavalleri (Jun 29 2020 at 17:26):

Well switching to the master branch it worked to do leanproject up... Am I supposed to do it only from the master branch?

Johan Commelin (Jun 29 2020 at 17:26):

No, like I said, you can also do it from origin/your-branch

Johan Commelin (Jun 29 2020 at 17:27):

From any commit that also exists on github (for more than 2hrs, otherwise it might still be busy creating your oleans)

Nicolò Cavalleri (Jun 29 2020 at 17:27):

Johan Commelin said:

No, like I said, you can also do it from origin/your-branch

I thought you meant leanpoject get-cache becasue leanproject up did not work from my branch even before I committed anything

Johan Commelin (Jun 29 2020 at 17:27):

That's the power. If you push to github, and share your branchname. We can pull, and get the oleans.

Johan Commelin (Jun 29 2020 at 17:28):

becasue leanproject up did not work from my branch even before I committed anything

but your branch only existed on your computer, not on github, right?

Nicolò Cavalleri (Jun 29 2020 at 17:29):

Johan Commelin said:

becasue leanproject up did not work from my branch even before I committed anything

but your branch only existed on your computer, not on github, right?

Oh right sorry I was thinking only about the commits existing only on my computer, not my branch

Johan Commelin (Jun 29 2020 at 17:29):

If you do git checkout -b new-branch then after that leanproject get-cache still works. But as soon as you make changes, it will probably complain that your directory is dirty.

Nicolò Cavalleri (Jun 29 2020 at 17:30):

Ok thanks it worked I will try to build now with leanproject build

Dima Pasechnik (Jun 29 2020 at 18:49):

Patrick Massot said:

Nicolò, could you tell us if https://leanprover-community.github.io/contribute/index.html is hard to understand?

I am curious to understand the chosen model - why everyone (apart from committers to the master branch and the like) would need write access to the main repo (I do know much more than I ever wanted to know about Git workflows)?

Bryan Gin-ge Chen (Jun 29 2020 at 18:53):

The main reason is that we use Github actions to compile the ".olean" files and upload them to Azure (so that any reviewers don't have to compile mathlib themselves), and we only want that to happen for workflows run from the main repository.

Johan Commelin (Jun 29 2020 at 18:56):

Right... compiling mathlib takes ~1 hour on a decent machine. This way, people can jump right in, after downloading precompiled binaries from a branch.

Dima Pasechnik (Jun 29 2020 at 18:58):

Bryan Gin-ge Chen said:

The main reason is that we use Github actions to compile the ".olean" files and upload them to Azure (so that any reviewers don't have to compile mathlib themselves), and we only want that to happen for workflows run from the main repository.

I guess you have a priority access to GH Actions queue (we do use a free tier GH Actions in SageMath work, and they tend to take a lot of time), right?

Bryan Gin-ge Chen (Jun 29 2020 at 19:01):

We're also using the free tier for GH actions as far as I know. It's more of a security thing, I think. Other forks can run most of our workflow to build and check their commits under their own GH Actions plans, but they can't upload the results to Azure since we store the credentials as "secrets" on github, and those only work for workflows run from our own repo.

Johan Commelin (Jun 29 2020 at 19:03):

@Dima Pasechnik We're using incremental builds, so not every build takes 1hr, most take 5-15 minutes. Nevertheless, the "leanproject get-cache, wait 5 seconds" is hard to beat.

Bryan Gin-ge Chen (Jun 29 2020 at 19:03):

Of course, if we're giving out write access willy-nilly then the security benefits become negligible. So we'll most likely have to revisit this (if we're lucky enough to keep growing).

Dima Pasechnik (Jun 29 2020 at 19:03):

Johan Commelin said:

Right... compiling mathlib takes ~1 hour on a decent machine. This way, people can jump right in, after downloading precompiled binaries from a branch.

precompiled binaries packed in some kind of container (we use Docker?), otherwise I don't see how this can work.

Kevin Buzzard (Jun 29 2020 at 19:04):

Trying to download https://oleanstorage.azureedge.net/mathlib/4e2c7e36308b6349c5a794237ebfef91e309ac2f.tar.xz to /home/buzzard/.mathlib/4e2c7e36308b6349c5a794237ebfef91e309ac2f.tar.xz

Kevin Buzzard (Jun 29 2020 at 19:05):

A python script downloads the tarball, and then unpacks it and puts it all in the right place.

Johan Commelin (Jun 29 2020 at 19:05):

@Dima Pasechnik what do you mean? lean compiles .lean files into .olean files. And leanproject will download those olean files (in a tarball) for whatever branch you are currently interested in.

Johan Commelin (Jun 29 2020 at 19:06):

So there is no Docker involved.

Sebastien Gouezel (Jun 29 2020 at 19:06):

"binaries" is not the right word. Rather, from a proof script in which there is a lot of work to do, Lean can extract the compiled version of the proofs, which is much faster to read and check and use, but this is not a binary in the sense of executable, it is still a file that will be used by the Lean executable.

Johan Commelin (Jun 29 2020 at 19:06):

Also, we are not talking about compiling lean itself. That's a different story (-;

Johan Commelin (Jun 29 2020 at 19:07):

@Sebastien Gouezel It is a "binary" file, in the sense that it isn't plaintext human readable. (Unless you are Mario, I guess.)

Sebastien Gouezel (Jun 29 2020 at 19:07):

It is a binary file, but it is not "binaries" in the executable sense.

Alex J. Best (Jun 29 2020 at 19:08):

The "binaries" are leans .olean files, which is a custom format that happens to be platform independent, maybe its more like .py->.pyc than .c->.o

Dima Pasechnik (Jun 29 2020 at 19:13):

I see, thanks for explaining.

Dima Pasechnik (Jun 29 2020 at 19:15):

basically, GH Actions is a mainframe for doing some heavy lifting.


Last updated: Dec 20 2023 at 11:08 UTC