Zulip Chat Archive

Stream: new members

Topic: mathlib versioning and changelog


David Chanin (Jul 08 2022 at 09:44):

Is there a changelog or mathlib? I keep finding breaking changes have been made since a tutorial I've been following from 2020 was posted (seems like lots of things get renamed in mathlib so code breaks), but the major version of mathlib doesn't change. Is there a list of breaking changes somewhere? It seems like mathlib doesn't follow semantic versioning?

Kevin Buzzard (Jul 08 2022 at 09:49):

It's #rss :-)

Kevin Buzzard (Jul 08 2022 at 09:50):

I absolutely agree that mathlib subscribes to the "move fast and break things" approach. If whatever you are following is a properly-formatted Lean project then changes in mathlib shouldn't matter though, because the project should be tied to a version of mathlib which will be compatible with it.

David Chanin (Jul 08 2022 at 09:56):

I'm just trying to follow along with a video tutorial (actually created by you! Thank you so much for these!), so no way to check the version of mathlib. I'd also like to be on the recent version in general though. Move fast and break things is OK as long as it's all documented what's going on I think. So there's no way to tell what breaking changes have happened in each version release, aside from scrolling through Zulip and hoping it's mentioned somewhere in the past? There are automated systems to enforce this, like https://github.com/semantic-release/semantic-release will keep updating a CHANGELOG file in the repo with all the changes at every release, and note if they're breaking changes, etc.

David Chanin (Jul 08 2022 at 09:58):

If I could just search a "CHANGELOG" file for the term "findim" for example and discover that it's been renamed to "finrank" that would be amazing

Johan Commelin (Jul 08 2022 at 10:00):

Would searching the git log be good enough for you? In principle, such renames should be mentioned in the commit messages. Although sometimes something might slip through.

Mauricio Collares (Jul 08 2022 at 10:00):

An automated alternative would be to use bisection to upgrade mathlib "as far as possible" ("does it build at commit X" is not necessarily a monotone function, but probably close enough), then show the commit message for the breaking commit, ask the user to fix the breakage manually, and repeat until you reached master.

David Chanin (Jul 08 2022 at 10:03):

I'd rather not need to clone the git repo and search commit messages. A simple CHANGELOG file is indexable by Google too, so in this case if I searched for "Lean findim" it would have come up with the changelog entry, saying something like "vx.y.z findim was renamed to finrank".

Johan Commelin (Jul 08 2022 at 10:03):

Re @Mauricio Collares: That sounds like it could be extremely time consuming, given how long it takes to build mathlib

Yaël Dillies (Jul 08 2022 at 10:04):

... except that all you need is to get the cache.

Johan Commelin (Jul 08 2022 at 10:04):

Fair enough.

Johan Commelin (Jul 08 2022 at 10:05):

I would still prefer to merge multiple bumps in one go.

David Chanin (Jul 08 2022 at 10:05):

Are the mathlib releases automated in github actions or another CI? If so it's pretty easy to add an automated step to append the new version to a CHANGELOG file in the git repo with the version number and add all the commit messages for the version into it too

Johan Commelin (Jul 08 2022 at 10:05):

@David Chanin mathlib is very much rolling release (similar to Archlinux, eg). Almost every commit breaks something.

Johan Commelin (Jul 08 2022 at 10:05):

So the CHANGELOG would almost be a concatenation of all the commit messages.

Alex J. Best (Jul 08 2022 at 10:06):

git log -Slemma_that_used_to_exist does require you to clone the repo but will be far more useful than a human written change log, it searches the actual changes themselves. There are way too many changes to small things all the time in mathlib than a changelog would cover.

David Chanin (Jul 08 2022 at 10:06):

Is there a versioning philosophy at least? It seems like it's not semantic versioning?

Johan Commelin (Jul 08 2022 at 10:06):

No, there's no versioning at all. It is rolling release.

Johan Commelin (Jul 08 2022 at 10:07):

Many of us view mathlib as a huge experiment. Not necessarily a product to build upon. But that might change in the future.

David Chanin (Jul 08 2022 at 10:07):

There is versioning, for instance my project shows I'm on Mathlib 3.44.1.

Johan Commelin (Jul 08 2022 at 10:07):

Aah, ok. But that just mirrors the lean release.

David Chanin (Jul 08 2022 at 10:08):

oh wait nm that's just lean, you're right

Johan Commelin (Jul 08 2022 at 10:08):

Lean does sorta follow semver.

David Chanin (Jul 08 2022 at 10:10):

there's no versioning at all? This can all be automated really easily, there's plugins like standard-version and semantic-release will manage the semantic versioning, keep a changelog, and push a new git tag for each new merge to master, all based on the commit message

Henrik Böving (Jul 08 2022 at 10:11):

semantic versioning in a Lean context is very hard. You have to understand, that unlike in other programming languages where the API and the behaviour matters in Lean the implementation of every function matters as well because they can be unfolded in proofs in other people's libraries and are thus part of the semantic of your library as well (if you change the implementation someone elses proof might not work). So technically speaking every change to a function or thing that can be unfolded from a consumer of your library is a breaking change.

David Chanin (Jul 08 2022 at 10:13):

Even if it's not semantic versioning, a version number would still be nice I think. Then I could say I'm on mathlib v321, and on v319 something was renamed etc

Eric Wieser (Jul 08 2022 at 10:16):

Perhaps what we should actually do is make a tool that takes in two mathlib revisions, and outputs a diff of the changed API surface

Eric Wieser (Jul 08 2022 at 10:17):

Which would look something like:

  • Fetch a cache for revision A, export a list of all symbols
  • Fetch a cache for revision B, export a list of all symbols
  • Compute a diff of just the lemma names (matching type information is very hard as A and B might define the types themselves in different ways

Eric Wieser (Jul 08 2022 at 10:18):

That wouldn't tell you which lemmas were renamed, but would tell you "foo_of_bar was added and foo was removed"

Henrik Böving (Jul 08 2022 at 10:19):

Elm does have a feature like this: https://elm-lang.org/ (3rd from the bottom)

David Chanin (Jul 08 2022 at 10:19):

That would be helpful, but could it be automated and written to a CHANGELOG after each push to master?

Johan Commelin (Jul 08 2022 at 10:21):

@Eric Wieser I guess the docs already contain a list of all names. Can that be reused for your idea?

David Chanin (Jul 08 2022 at 10:21):

I can open a PR on mathlib adding standard version as a Github action that will just do the git version tagging and update a integer as the version if it's easier to discuss there / if people are OK with that idea?

David Chanin (Jul 08 2022 at 10:21):

Is there any argument for not having a version number and a changelog if it's all automatic?

Eric Wieser (Jul 08 2022 at 10:22):

What's the point in havinga a version number vs using the git hash, if every commit is a new version?

Henrik Böving (Jul 08 2022 at 10:22):

We did have a discussion like this in the Lean 4 chat before. The argument there was basically that due to the reasons outlined above the version number does carry no meaning.

Johan Commelin (Jul 08 2022 at 10:23):

It's slightly easier to compare two integers vs two git hashes.

Damiano Testa (Jul 08 2022 at 10:23):

Would this mean that every time a new PR is merged into master, this file gets updated? This seems very close to the gitlog of master.

Eric Wieser (Jul 08 2022 at 10:23):

You can use git describe to convert a git hash into something reasonable I think

Eric Wieser (Jul 08 2022 at 10:23):

Although that probably fails spectacularly since all of our feature branches are in the main repo and not a fork

Damiano Testa (Jul 08 2022 at 10:24):

(My comment is not meant as a critique, just to try to understand what is being discussed!)

David Chanin (Jul 08 2022 at 10:24):

the difference is that casual users don't want to have to do a git clone, copy git hashes, and run CLI tools to tell what's changing / what version they're on

Eric Wieser (Jul 08 2022 at 10:24):

Johan Commelin said:

Eric Wieser I guess the docs already contain a list of all names. Can that be reused for your idea?

Yes, but the history of past documentation sites is discarded

Eric Wieser (Jul 08 2022 at 10:24):

David Chanin said:

the difference is that casual users don't want to have to do a git clone, copy git hashes, and run CLI tools to tell what's changing / what version they're on

But if they;re "on" a version then they have already git cloned it?

David Chanin (Jul 08 2022 at 10:25):

They probably don't know that though. For example I just did "leanproject new" and it did stuff, I didn't know it was cloning a git repo

David Chanin (Jul 08 2022 at 10:27):

It would also just be nice to see that something isn't working, and you could ask "what mathlib version are you on?" Then you'd be able to say "oh, you're on v127, but the most recent is v219" compared to "I'm on git hash 1as2d3e3324423d"

Damiano Testa (Jul 08 2022 at 10:28):

So maybe the log could simply translate the git hashes into consecutive numbers. Is this what you are proposing?

Eric Wieser (Jul 08 2022 at 10:28):

Would a leanproject show-upgrades command that printed "your mathlib is <n> commits behind master" help?

Damiano Testa (Jul 08 2022 at 10:28):

(I have gotten used to the git hashes, by now, but I agree that as a beginner, I would have rather seen a number.)

Kevin Buzzard (Jul 08 2022 at 10:29):

Theoretically this sounds nice but in practice this is not what is happening. The answer to a question in practice is never "you are on version x and need to be on version y", the answer is always "update mathlib to current master".

David Chanin (Jul 08 2022 at 10:29):

yes, and be a single file which could be indexed by google, and also submit a git tag with the version number as well which would show up in "releases" on github

Damiano Testa (Jul 08 2022 at 10:29):

Of course, since the updates happen every ~2 hours, the numbers will get very big quickly, I think!

David Chanin (Jul 08 2022 at 10:31):

Kevin Buzzard said:

Theoretically this sounds nice but in practice this is not what is happening. The answer to a question in practice is never "you are on version x and need to be on version y", the answer is always "update mathlib to current master".

But in this case all old proofs will be broken with no way to know what

Damiano Testa (Jul 08 2022 at 10:31):

Currently, there are 63132 "versions" of mathlib.

Kevin Buzzard (Jul 08 2022 at 10:31):

That's how we roll

Eric Wieser (Jul 08 2022 at 10:31):

I don't think tagged releases make sense at all if they're triggered on every commit. We could do them monthly in sync with the "this month in mathlib" posts, but as Kevin says this isn't useful information

Kevin Buzzard (Jul 08 2022 at 10:31):

If something is broken, just ask here.

Eric Wieser (Jul 08 2022 at 10:32):

I think this is an #xy problem; your X problem is "how do I work out why a mathlib update broke me", to which you've constructed the Y problem "can mathlib have better version information"

Kevin Buzzard (Jul 08 2022 at 10:33):

In the grand scheme of things, this is an experimental project with a small number of users and a friendly community who are happy to help out people with broken proofs.

Eric Wieser (Jul 08 2022 at 10:33):

I think the actual solution to X is "we need to make tooling (or at least documentation on existing tools) to help with this kind of thing, because the development methodology of mathlib makes conventional versioning useless"

Damiano Testa (Jul 08 2022 at 10:36):

Also, maybe it should be made more explicit that tutorials/videos/documentation often are tied to a very specific state (rather than version :) of mathlib and that the current one may not work the same way.

David Chanin (Jul 08 2022 at 10:36):

It's true you all are amazing and super helpful! I just feel like this will hold the project back. As a new user myself I find it really discouraging when I can't follow Lean tutorials without getting errors, then I can't google those errors and find a simple solution like "findim was renamed to finrank in version X", and need to know to come ask in Zulip

Damiano Testa (Jul 08 2022 at 10:38):

I understand your point, but besides name changes, there will be changes in which arguments are implicity/explicit, proofs that now require fewer assumptions, changes in namespaces, lemmas that became redundant due to better use of typeclasses...

While what you are suggesting seems to work in a special case, I am skeptical that it would really be useful in the long run.

Damiano Testa (Jul 08 2022 at 10:39):

In the long run, you will not be working on tutorials and will ask a question here about what can currently be done in mathlib, and you will not even think of checking out an old version. At least, I never checked out an older version of mathlib...

Damiano Testa (Jul 08 2022 at 10:40):

Btw, fixing (slightly) broken proofs is also very good practice for coming up with completely novel proofs: they start as completely broken and your task is to fix them!

David Chanin (Jul 08 2022 at 10:40):

This is true, but only if new users aren't immediately discouraged when they try to start learning Lean by issues like this.

Damiano Testa (Jul 08 2022 at 10:41):

This is where the welcoming community comes in! :smile:

Eric Wieser (Jul 08 2022 at 10:41):

David Chanin said:

It's true you all are amazing and super helpful! I just feel like this will hold the project back. As a new user myself I find it really discouraging when I can't follow Lean tutorials without getting errors, then I can't google those errors and find a simple solution like "findim was renamed to finrank in version X", and need to know to come ask in Zulip

This sounds like a problem with those tutorials. The right way to write a lean tutorial is to start with "open this project that I created for you that refers to a specific version of mathlib that matches the one the tutorial will be using". https://github.com/leanprover-community/tutorials does things this way.

David Chanin (Jul 08 2022 at 10:42):

Maybe I can make a cron job that updates a Gist every time there's a new commit to mathlib master with this information. I really like @Eric Wieser's idea of saying which lemmas were added, which were deleted as well to at least get a text log of this stuff

Eric Wieser (Jul 08 2022 at 10:43):

David Chanin said:

Maybe I can make a cron job that updates a Gist every time there's a new commit to mathlib master with this information. I really like Eric Wieser's idea of saying which lemmas were added, which were deleted as well to at least get a text log of this stuff

I think this would be a great idea, but I suspect it's more total man hours than just working out things manually every time things move. If building it interests you, go for it!

David Chanin (Jul 08 2022 at 10:44):

It wouldn't just be for me though, that would allow Google to index this info and the next use who searches "Lean findim" in Google will find the Gist, and see a commit where "findim" was remoned and "finrank" was added

David Chanin (Jul 08 2022 at 10:44):

I'll give it a try and see if I can get something simple working

Eric Wieser (Jul 08 2022 at 10:46):

Unfortunately, the type of lean code you need to write to do that kind of thing is rather short on tutorial content

Eric Wieser (Jul 08 2022 at 10:46):

Here's a script I wrote recently that makes a list of all function names and writes their file and line numbers to a .tex file, in case that helps you: https://github.com/eric-wieser/lean-graded-rings/blob/master/export.lean

David Chanin (Jul 08 2022 at 10:46):

I was thinking of just using regexes :sweat_smile:

David Chanin (Jul 08 2022 at 10:46):

aah awesome, thanks so much!

Eric Wieser (Jul 08 2022 at 10:47):

David Chanin said:

I was thinking of just using regexes :sweat_smile:

That doesn't work, these two code snippets have the same API:

@[simps]
def foo := (1, 2)
def foo := (1, 2)
@[simp] lemma foo_fst : foo.fst = 1 := rfl
@[simp] lemma foo_snd : foo.snd = 2 := rfl

Eric Wieser (Jul 08 2022 at 10:47):

(behind the scenes @[simps] is generating code)

David Chanin (Jul 08 2022 at 10:49):

hrm yeah no idea what that code means :sweat_smile:

Damiano Testa (Jul 08 2022 at 10:49):

to_additive is another source of "invisible" lemmas.

Eric Wieser (Jul 08 2022 at 10:49):

https://github.com/leanprover-community/leancrawler might be another option

David Chanin (Jul 08 2022 at 10:52):

I'll check it out! Thank you all so much!

Patrick Massot (Jul 08 2022 at 12:02):

David, you need to unlearn what you think you know about versioning. It simply doesn't apply here. You should also stop trying to follow video tutorials and use our written tutorials which suffer from none of these issues. There is simply no way we can have useful video tutorials, the project is moving too fast.

Patrick Massot (Jul 08 2022 at 12:04):

The learning resources webpage contains links towards the tutorial project, the "Theorem proving in Lean" and the book in progress "Mathematics in Lean". They are all pinned to a specific mathlib and regularly updated in a way that would be impossible to do with videos.

Patrick Massot (Jul 08 2022 at 12:07):

Where did you find those videos? We should probably remove them if they are within reach, or at least make sure they come with huge disclaimers.

Kevin Buzzard (Jul 08 2022 at 12:09):

If it's the video I'm thinking about (me proving that intersection of two 5-d subspaces of a 9-d space is nontrivial) then this wasn't really a tutorial, it was just a recording of a Twitch stream I made in around 2019. I can certainly add to the description of the video that the exact code no longer works as videoed (the same will be true of Scott's videos on the infinitude of primes, I should think).

Patrick Massot (Jul 08 2022 at 12:15):

It's true of any videos that is older than a couple of months old I guess.

Eric Wieser (Jul 08 2022 at 12:37):

Are the lftcm videos fine, or did we opt to bump the repo and make the examples match newer mathlib instead of old videos?

Yakov Pechersky (Jul 08 2022 at 13:07):

I don't understand why a page that is indexed by Google is some gold standard. I don't read the news by looking at the search hit in Google, I go directly to the news site.

Yakov Pechersky (Jul 08 2022 at 13:07):

On that site, I can look at the archive.

Yakov Pechersky (Jul 08 2022 at 13:08):

For mathlib, that is the commit history. And even more powerful, is git blame. We have amazing autogen documentation that can take you directly to the source code. There, you can use the browser gui to git blame and see exactly when a particular lemma appeared or changed.

Eric Wieser (Jul 08 2022 at 13:18):

There, you can use the browser gui to git blame and see exactly when a particular lemma appeared or changed.

Tooling to find when a lemma was removed is much less available online as far as I know

David Chanin (Jul 08 2022 at 13:35):

pleas please please don't take down the xena videos on youtube! they're sooo helpful, it's invaluable to see someone going through how to solve things in lean in real-time

David Chanin (Jul 08 2022 at 13:36):

My point with Google is just that as a beginner, if I'm following a tutorial and get an error, my first instinct is to start searching Google to see what's going on, if other people have this issue, etc

David Chanin (Jul 08 2022 at 13:37):

this is what I do with any programming language. If I'm working on a python project and run into an error, I'll google it, I won't try to go to a python chatroom and ask as a first instinct

Arthur Paulino (Jul 08 2022 at 13:52):

I see where you're going with the google thought. But let's take a step back and think about what's being shown on google's page. If you search for some Python or Java error, you'll probably see links to stackoverflow questions or blog posts. Sometimes we're also presented with the official API page.

I guess if you'd want to see results on google or on any other search engine about Lean, we'd need to populate those spaces first. People aren't talking a lot about Lean out there yet (except for some highly specialized domains about mathematics and formalization).

David Chanin (Jul 08 2022 at 15:47):

I set up a github action to create markdown and txt changelogs from mathlib on a cron here: https://github.com/chanind/mathlib-changelog. I'll see if I can get it to automatically pull-in info about what specific lemmas/theorems were added/removed in each commit, even if it's not perfect (eg just from running a regex over the diff)

Kyle Miller (Jul 08 2022 at 16:00):

I've thought that having a changelog that records changes to names would be useful, since it can be frustrating (even with experience!) to figure out where something went.

I don't know if this would work, but here's a proposal:

  • There's a CI lint step that checks to see that every name that used to exist is accounted for in a BRANCH_CHANGES file. This file has lines like nat.foo -> nat.bar or nat.foo -> ! if it was deleted (and if you want to be helpful, you can write a short message after the ! to say what you're supposed to do now).
  • When Bors assembles a batch, it takes the BRANCH_CHANGES for each PR, concatenates them, then inserts them at the beginning of the CHANGELOG file (mathlib master always has an empty BRANCH_CHANGES file). This block of changes is given some sort of version heading -- let's say the Bors batch number.

Kyle Miller (Jul 08 2022 at 16:02):

This is just meant to solve "where'd X go?" It doesn't seem necessary to include new names.

Kyle Miller (Jul 08 2022 at 16:04):

Or, rather than per-Bors batch, it would make more sense to have it alter the CHANGELOG per commit. I don't really know what we can do with Bors or what's possible. But there needs to be some way to keep different commits from conflicting for the CHANGELOG.

Eric Wieser (Jul 08 2022 at 16:16):

It's not clear to me that there's any benefit to making a human deal with this vs just generating everything after the fact

Eric Wieser (Jul 08 2022 at 16:16):

If CI is checking that BRANCH_CHANGES is accurate, it could just create an automatic comment telling reviewers what the change is

Eric Wieser (Jul 08 2022 at 16:17):

Oh, I guess the point is tracking renames, which can't be inferred

Ruben Van de Velde (Jul 08 2022 at 16:18):

I guess renames can be inferred if the type doesn't change at all

Eric Wieser (Jul 08 2022 at 16:18):

Or even just via heuristics; "90% this new lemma replaces it based on similarity of the expression tree"

Kyle Miller (Jul 08 2022 at 16:19):

There can be a tool to help generate this file for us automatically, but sometimes there are cases that can't be inferred, like when things get renamed and moved.

Eric Wieser (Jul 08 2022 at 16:20):

The nice thing about heuristics is it works on all past PRs too

Eric Wieser (Jul 08 2022 at 16:22):

The heuristics could include "parse a magic section from the commit message", that we encourage people to write for larger PRs

David Chanin (Jul 08 2022 at 16:49):

Yeah I think the "where'd X go?" question has been the hardest for me so far, anything that could make that easier / faster to resolve would be great!

Eric Wieser (Jul 08 2022 at 17:25):

Probably the first step in building such a tool would be deciding upon a smaller cache format that contains enough metadata for producing these diffs without containing the full oleans

Eric Wieser (Jul 08 2022 at 17:26):

Ruben Van de Velde said:

I guess renames can be inferred if the type doesn't change at all

def would like a word!

Alex J. Best (Jul 08 2022 at 17:46):

I just want to reiterate what I said earlier, git log -Slemma_name is a very convenient way to find where X went, it is almost always the case that the first commit listed there is the one that renamed X, and if the log entry doesn't say where normally it's easy to work out from the diff.

David Chanin (Jul 08 2022 at 20:00):

I tried extracting which lemmas / theorems / defs changed in each commit and added it to the repo. A sample changelog is here: https://github.com/chanind/mathlib-changelog/blob/main/markdown/CHANGELOG.2022.07.md. This will definitely miss stuff (it's just looking for things like +lemma xyz in the diff, but I think it does catch a lot of stuff

David Chanin (Jul 11 2022 at 13:20):

I turned the content into simple website for searching / viewing / indexing as well, which will auto-update with the repo as well. A bit rough still, but online at: https://mathlib-changelog.vercel.app/

Eric Wieser (Jul 11 2022 at 13:57):

I can't speak for the reliability of the data yet, but the site looks excellent and very useful for solving this type of problem

Eric Wieser (Jul 11 2022 at 13:58):

(git log -S is obviously great too, but it doesn't give me links that lead to the PR that discussed why something was changed)

Damiano Testa (Jul 11 2022 at 13:58):

Honestly, I was a little skeptical, but for the little that I have seen, it looks really great! Well done!

Alex J. Best (Jul 11 2022 at 14:48):

@Eric Wieser doesn't bors put the pr link in the squashed commit message? What else are you hoping for?

Johan Commelin (Jul 11 2022 at 18:34):

@David Chanin Nice! It's really fast as well.

Eric Wieser (Jul 11 2022 at 19:26):

Looking at this a little more, I've spotted the first major data problem; it doesn't know about namespaces, so considers

def foo.bar := 1

different to

namespace foo
def bar := 1
end foo

David Chanin (Jul 11 2022 at 19:40):

aah interesting, yeah, it will miss that because it's just doing a regex match on +def (term). It might need to do a proper parse of the file being modified before/after to be able to pick up on the namespaces

David Chanin (Jul 11 2022 at 19:43):

If namespaces are the main issue, it would probably be possible to pick that up still with regexes looking for the namespace keyword in the file being modified :thinking:

Eric Wieser (Jul 11 2022 at 20:48):

It's probably a better investment of time to do it properly

Eric Wieser (Jul 11 2022 at 20:49):

Especially since that will give it a real advantage over the git log -S method

David Chanin (Jul 11 2022 at 22:53):

I'm just worried it might be too slow to be feasible to do it the proper way, since if I understand correctly it would require Lean parsing the full Mathlib environment for every commit to analyze the state of which lemmas/defs/theorems exist in which commit. Unless, is there a way to get something like an abstract syntax tree structure from a single Lean file without having to load everything?

Eric Wieser (Jul 12 2022 at 06:47):

You're correct that that is what it would have to do; but remember we have cached builds of most commits

Floris van Doorn (Jul 12 2022 at 09:22):

This is great!

Floris van Doorn (Jul 12 2022 at 09:28):

Namespaces are one thing that are quite easy to read off without parsing the file properly... Finding all semantic changes is probably a couple orders of magnitude slower. Even when you can download the compiled oleans, you still have to run some Lean code importing all of mathlib. And the oleans aren't generated for all commits, so it might be harder to point to specific commits with certain changes.

Eric Wieser (Jul 12 2022 at 09:32):

Another possible problem with downloading all the oleans is that that's probably a _lot_ of network traffic, both for you and for our azure servers hosting them

Eric Wieser (Jul 12 2022 at 09:33):

Even if you just download it once and build a local database from the data

Heather Macbeth (Jul 12 2022 at 14:26):

@David Chanin We (@Daniel Packer and I) just used this tool to track down the new location of lt_omega_of_linear_independent, which was deleted on 5 June. Thanks!

David Chanin (Jul 12 2022 at 17:03):

Heather Macbeth said:

David Chanin We (Daniel Packer and I) just used this tool to track down the new location of lt_omega_of_linear_independent, which was deleted on 5 June. Thanks!

Glad to hear it helped! Thanks for sharing :)

David Chanin (Jul 12 2022 at 17:05):

Lean must first parse the raw string syntax content of a file into some tree structure before it can figure out what the file is referencing I assume? Is there a way to get at that parsed structure without it needing to go and follow all the imports and trying to figure out what's actually happening in the file?

David Chanin (Jul 12 2022 at 17:23):

Actually it looks like lean has a --ast option, which might be what I'm looking for :thinking:

David Chanin (Jul 12 2022 at 17:50):

hrm running lean --ast <file.lean> seems to throw lots of errors. It does output an AST JSON, but a there's a lot of nulls in it, and as a result a bunch of stuff is missing. I don't really understand why that would be though, surely the AST parse shouldn't error as long as the syntax is valid?

Kyle Miller (Jul 12 2022 at 18:08):

Like Floris said, parsing Lean improperly for namespaces should be relatively straightforward -- thankfully they always come in namespace foo / end foo pairs. I think there are two things you need to know:

  • Names can be escaped using french quotes (« »). For example, a.«b».c is the same as a.b.c.
  • Names can start with _root_ to override the fact that they are in a namespace.

For robustness, you should also be parsing section foo / end foo pairs, since frequently there are overlapping namespaces and sections with the same names. Sections have no impact on naming.

Kyle Miller (Jul 12 2022 at 18:14):

David Chanin said:

Lean must first parse the raw string syntax content of a file into some tree structure before it can figure out what the file is referencing I assume?

My understanding is that the data structures that support --ast are actually built for the sole purpose of having some machine readable output.

A Lean file is a sequence of commands, and each command has code for how they parse themselves and manipulate the Lean frontend's internal state. Each command is processed before the next one is even parsed.

Floris van Doorn (Jul 13 2022 at 11:53):

One potential feature request: on the search page, can you link to the most relevant Github page (including line number)?
For example, in Heather's example of
https://mathlib-changelog.vercel.app/lemma/lt_omega_of_linear_independent
it would be nice if the top result contains a link to the correct line in the commit diff and/or the correct line of the file after the commit (for deletions I understand this is tricky to figure out exactly). I don't know if this is too hard to add, but I think that would make it easier to see what the commit has changed.

David Chanin (Jul 13 2022 at 18:03):

Kyle Miller said:

David Chanin said:

Lean must first parse the raw string syntax content of a file into some tree structure before it can figure out what the file is referencing I assume?

My understanding is that the data structures that support --ast are actually built for the sole purpose of having some machine readable output.

A Lean file is a sequence of commands, and each command has code for how they parse themselves and manipulate the Lean frontend's internal state. Each command is processed before the next one is even parsed.

Ah that's much more complicated than I realized! It sounds like just scanning through looking for section/namespace keywords is the way to go

David Chanin (Jul 13 2022 at 18:04):

Floris van Doorn said:

One potential feature request: on the search page, can you link to the most relevant Github page (including line number)?
For example, in Heather's example of
https://mathlib-changelog.vercel.app/lemma/lt_omega_of_linear_independent
it would be nice if the top result contains a link to the correct line in the commit diff and/or the correct line of the file after the commit (for deletions I understand this is tricky to figure out exactly). I don't know if this is too hard to add, but I think that would make it easier to see what the commit has changed.

I think that shouldn't be too hard to add! I'll see if I can get it working

Floris van Doorn (Jul 15 2022 at 10:18):

One other feature request / bug report: I noticed it isn't tracking classes and instances. I guess unnamed instances might be hard to include without proper parsing, but maybe it's nice to add instances whose name is given explicitly, and classes (and inductive/structure/abbreviation if you aren't doing those already).

Floris van Doorn (Jul 15 2022 at 10:19):

E.g. https://github.com/leanprover-community/mathlib/commit/ede73b2571f6b40dba488d565a66c2bb1a91602f should be "removing" both regular_space and regular_space.t1_space.

Floris van Doorn (Jul 15 2022 at 10:26):

And let me also say: this website is indeed useful to figure out the new name of a renamed lemma!

David Chanin (Jul 15 2022 at 22:23):

I think I've got the proper namespace/section parsing stuff working now, so hopefully the names of all the lemmas/etc should include the correct namespace now

David Chanin (Jul 15 2022 at 22:25):

Floris van Doorn said:

One other feature request / bug report: I noticed it isn't tracking classes and instances. I guess unnamed instances might be hard to include without proper parsing, but maybe it's nice to add instances whose name is given explicitly, and classes (and inductive/structure/abbreviation if you aren't doing those already).

Does this work the same as with lemmas/defs/theorems? e.g. just looking for a keyword like structure <name> or instance <name>? I don't have a very good grasp of how instances / classes and stuff work in Lean yet, so for now I just added theorem/def/lemma since they seem pretty straightforward

Kyle Miller (Jul 15 2022 at 22:26):

structure <name> and inductive <name> work just the same as theorem/def/lemma. instance takes an optional name.

Kyle Miller (Jul 15 2022 at 22:28):

If instance doesn't have a name, then there are some relatively complicated rules for how it automatically generates a name for you (you'd pretty much need to run Lean to get that name)

Kyle Miller (Jul 15 2022 at 22:31):

abbreviation, def, theorem, and lemma all parse the same.

David Chanin (Jul 15 2022 at 23:15):

awesome, I'll see if I can get abbreviation, structure, and inductive added then! Seems like it should be a quick change

David Chanin (Jul 16 2022 at 00:11):

Just added abbreviation, structure, and inductive - turned out to be pretty straightforward

Johan Commelin (Jul 16 2022 at 08:03):

@David Chanin I just want to say that I really like how you had a vision of a useful tool. At first, we couldn't understand what you wanted that git log didn't offer already. Then you just went ahead and built the tool. And now we are ally excited about it. "Show, don't tell" and all that. But it works! Great job!

David Chanin (Jul 16 2022 at 09:40):

That's so nice of you to say, Johan! I wasn't really expecting to build anything either, but everyone on this thread was so supportive and offered so many ideas, it just sort of happened. Really a testament to the community you all have built!

David Chanin (Jul 16 2022 at 11:41):

Floris van Doorn said:

One potential feature request: on the search page, can you link to the most relevant Github page (including line number)?
For example, in Heather's example of
https://mathlib-changelog.vercel.app/lemma/lt_omega_of_linear_independent
it would be nice if the top result contains a link to the correct line in the commit diff and/or the correct line of the file after the commit (for deletions I understand this is tricky to figure out exactly). I don't know if this is too hard to add, but I think that would make it easier to see what the commit has changed.

I was able to get it to link directly to the file in the diff where the change occurred in Github from the search result page, but figuring out the line number will be a bit trickier. Hopefully this should still help make it easier to get to the relevant part of the diff than having to search through the entire list of changes at least

Kyle Miller (Jul 18 2022 at 15:19):

@David Chanin Thanks for building https://mathlib-changelog.org/!

I think it would be great if we eventually linked to it from the mathlib documentation. It could be integrated in a few different ways: (1) just a link, (2) an additional button for the search box when you don't find what you're wanting, (3) maybe even links from definitions/lemmas for when you want to know what something used to be called (though the definition area is already getting rather busy).

David Chanin (Jul 18 2022 at 22:22):

Hrm it's hard to tell what lemmas/theorems used to be called automatically, since it's just looking at the git diffs, but that would be awesome if it were possible to figure out, even if just in limited situations. Adding missing theorems to the search though should be possible. There's a full list of every lemma/theorem/etc that ever existed at https://mathlib-changelog.org/searchItems.json which is what the mathlib-changelog search is using internally (it's just a client-side static site, so it's just looking through that JSON file basically).

David Chanin (Jul 18 2022 at 22:22):

Everything in that list maps 1-to-1 to a page on mathlib-changelog. The prefix is the first letter of the type of item it is, so a means abbreviation, t means theorem, etc

David Chanin (Jul 18 2022 at 22:30):

Another idea I had was that it would be really helpful if there was a way to see examples of each theorem in mathlib used in real proofs. I find that I have a lot of trouble understanding how to use the theorems and definitions in Mathlib just from seeing the doc, it really helps to see it used in practice in a real proof. Maybe it would be possible to automatically scan all the repos in Github that are a lean project with mathlib that come up searching e.g https://github.com/search?l=Lean&p=13&q=lean&type=Repositories, and keep an index of which repos make use of which theorems so it could be linked to from the mathlib docs? Or it could just be another site like mathlib-changelog that auto-updates as new repos / versions of mathlib are created

Eric Wieser (Jul 19 2022 at 07:05):

I've seen tools online for other languages that do that

Ruben Van de Velde (Aug 05 2022 at 08:47):

@David Chanin hope it's okay if I use this thread for a bug report - the "view on github" link on https://mathlib-changelog.org/commit/9f55ed7 doesn't work; it seems to need the long form hash like https://github.com/leanprover-community/mathlib/commit/9f55ed7054dcf01bf8a3a69afc03a1ceebcfc09c

David Chanin (Aug 05 2022 at 09:15):

Ooh good catch! It looks like 7 chars isn't enough. 8 chars seems to work though. I'll update to use 8 chars moving forward!

Ruben Van de Velde (Aug 05 2022 at 11:08):

And thanks for making it, it's helped me a lot today

Eric Wieser (Aug 05 2022 at 11:57):

It's probably a good idea to use the full hash, since there might be collisions in 8 chars

David Chanin (Aug 05 2022 at 15:25):

https://stackoverflow.com/questions/18134627/how-much-of-a-git-sha-is-generally-considered-necessary-to-uniquely-identify-a It looks like 10-12 should be enough. This is saying the Linux kernel uses 12 chars with hundreds of thousands of commits, which is ~10x what mathlib has so far.

Patrick Massot (Aug 19 2022 at 12:37):

I just used https://mathlib-changelog.org for the first time today, tracking what happened to unit_interval.mk_zero and its friends and then matrix.minor_apply in order to update mathlib in the sphere eversion project. It was really nice!

Eric Wieser (Aug 19 2022 at 18:41):

I'll second that it's really useful for a lean bump, especially if you're not keeping an eye on PRs. Perhaps the documentation for leanproject up should even recommend using this to help deal with build errors.

Yaël Dillies (Aug 19 2022 at 18:42):

If it does, then I would feel more comfortable having the website migrated to the community repo somewhere (possibly part of the mathlib website), to avoid it going rot.


Last updated: Dec 20 2023 at 11:08 UTC