Zulip Chat Archive

Stream: Lean Together 2024

Topic: Leaff: a Lean library diff tool - Alex Best


namibj (Jan 10 2024 at 17:51):

Semi-automated bisect over dependency versions might be quite helpful, too, at least for narrowing in to a commit that might be specific enough that the leaf output won't have much clutter to wade through.

David Thrane Christiansen (Jan 10 2024 at 18:17):

@Alex J. Best this tool looks great! My understanding was that each trait contributes a fixed "cost" - what would it take to consider tree diffs of types rather than hash comparisons?

Johan Commelin (Jan 10 2024 at 18:18):

Lovely talk! Looking forward to having this integrated in mathlib CI!

Mario Carneiro (Jan 10 2024 at 18:23):

I am definitely concerned about the case of lean version changes, because these happen periodically in mathlib and the probability of a lean version change tends to 1 as the mathlib window width grows

Mario Carneiro (Jan 10 2024 at 18:25):

I think it will require changes to the olean format though, because currently all you have is a lean4 commit sha and these aren't really ordered unless you maintain a list of all lean4 commit shas

Joachim Breitner (Jan 10 2024 at 18:25):

Tangential comment:
In case these automatic comments on PRs become too noisy (need to be scrolled through, trigger notification), one can also use the github API to write a “check” including markdown-formatted status pages:
https://docs.github.com/en/rest/checks/runs?apiVersion=2022-11-28#create-a-check-run
Or a bit simpler using job summaries:
https://github.blog/2022-05-09-supercharging-github-actions-with-job-summaries/
Then it’s one click away, so only really visible to those who know it’s there, but it's less verbose.

Mario Carneiro (Jan 10 2024 at 18:26):

I think we could also just remove proof changes from the PR comment output, I don't think that is very useful unless you are debugging core changes

Alex J. Best (Jan 10 2024 at 18:27):

It does already print a job summary, which is helpful for commits not attached to a PR. I agree that it can be spammy, I was planning to explore using <summary> tags on github markdown to give the user the option of which type of diffs you want to see.

Alex J. Best (Jan 10 2024 at 18:28):

Mario Carneiro said:

I think we could also just remove proof changes from the PR comment output, I don't think that is very useful unless you are debugging core changes

Yeah this is also probably true!

Mario Carneiro (Jan 10 2024 at 18:30):

my thought regarding version changes is to use leangz, which can double as a olean migration tool because it is version aware and is essentially the only tool in the lean ecosystem which is capable of reading oleans generated by multiple versions of lean

Alex J. Best (Jan 10 2024 at 18:30):

Mario Carneiro said:

I am definitely concerned about the case of lean version changes, because these happen periodically in mathlib and the probability of a lean version change tends to 1 as the mathlib window width grows

Yeah this is one reason why I think keeping a list of diffs for each commit to master and then composing them is a better way of getting to a diff for a long period.

If CHECK_OLEAN_VERSION was overridable at runtime that would make it much simpler to diff over changed lean version

Alex J. Best (Jan 10 2024 at 18:32):

David Thrane Christiansen said:

Alex J. Best this tool looks great! My understanding was that each trait contributes a fixed "cost" - what would it take to consider tree diffs of types rather than hash comparisons?

Thanks, I still think it would be good to use the hashes as a first pass so we can ignore everything that definitely didn't change, but outputting tree diffs when the type did change would definitely be a nice feature for usability.

Mario Carneiro (Jan 10 2024 at 18:33):

You can also probably get a performance boost by not building an Environment at all and instead reading the ModuleDatas directly. This is what lake exe shake does

Mario Carneiro (Jan 10 2024 at 18:33):

that also gives you an easy place to early-out when a module / constant is unchanged

Mario Carneiro (Jan 10 2024 at 18:35):

it may also help with the issue of partially failed builds since it makes you less sensitive to errors

David Thrane Christiansen (Jan 10 2024 at 18:45):

Alex J. Best said:

Thanks, I still think it would be good to use the hashes as a first pass so we can ignore everything that definitely didn't change, but outputting tree diffs when the type did change would definitely be a nice feature for usability.

Agreed on hashing first! But it seems that the tree diff might also be useful in figuring out whether something was renamed, moved, and had a new type class requirement added to its signature, vs a delete/add

David Thrane Christiansen (Jan 10 2024 at 18:46):

RE prior work - I believe Elm has something like this

David Thrane Christiansen (Jan 10 2024 at 18:46):

Also, the GHC devs use a similar technique to ensure that there's no unexpected changes to the standard library from one release to another

David Thrane Christiansen (Jan 10 2024 at 18:47):

Though they just dump the interface to a text file and then use text diff of that, I suppose

David Thrane Christiansen (Jan 10 2024 at 18:50):

GHC

Christian Merten (Jan 12 2024 at 16:35):

Can Leaff somehow spit out binder annotation changes? When regrouping assumptions by grouping them up in variable lines, as for example in #9696, a double check by leaff would be great. Currently it does not seem to do that though.

Alex J. Best (Jan 12 2024 at 17:00):

I would say yes it's a planned feature to output the most useful information in such a situation at the very least. Can you say a bit more about what you'd like to see for this PR specifically (or what you'd hope to see if the PR made an unintended change) so I can make sure it behaves as you want in future.

Christian Merten (Jan 12 2024 at 17:15):

Minimal example:

theorem foo (α β : Type) (a b : α) (f : α  β) (h : a = b) : f a = f b :=
  congrArg f h

vs

theorem foo {α β : Type} (a b : α) (f : α  β) (h : a = b) : f a = f b :=
  congrArg f h

should be shown as different.

Yury G. Kudryashov (Jan 16 2024 at 00:34):

I'm getting the following error on recent Mathlib:

uncaught exception: unknown package 'ImportGraph'

Alex J. Best (Jan 16 2024 at 00:35):

What are the two mathlib versions?

Yury G. Kudryashov (Jan 16 2024 at 00:39):

branch#YK-sort-meas (3715cfa2fa) and 137ab0bb05 (master)

Mario Carneiro (Jan 16 2024 at 00:40):

there was some time when cache was broken and would not download ImportGraph. Try running lake build in both projects first

Yury G. Kudryashov (Jan 16 2024 at 00:40):

UPD: possibly, bad cache

Yury G. Kudryashov (Jan 16 2024 at 00:41):

For some reason, I have to lake clean before lake exe cache get, or I get bad oleans from time to time.

Mario Carneiro (Jan 16 2024 at 00:41):

repros are helpful for that (but we may not be able to do anything about issues in an old version)

Mario Carneiro (Jan 16 2024 at 00:42):

or at least a description of what you did and what the error message is

Yury G. Kudryashov (Jan 16 2024 at 00:43):

Can I ask leaff to pretty-print old and new types for "type changed"?

Alex J. Best (Jan 16 2024 at 00:52):

Not right now, but its a feature I'd like to add very soon

Yury G. Kudryashov (Jan 19 2024 at 18:34):

I'm going to open a bunch of Github issues on https://github.com/alexjbest/leaff so that we can refer to particular missing features by numbers.

Alex J. Best (Jan 19 2024 at 18:35):

Great idea, thanks! I'll try and copy anything else I can think of there when you are done

Yury G. Kudryashov (Jan 19 2024 at 18:53):

I opened 6 issues. I'm done for now.

Andrew Yang (Apr 25 2024 at 07:52):

Is it possible to run leaff on a mathlib PR? I remember seeing some instructions but cannot find it now.


Last updated: May 02 2025 at 03:31 UTC