Zulip Chat Archive

Stream: mathlib4

Topic: Declaration diff with environment awareness


Damiano Testa (Mar 04 2025 at 17:01):

I have a prototype for a declaration diff that takes the actual environment into account, rather than the text-based script that is currently running in CI: #22464.

Damiano Testa (Mar 04 2025 at 17:01):

What I view as the main disadvantage with this script is that it requires a successful CI run to post its output. For this reason, I would not remove the old script, at least not yet, and would simply add another self-updating comment that contains the "real" declaration diff, when it is available.

Damiano Testa (Mar 04 2025 at 17:01):

If you want to see a sample outcome, this is how I would convert into Zulip-formatting what the output looks like in CI:

Declaration diff in Lean

Damiano Testa (Mar 04 2025 at 17:01):

You can see the actual comment on the testing branch #22497.

Damiano Testa (Mar 04 2025 at 17:02):

The remaining issue is that this is, as far as I know, the first action in the mathlib repository that is contingent on another action being successful. This of course means lots of new ways of script failing in CI. In particular, the action can only be tested if the workflow files are already in master.

Eric Wieser (Mar 04 2025 at 17:03):

The fact it hijacked a mergify comment seems like a second remaining issue!

Damiano Testa (Mar 04 2025 at 17:04):

So, if someone is familiar with workflow_run in CI, please do let me know if I am doing something wrong! Otherwise, it will have to be tested with multiple iterations! :smile:

Damiano Testa (Mar 04 2025 at 17:05):

Eric Wieser said:

The fact it hijacked a mergify comment seems like a second remaining issue!

Actually, that is my doing: I did not want to have to scroll up or down all the time, so I made the script overwrite the first available comment, that happened to be by mergify,

Damiano Testa (Mar 04 2025 at 17:06):

If you look at the commit whose message has a unicode character, that is where I rewrote the mergify message.

Johan Commelin (Mar 05 2025 at 05:55):

This is a great tool to have around! Do you think you can also make it available as some sort of lake exe diff {git-hash}?

Johan Commelin (Mar 05 2025 at 05:55):

And yes, let's just post two diff reports in the PR summary for the time being.

Damiano Testa (Mar 05 2025 at 07:13):

I would have to experiment a bit with the command line lean executable, but I can easily extract a command-line option for the script.

Damiano Testa (Mar 05 2025 at 07:14):

It would still have to switch branches and download the two caches and then run for a couple of minutes.

Damiano Testa (Mar 05 2025 at 07:14):

I think that it may be a bit unwieldy.

Damiano Testa (Mar 05 2025 at 07:15):

Currently, there is no heuristic that just inspects the modified files: the script load all the oleans and looks at all declarations.

Damiano Testa (Mar 05 2025 at 07:20):

I can write a modification of the script that only inspects the declarations contained in the modified files. This would be probably a good heuristic and would definitely be quicker.

Damiano Testa (Mar 05 2025 at 07:21):

However, if you change, for instance, the naming scheme of to_additive, you would not notice the difference downstream.

Damiano Testa (Mar 05 2025 at 07:22):

So, there could be a quick-and-dirty version that just inspects the declarations in the modified files and a thorough-but-slow version that inspects the whole environment.

Johan Commelin (Mar 05 2025 at 07:31):

Having those two options sounds good to me.

Johan Commelin (Mar 05 2025 at 07:31):

And switching branches + downloading cache is not a problem. I happy to wait for a few seconds, to get this accurate information.

Damiano Testa (Mar 05 2025 at 07:41):

How about the per-file-diff comes as a next PR?

Damiano Testa (Mar 05 2025 at 07:41):

I'm keen to test that the workflow action works...

Damiano Testa (Mar 05 2025 at 07:42):

Also, it's more than a few seconds: CI takes 5 minutes.

Damiano Testa (Mar 05 2025 at 08:36):

So, here is a plan.

Damiano Testa (Mar 05 2025 at 08:36):

I'm going to add the option of passing to the script a list of modules and a flag that can be

  • upstream, meaning diff in the given modules and anything that imports them (good for, e.g. to_additive changes);
  • downstream, meaning diff in the given modules and anything that they import (good for, e.g. detecting upstreaming to Batteries);
  • only, meaning diff just the declarations defined in the given module (quick, but not fully reliable).

Damiano Testa (Mar 05 2025 at 08:36):

If nothing is passed, then the default is the current behaviour: import all of mathlib and its libraries and compute the huge diff of everything.

Johan Commelin (Mar 05 2025 at 09:36):

That sounds like a good plan. Although my picture of up/downstream is the opposite of yours.
And 5 minutes is maybe a bit much. Could only be the option that we run in CI on every PR?

Johan Commelin (Mar 05 2025 at 09:37):

Then commenting !decl_diff all or !decl_diff upstream could maybe trigger a more complete diff.

Damiano Testa (Mar 05 2025 at 09:54):

The CI run is separate from the main build and, ideally, is triggered by the main build becoming green.

Damiano Testa (Mar 05 2025 at 09:56):

Anyway, once I have the new script, I may also optimise it a little bit: currently, it just dumps all the declarations in the two environments to two files and computes the diff of the files.

Damiano Testa (Mar 05 2025 at 09:56):

It can probably ignore all declarations in "common" modules and be quicker.

Damiano Testa (Mar 05 2025 at 09:57):

I probably won't have time to look at this until later this Friday, though.

Damiano Testa (Mar 05 2025 at 11:22):

I wonder if the difference between upstream and downstream is just one of perspective:

  • I was thinking of upstream as "the declarations that see that there have been upstream changes", whereas
  • I think that you were thinking of upstream as "the modules that are upstream of the changes".

I'll see if there is a less confusing terminology that makes it clear whose perspective is taken into account!

Johan Commelin (Mar 05 2025 at 11:48):

Aah, yep, that explains the confusion (-;


Last updated: May 02 2025 at 03:31 UTC