Zulip Chat Archive

Stream: mathlib4

Topic: There are 1617 files with changed transitive imports ...


Kim Morrison (Aug 07 2024 at 05:12):

@Damiano Testa, when your import diff script decides there are too many difference to display, could we provide a command line tool to get this information? (Or just post it anyway? It's folded away...) I often find myself wanting to know, despite it being many lines.

Damiano Testa (Aug 07 2024 at 05:41):

I'm not at a computer now, but I will look at this. The reason I introduced the cut-off is that the very first test that I made had so many changes that the whole comment got corrupted.

Damiano Testa (Aug 07 2024 at 05:41):

However, I think that I have been too conservative with the length and much longer reports can still be displayed.

Damiano Testa (Aug 07 2024 at 05:42):

And a command-line version should not have this limitation either.

Damiano Testa (Aug 07 2024 at 05:42):

(Btw, to give credit where it is due, Johan wrote the import script, I wrote the declaration diff.)

Damiano Testa (Aug 07 2024 at 07:27):

Kim, can you tell me on which PR this happened, so that I can test?

Damiano Testa (Aug 07 2024 at 07:30):

To test this, I removed every import line that contained the substring red and ran the script to make sure that it would say that it was too many changes (and it did complain).

I then ran the modified script that has a flag to show all import changes, no matter what and pasted the output in #14675 (a PR that has nothing to do with the script!). Git happily accepted the resulting comment as you can see.

Damiano Testa (Aug 07 2024 at 07:31):

I think that a better approach would be to limit the combined size of the import diff and the decl diff scripts, since they get written out to the same message.

Kim Morrison (Aug 07 2024 at 07:31):

#15579

Damiano Testa (Aug 07 2024 at 07:31):

(Although arguably, if you are making a lot of import changes, you should not be touching declarations and conversely.)

Damiano Testa (Aug 07 2024 at 07:34):

I just commented on the PR!

Damiano Testa (Aug 07 2024 at 07:36):

I'll try to find out what are the limits for GitHub messages and will use half of that as a cutoff. Besides that, I will also add the flag to print all the diffs anyway.

Damiano Testa (Aug 07 2024 at 07:41):

#15583 adds the all flag to see all imports.

Damiano Testa (Aug 07 2024 at 07:41):

For increasing the limit size of what the import diff reports, I need to investigate further!

Damiano Testa (Aug 07 2024 at 07:58):

Looking further into this specific commit, here is a wc for the output of the all script:

$ ./scripts/import_trans_difference.sh all | wc
[fluff]
     20    1679   71479

and according to "the internet", GitHub comment size limit is just one less than the largest known Fermat prime. This means that GitHub could likely take a message that is roughly an order of magnitude bigger than what the script currently rejects.

Ruben Van de Velde (Aug 07 2024 at 08:25):

Question is more whether it's useful for humans at that scale, I think

Damiano Testa (Aug 07 2024 at 08:26):

I agree, but if you look at the output for the PR, with the collapsible tabs it is very informative.

Damiano Testa (Aug 07 2024 at 08:29):

The script bases its decision on whether to post the full output or not based on the number of files that have import changes.

GitHub's limit is on the size of the message, not on the number of lines (although of course there are implications between the two!).

I wonder whether the "human" metric should take into account number of lines of output, since the lines correspond to knowing how many files with each possible import change there are.

Damiano Testa (Aug 07 2024 at 08:30):

Hence, if there are 2000 files with 3 import changes and 500 files with 10 import changes, the script maybe should report that, even if you are then not going to open the tabs to look at the 2000 files in the first block.

Damiano Testa (Aug 07 2024 at 12:06):

After investigating this a bit, it seems that comments can be roughly 260k characters long, really (probably yet another difference between the meaning of "character" with the 65536 bound that GitHub claims).

Damiano Testa (Aug 07 2024 at 12:07):

If there is consensus, I can change the check for the import summary to being half the allowed comment length on github. That would amply cover the case in question and probably most cases anyway, given how the number of characters of every .lean file in mathlib is now 206526.

Eric Wieser (Aug 08 2024 at 00:48):

You mean of every .lean file name, right?

Damiano Testa (Aug 08 2024 at 02:09):

Yes, indeed! The script would only report file names and this is what I had in mind!

Damiano Testa (Aug 08 2024 at 02:16):

I also ended up using 120k characters as a limit for the message in the PR.


Last updated: May 02 2025 at 03:31 UTC