Zulip Chat Archive

Stream: mathlib4

Topic: verifying ported files against mathlib


Jireh Loreaux (Nov 15 2022 at 21:58):

What exactly is involved in verifying a ported file against a commit in mathlib? The port-status script says that a bunch of files haven't yet been verified. Is this verification process something the author of the ported file needs to do, or should it be anyone? I'm happy to go around verifying ports.

Scott Morrison (Nov 15 2022 at 23:48):

That would be amazing.

Scott Morrison (Nov 15 2022 at 23:49):

Currently there is no process, because no one has ever done it. :-)

Scott Morrison (Nov 15 2022 at 23:49):

But the idea could be:

Scott Morrison (Nov 15 2022 at 23:50):

  • open up mathlib at today's master
  • open up a fully ported file in mathlib4 in a side by side window
  • scroll through the two, making sure nothing has been missed, or at least that any significant changes are marked with -- Porting note: ...
  • if you find anything bad, either make a mathlib4 PR to fix it, or raise it here
  • if you don't find anything bad, edit the #port-status wiki to add the git sha for today's mathlib master

Mario Carneiro (Nov 15 2022 at 23:54):

The process just described sounds almost as hard as doing the port in the first place. I think we should instead review the diff between the mathlib version that was ported, and the latest version, and look for equivalents of new things in mathlib4

Scott Morrison (Nov 15 2022 at 23:55):

The problem is that for now we have no record of the mathlib version that was ported. :-(

Mario Carneiro (Nov 15 2022 at 23:55):

You can make some inferences from the date that the file was ported

Scott Morrison (Nov 15 2022 at 23:56):

Sounds good. It will definitely be a bit approximate, but I agree it's too painful to do "properly".

Mario Carneiro (Nov 15 2022 at 23:56):

it's not trivial to diff mathlib and mathlib4 directly because theorems might be entirely gone (having been upstreamed)

Scott Morrison (Nov 15 2022 at 23:57):

How about we try to fill in best guesses of git sha's today-ish, and then we'll be ready to write some automation that reports changes in mathlib after those sha's.

Mario Carneiro (Nov 15 2022 at 23:58):

one thing that is a bit closer to a diff (in the "set subtraction" sense) of mathlib - mathlib4 is the post-port contents of the mathport file

Scott Morrison (Nov 15 2022 at 23:59):

So how about this as an imperfect-but-doable-today plan:

  • for each ported file in #port-status with a mathlib4 PR number, but no sha
  • look up the date that PR was opened
  • take the first mathlib sha from that date, and put it in #port-status
  • decide that any discrepancies that slipped in will come out in wash as we continue porting

Jireh Loreaux (Nov 16 2022 at 00:31):

Okay, I can do that tonight.

Kevin Buzzard (Nov 16 2022 at 01:37):

When I was finishing up porting a file, I went to mathlib3, looked at the history of the relevant file, checked that I'd ported the last one or two PRs to that file, and then just said that I'd ported the version of the file corresponding to the last commit which touched that file.

Jireh Loreaux (Nov 16 2022 at 04:24):

Okay, I've added all the SHAs. Unfortunately, now the port-status script says that all these files have been modified since the commit at which they were verified (presumably because of adding the mathlib4 synchronization label in almost all cases).

Scott Morrison (Nov 16 2022 at 04:30):

Okay, but now we just need to script that shows the diffs in mathlib3 between the recorded commit and master.

Scott Morrison (Nov 16 2022 at 04:30):

And that will let us quickly verify that we haven't really missed anything.

Jireh Loreaux (Nov 16 2022 at 04:32):

Sure, but presumably we don't want it continually bugging us and making it look like there's things that have changed when they haven't changed meaningfully.

Mario Carneiro (Nov 16 2022 at 04:35):

yeah, the bot needs to detect synchronized message changes (which is one of the reasons I don't like the message, you have to syncronize the synchronization)

Mario Carneiro (Nov 16 2022 at 04:36):

it could be organized by a tag, or by just ignoring lines with specific content

Jireh Loreaux (Nov 16 2022 at 04:37):

A real hack that could potentially cause false negatives is to just see if the diff is smaller than 5 lines.

Mario Carneiro (Nov 16 2022 at 04:38):

yeah no

Mario Carneiro (Nov 16 2022 at 04:38):

checking if the diff contains SYNCHRONIZED is probably less likely to have a false negative

Jireh Loreaux (Nov 16 2022 at 04:45):

The conjunction of those two conditions is probably sufficient to guarantee no false negatives.


Last updated: Dec 20 2023 at 11:08 UTC