Zulip Chat Archive

Stream: mathlib4

Topic: submodules


Floris van Doorn (Nov 28 2022 at 10:17):

I believe that in mathlib4#700 @Scott Morrison accidentally added the dependencies of mathlib as submodules. As a result many bors batches are now failing. I tried fixing this in mathlib4#766, but maybe we also need to reset the runner? (Previous discussion: https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Weird.20CI.20failure)

Scott Morrison (Nov 28 2022 at 13:47):

Gah, sorry. When lake moved its files around I must have added them. :-(

Scott Morrison (Nov 28 2022 at 13:48):

Changing branches too often.

Floris van Doorn (Nov 28 2022 at 14:18):

A tip for next time: never use git add .. Use git add -u instead. This will only ever add files that are already tracked (or use git commit -am <commit message> directly).

Jireh Loreaux (Nov 28 2022 at 14:38):

Can someone merge mathlib4#766 so that we can all merge master and stop getting these lint style errors?

Floris van Doorn (Nov 28 2022 at 14:50):

The build failed. I don't know if the error is even related to submodules: https://github.com/leanprover-community/mathlib4/actions/runs/3565817172/jobs/5991456034

Jireh Loreaux (Nov 28 2022 at 14:52):

Not that I really know anything about it, but it seems to me like it just lost a connection causing the download to fail. So maybe retry?

Ruben Van de Velde (Nov 28 2022 at 14:52):

Scott retried already

Heather Macbeth (Nov 28 2022 at 14:53):

I was so excited to see the title of this thread and learn that the port has reached submodules already ...

Floris van Doorn (Nov 28 2022 at 14:53):

sorry to disappoint :-)

Ruben Van de Velde (Nov 28 2022 at 14:59):

How far could they be...
/user_uploads/3121/TU_rIDpuL0uke4iJ-b9V8Xp0/Screenshot-from-2022-11-28-15-59-25.png

... oh

Scott Morrison (Nov 28 2022 at 15:01):

Even at that blurry resolution, I can recognise that big chunk on the left as "finiteness", which presumably isn't really needed to say what a submodule is...

Ruben Van de Velde (Nov 28 2022 at 15:04):

You've been looking at too many of those graphs, I think :)

At least one of the ways it depends on finiteness is through algebra.group.conj, which might be a candidate for splitting

Ruben Van de Velde (Nov 28 2022 at 15:05):

And order.atoms

Jireh Loreaux (Nov 28 2022 at 15:10):

Seems like the CI runner may need to be reset. Seems like it's still looking for those submodules: see mathlib4#612 and mathlib4#762

Scott Morrison (Nov 28 2022 at 15:23):

Do you know what needs to be done? Happy to have instructions here if there's something I can do.

Jireh Loreaux (Nov 28 2022 at 15:30):

No, note really. My guess: there are (now untracked) folders or files that are sitting around on the CI runner relating to these submodules that need to be deleted.

Jireh Loreaux (Nov 28 2022 at 15:31):

In case anyone wants a full resolution version of the submodule graph: algebra.module.submodule.basic.pdf

Jeremy Avigad (Nov 28 2022 at 17:57):

Does this involve the hoskinson runners? Sorry to be clueless -- if someone can explain to me what needs to be done, I'll try to do it (or ask @Wojciech Nawrocki for help).

Wojciech Nawrocki (Nov 28 2022 at 18:07):

It looks like the linked actions did finish, are any runners still stuck? Happy to reset them.

Jireh Loreaux (Nov 28 2022 at 19:20):

It took a few tries, but they eventually got there. Not sure exactly what magic made them succeed. I'll ping you if I encounter it again.

Scott Morrison (Nov 28 2022 at 19:45):

I wonder if we have a poisoned runner now, and they succeeded because on later attempts they were given to non-poisoned runners...

Wojciech Nawrocki (Nov 28 2022 at 20:19):

Hm, for example this build failed with an error as opposed to getting stuck in a loop, and runners should start every new build with a clean state. So as long as one is not stuck looping, it should be fine?

Gabriel Ebner (Nov 28 2022 at 23:42):

runners should start every new build with a clean state

This is not the case for self-hosted runners. We need to run rm -rf * manually.

Wojciech Nawrocki (Nov 28 2022 at 23:54):

Ok, this is useful info.

Scott Morrison (Nov 29 2022 at 19:04):

We're still getting transient errors https://github.com/leanprover-community/mathlib4/actions/runs/3577158206/jobs/6015832572.

Scott Morrison (Nov 29 2022 at 19:05):

So I suspect at least one of the runners is poisoned.

Scott Morrison (Nov 29 2022 at 19:05):

I'm not entirely sure at what point, or where, we need to run run -rf *, and I'm a little hesitant to experiment. :-)

Scott Morrison (Nov 29 2022 at 19:12):

@Wojciech Nawrocki, do you know where we should do this? Or could you just rebuild the hoskinson runners? We're currently unable to reliably merge mathlib4 PRs.

Jireh Loreaux (Nov 29 2022 at 19:14):

My guess is that the .git/config lists a submodule but there is no .gitmodules file and so it complains.

Jireh Loreaux (Nov 29 2022 at 19:15):

removing it there would probably be sufficient.

Scott Morrison (Nov 29 2022 at 19:16):

Okay -- but that is a file that only exists on the runner, right?

Jireh Loreaux (Nov 29 2022 at 19:16):

This would persist even between branches and updates to master because .git/config isn't tracked, so it fits the symptoms.

Jireh Loreaux (Nov 29 2022 at 19:16):

Yes, that's correct.

Scott Morrison (Nov 29 2022 at 19:16):

https://github.com/leanprover-community/mathlib4/pull/787 attempts to cleanup before starting.

Jireh Loreaux (Nov 29 2022 at 19:17):

Actually, everyone has a copy of that file, but they will have different things listed there.

Scott Morrison (Nov 29 2022 at 19:17):

It seems to do okay in CI, so I guess we can just merge it and see if it resolves the problem?

Scott Morrison (Nov 29 2022 at 19:19):

Looks like bors is accepting the rm -rf * PR. It passed lint style.

Jireh Loreaux (Nov 29 2022 at 19:20):

Are we sure rm -rf * is running the right directory? If so, that seems fine. Otherwise, it seems dangerous.

Scott Morrison (Nov 29 2022 at 19:21):

Nope, not sure at all. :-)

Jireh Loreaux (Nov 29 2022 at 19:22):

(Note: I know virtually nothing about the CI workflows)

Scott Morrison (Nov 29 2022 at 19:22):

In the worst case, these CI machines are docker images generated from a script, so we can always rebuild them from scratch.

Wojciech Nawrocki (Nov 29 2022 at 19:43):

Scott Morrison said:

Wojciech Nawrocki, do you know where we should do this? Or could you just rebuild the hoskinson runners? We're currently unable to reliably merge mathlib4 PRs.

I will rebuild all runners with the mathlib4 tag now.

Wojciech Nawrocki (Nov 29 2022 at 19:44):

Oh, there is no such tag. I guess mathlib4 tasks are put on the general bors queue?

Scott Morrison (Nov 29 2022 at 19:49):

It seems my rm -rf * step in mathlib4 CI has resolved the issue for now anyway.

Wojciech Nawrocki (Nov 29 2022 at 19:55):

Okay, I will not restart everything with bors for now because it would kill the currently running jobs. Please ping me if you see any more transient failures.

Wojciech Nawrocki (Nov 29 2022 at 19:59):

Gabriel Ebner said:

runners should start every new build with a clean state

This is not the case for self-hosted runners. We need to run rm -rf * manually.

did you mean rm -rf actions-runner/_work/mathlib4/*?

Gabriel Ebner (Nov 29 2022 at 20:06):

I mean a rm -rf * build step in the work file.

Gabriel Ebner (Nov 29 2022 at 20:06):

If you want to run it manually that's rm -rf actions-runner/_work/mathlib4, yes.

Scott Morrison (Nov 29 2022 at 20:17):

Yes, I added rm -rf * as a build step, and that seems to have worked.

Jireh Loreaux (Nov 29 2022 at 20:39):

uh... https://github.com/leanprover-community/mathlib4/actions/runs/3577730611/jobs/6017062376

Scott Morrison (Nov 29 2022 at 20:40):

:-(

Scott Morrison (Nov 29 2022 at 20:40):

@Wojciech Nawrocki, could you rebuild all the runners?

Scott Morrison (Nov 29 2022 at 20:42):

I guess rm -rf * is not actually deleting .git or .gitmodules or whatever the infected file is.

Scott Morrison (Nov 29 2022 at 20:45):

Okay, I've tried again, with find . -name . -o -prune -exec rm -rf -- {} + instead.

Scott Morrison (Nov 29 2022 at 20:46):

(Following https://unix.stackexchange.com/questions/77127/rm-rf-all-files-and-all-hidden-files-without-error)

Jireh Loreaux (Nov 29 2022 at 20:47):

IIRC the * glob doesn't pick up on things with a . prefix, like .git/config, which is probably why it failed before.

Jireh Loreaux (Nov 29 2022 at 20:47):

ah, you just linked that.

Scott Morrison (Nov 29 2022 at 20:55):

That seems to have worked. I'll tell bors to try again on everything.


Last updated: Dec 20 2023 at 11:08 UTC