Zulip Chat Archive

Stream: mathlib4

Topic: Bors failure investigation


Ruben Van de Velde (Feb 13 2024 at 10:00):

Seems like #9811 likely caused the failure in https://github.com/leanprover-community/mathlib4/actions/runs/7884007422/job/21512091078 ; can we move it off the queue?

Johan Commelin (Feb 13 2024 at 10:04):

Done

Yury G. Kudryashov (Feb 13 2024 at 20:45):

Seems like bors is down.

Antoine Chambert-Loir (Feb 13 2024 at 20:45):

I don't know what to do to help, but bors is failing repeatedly for reasons I can't understand, in several instances, each time in a file related to #9309.

See, for example, https://github.com/leanprover-community/mathlib4/actions/runs/7891418515/job/21535716237

Patrick Massot (Feb 13 2024 at 20:46):

There are clear errors in this run:

 Error: ./././Mathlib/RingTheory/PowerSeries/Basic.lean:854:68: error: unknown constant 'FunLike.congr_fun'
Error: ./././Mathlib/RingTheory/PowerSeries/Basic.lean:2072:42: error: unexpected token '|'; expected ')', ',' or ':'
Error: ./././Mathlib/RingTheory/PowerSeries/Basic.lean:2070:92: error: unsolved goals

Patrick Massot (Feb 13 2024 at 20:47):

However they come from interactions with other PRs in the same batch.

Yury G. Kudryashov (Feb 13 2024 at 20:47):

I wonder what will happen to the current batch

Ruben Van de Velde (Feb 13 2024 at 20:47):

It's #9309

Eric Wieser (Feb 13 2024 at 20:47):

With #bors down, presumably we can't tell what the current batch is!

Antoine Chambert-Loir (Feb 13 2024 at 20:47):

Yes, but this file (for which Maria Ines and I are responsible) worked well in our PR.

Ruben Van de Velde (Feb 13 2024 at 20:47):

Can someone take that off the queue?

Ruben Van de Velde (Feb 13 2024 at 20:48):

This is not your fault

Antoine Chambert-Loir (Feb 13 2024 at 20:48):

OK, thanks.

Ruben Van de Velde (Feb 13 2024 at 20:48):

Bors takes a bunch of PRs and tests them all together; one of the others is broken

Eric Wieser (Feb 13 2024 at 20:48):

Unfortunately, #bors is down so we can't see what the batches are!

Yury G. Kudryashov (Feb 13 2024 at 20:49):

I think that you didn't merge master for a while

Yury G. Kudryashov (Feb 13 2024 at 20:49):

Because FunLike.congr_fun is called DFunLike.congr_fun now.

Antoine Chambert-Loir (Feb 13 2024 at 20:49):

OK, so I'll merge, and we'll start again.

Yury G. Kudryashov (Feb 13 2024 at 20:49):

https://github.com/leanprover-community/mathlib4/compare/master...staging

Antoine Chambert-Loir (Feb 13 2024 at 20:50):

(Fortunately, our PR doesn't touch many files)

Ruben Van de Velde (Feb 13 2024 at 20:50):

The batch that caused this error is https://github.com/leanprover-community/mathlib4/compare/master...d3d63b1e42072cec7fc45588761af9991aaffcf3

Yury G. Kudryashov (Feb 13 2024 at 20:51):

Did we hit another limit?

Ruben Van de Velde (Feb 13 2024 at 20:51):

Oh, I feel like I've only been confusing things more in this thread. Yeah, Antoine, it is your PR and merging master will make the error show up

Antoine Chambert-Loir (Feb 13 2024 at 20:56):

Mergin' and fixin'…

Ruben Van de Velde (Feb 13 2024 at 21:08):

More annoyingly, the last batch seems to have succeeded 10 minutes ago but hasn't been merged: https://github.com/leanprover-community/mathlib4/actions/runs/7891867398/job/21539499674

Yury G. Kudryashov (Feb 13 2024 at 21:09):

bors is up but it lost connection to the batch. Can we make it rerun the same batch?

Eric Wieser (Feb 13 2024 at 21:10):

If we're lucky it will just happen

Yury G. Kudryashov (Feb 13 2024 at 21:14):

It looks like bors merged the batch.

Ruben Van de Velde (Feb 13 2024 at 21:15):

:+1:

Anne Baanen (Feb 13 2024 at 21:24):

Just kicked #10032 off the queue, that should resolve the failure in batch 1590.

Antoine Chambert-Loir (Feb 13 2024 at 21:25):

merged and fixed.

Yury G. Kudryashov (Feb 13 2024 at 21:26):

I applied your suggestion in that PR (@Xavier Roblot, I hope you don't mind).

Yury G. Kudryashov (Feb 13 2024 at 21:27):

Also, I merged master branch there.

Eric Wieser (Feb 13 2024 at 23:44):

Ruben Van de Velde said:

Seems like #9811 likely caused the failure in https://github.com/leanprover-community/mathlib4/actions/runs/7884007422/job/21512091078 ; can we move it off the queue?

Perhaps worth noting that lake exe shake now increases the chance of a batch failing due to compounding PRs

Eric Wieser (Feb 13 2024 at 23:44):

(though I suspect this was just caused by a stale master before shake was introduced)

Emilie (Shad Amethyst) (Feb 14 2024 at 01:21):

Yeah, idk how expensive that would be, but the CI could run on a rebased version before BORS tries to batch them together, so that it can alert that despite git not complaining about a merge conflict, there's still a virtual merge conflict because of some breaking change not being taken into account in the current PR state

Emilie (Shad Amethyst) (Feb 14 2024 at 01:23):

Or maybe treat breaking changes differently

Eric Wieser (Feb 14 2024 at 01:32):

That would be more expensive than the batching that already happens

Emilie (Shad Amethyst) (Feb 14 2024 at 01:35):

Could BORS otherwise bisect the batch, by only attempting to merge half of the PRs after a failure, and keeping track of the PRs that it put aside? I don't know how often the build fails as a fluke

Yury G. Kudryashov (Feb 14 2024 at 01:36):

bors does bisecting.

Xavier Roblot (Feb 14 2024 at 07:02):

Yury G. Kudryashov said:

I applied your suggestion in that PR (Xavier Roblot, I hope you don't mind).

Oh, I am sorry this PR caused so much problem. Thanks for the fix!

Ruben Van de Velde (Feb 14 2024 at 07:41):

No worries Xavier, we had a large number of PRs approved yesterday (> 50), of which several ended up failing, so bors had a lot of work to do

Yaël Dillies (Feb 14 2024 at 08:25):

Eric Wieser said:

Perhaps worth noting that lake exe shake now increases the chance of a batch failing due to compounding PRs

Yeah, I was thinking about this. Since shaking is not vital to the build, would it be worth making it a daily cron job rather than running it on every PR?

Yury G. Kudryashov (Feb 14 2024 at 08:30):

Or even weekly...

Yury G. Kudryashov (Feb 14 2024 at 08:30):

It can create a PR with the output of lake exe shake --fix

Mario Carneiro (Feb 14 2024 at 08:33):

I'd rather not apply fixes automatically, it's not that trustworthy

Mario Carneiro (Feb 14 2024 at 08:35):

I also disagree with the idea of delaying shake applications, from my experience with the first two shake PRs. Mathlib moves way too fast for this if PRs aren't applying shake fixes as they go, and preparing the PR is very computationally intensive (because making the PR actually compile involves tweaks to files and then receiving errors 1000 files later)

Mario Carneiro (Feb 14 2024 at 08:39):

This was the reason I pushed for it to be a CI step in the first place, if you let it slide then it becomes almost impossible to keep up

Yury G. Kudryashov (Feb 14 2024 at 08:43):

It's hard to tell without making an experiment. I expect that shaking the tree after a day or a week of merged PRs is much easier than the initial run, and most of the time lake exe shake --fix will create a PR that will compile.

Yury G. Kudryashov (Feb 14 2024 at 08:43):

OTOH, I'm not ready to be the person responsible for fixing it if it fails.

Yury G. Kudryashov (Feb 14 2024 at 08:44):

And it will slide unless we have a designated person or we force it on every PR.

Mario Carneiro (Feb 14 2024 at 08:45):

The problem is that a lot of the time the shake PR will have conflicts, and by the time you've fixed the conflicts there are more conflicts

Mario Carneiro (Feb 14 2024 at 08:45):

Yury G. Kudryashov said:

And it will slide unless we have a designated person or we force it on every PR.

which is why we force it on every PR

Mario Carneiro (Feb 14 2024 at 08:46):

I don't see a reason it is any worse than any other kind of lint here, it's not even a particularly global analysis and AFAIK it's rare for two PRs to cause a shake error where neither one individually has one

Antoine Chambert-Loir (Feb 14 2024 at 08:46):

I wake up and see that it sufficed! Thank y'all and sorry for the mess.

Mario Carneiro (Feb 14 2024 at 08:48):

Yury G. Kudryashov said:

I expect that shaking the tree after a day or a week of merged PRs is much easier than the initial run,

It's a smaller diff, but just as much work to prepare

Mario Carneiro (Feb 14 2024 at 08:50):

shake results on individual PRs are generally just one file or a few, this is where shake has the highest success rate. When there are a few dozen PRs mashed together the results contain many files and the suggestions overlap, and these cause shake results to be lower quality and/or cause errors

Anne Baanen (Feb 14 2024 at 13:46):

Should we rename and/or merge the two active bors threads by the way? Maybe this can become "Bors failure investigation" and the other can become "Optimizing bors batch size".

Notification Bot (Feb 14 2024 at 13:51):

19 messages were moved here from #mathlib4 > bors failing by Anne Baanen.


Last updated: May 02 2025 at 03:31 UTC