Zulip Chat Archive

Stream: general

Topic: bors build failure


Kyle Miller (Oct 06 2020 at 07:13):

Bors has failed building #4444 twice now, and it seems to be a deterministic timeout in mv_polynomial. The PR touches control/traversable, but only by adding documentation and reordering a definition.

/home/runner/work/mathlib/mathlib/src/data/mv_polynomial/basic.lean:414:0: error: (deterministic) timeout

Bryan Gin-ge Chen (Oct 06 2020 at 07:16):

I guess the first thing to do is to try merging master on that branch and letting the CI run

Bryan Gin-ge Chen (Oct 06 2020 at 07:20):

Here’s the line referenced in the error: https://github.com/leanprover-community/mathlib/blob/523bddb4f0cbfdc401b3b2f943ca6eba92ad90cc/src/data/mv_polynomial/basic.lean#L414

Kyle Miller (Oct 06 2020 at 07:25):

Ok, I've merged master and pushed it. Is the CI just to see if it can still build?

Johan Commelin (Oct 06 2020 at 07:26):

I'm having the same issue with #4461

Johan Commelin (Oct 06 2020 at 07:28):

The problem is coeff_mul

Johan Commelin (Oct 06 2020 at 07:29):

It's known to be slow... I guess we need to work on it

Kyle Miller (Oct 06 2020 at 07:32):

So it's not @Aaron Anderson's docstrings that broke the build? :upside_down:

Kyle Miller (Oct 06 2020 at 07:47):

Is the deterministic timeout actually nondeterministic? (at least when it comes to building all of mathlib?) It seems sort of improbable that a commit with only documentation comments would change much...

Bryan Gin-ge Chen (Oct 06 2020 at 07:59):

Yeah, I'm not sure what deterministic really means here. For what it's worth, I ran lean --make src/data/mv_polynomial/basic.lean on the #4444 branch of mathlib and didn't get a timeout when opening the file in VS Code, even though it should be enforcing the same -T100000 limit.

According to the profiler, the tactic execution in mv_polynomial.coeff_mul is taking 19.7 seconds, so it's definitely ripe for reworking. I don't know how to interpret the more detailed output though:

elaboration: tactic execution took 19.7s
num. allocated objects:  7835
num. allocated closures: 10779
19748ms   100.0%   tactic.istep
19748ms   100.0%   _interaction
19748ms   100.0%   tactic.istep._lambda_1
19748ms   100.0%   scope_trace
19748ms   100.0%   _interaction._lambda_2
19748ms   100.0%   tactic.step
19667ms    99.6%   tactic.solve1
19525ms    98.9%   _interaction._lambda_19
19140ms    96.9%   tactic.interactive.propagate_tags
18716ms    94.8%   interaction_monad_orelse
18665ms    94.5%   _interaction._lambda_17
18615ms    94.3%   tactic.interactive.simp_core
18554ms    94.0%   tactic.interactive.simp_core_aux
18550ms    93.9%   tactic.simplify
18550ms    93.9%   tactic.try
18502ms    93.7%   tactic.interactive.simpa._lambda_6
18501ms    93.7%   tactic.interactive.simpa
17747ms    89.9%   tactic.simp_target
17747ms    89.9%   tactic.interactive.simp_core_aux._lambda_5
13118ms    66.4%   _interaction._lambda_16
13061ms    66.1%   _interaction._lambda_14
 5162ms    26.1%   _interaction._lambda_10
 1697ms     8.6%   interaction_monad.monad._lambda_9
  807ms     4.1%   tactic.interactive.simp_core_aux._lambda_4
  807ms     4.1%   tactic.interactive.simp_core_aux._lambda_2
  525ms     2.7%   rw_core
  524ms     2.7%   interactive.loc.apply
  521ms     2.6%   interaction_monad.orelse'
  492ms     2.5%   _private.3291397495.rw_goal._lambda_4
  481ms     2.4%   _private.3291397495.rw_goal._lambda_2
  415ms     2.1%   tactic.rewrite_target
  415ms     2.1%   tactic.rewrite
  414ms     2.1%   tactic.rewrite_core
  368ms     1.9%   tactic.mk_instance
  287ms     1.5%   _interaction._lambda_12
  286ms     1.4%   tactic.apply_instance
  268ms     1.4%   tactic.all_goals'
  268ms     1.4%   all_goals'_core
  268ms     1.4%   _private.3677394557.all_goals'_core._main._lambda_2
  268ms     1.4%   tactic.to_expr
  235ms     1.2%   _interaction._lambda_18
  147ms     0.7%   _interaction._lambda_7
  142ms     0.7%   _interaction._lambda_4
  139ms     0.7%   tactic.interactive.have._lambda_1
   99ms     0.5%   tactic.interactive.concat_tags._lambda_5
   98ms     0.5%   repeat_aux
   98ms     0.5%   tactic.try_core
   98ms     0.5%   tactic.repeat
   98ms     0.5%   _interaction._lambda_3
   98ms     0.5%   _private.1616170719.repeat_aux._main._lambda_1
   94ms     0.5%   _interaction._lambda_11
   93ms     0.5%   tactic.interactive.by_cases._lambda_2
   93ms     0.5%   tactic.interactive.by_cases
   84ms     0.4%   tactic.interactive.to_expr'
   83ms     0.4%   tactic.by_cases
   59ms     0.3%   tactic.mk_simp_set_core
   59ms     0.3%   tactic.mk_simp_set
   57ms     0.3%   tactic.interactive.convert
   57ms     0.3%   tactic.focus1
   54ms     0.3%   tactic.congr'
   52ms     0.3%   _interaction._lambda_13
   39ms     0.2%   _interaction._lambda_8
   39ms     0.2%   simp_lemmas.add_pexpr
   39ms     0.2%   simp_lemmas.append_pexprs
   39ms     0.2%   tactic.mk_simp_set_core._lambda_5
   36ms     0.2%   tactic.apply_eq_congr_core
   36ms     0.2%   tactic.congr_core'
   35ms     0.2%   tactic.congr'._main._lambda_3
   32ms     0.2%   tactic.mk_specialized_congr_lemma
   31ms     0.2%   rw_hyp
   30ms     0.2%   _private.859230899.rw_hyp._main._lambda_3
   23ms     0.1%   _private.859230899.rw_hyp._main._lambda_2
   23ms     0.1%   tactic.apply_core
   21ms     0.1%   tactic.rewrite_hyp
   21ms     0.1%   _interaction._lambda_5
   20ms     0.1%   tactic.seq'
   18ms     0.1%   tactic.join_user_simp_lemmas
   18ms     0.1%   simp_lemmas.mk_default
   17ms     0.1%   relation_tactic
   17ms     0.1%   _interaction._lambda_6
   16ms     0.1%   _private.1624076617.relation_tactic._lambda_1
   16ms     0.1%   tactic.exact
   15ms     0.1%   tactic.congr'._main._lambda_4
   14ms     0.1%   tactic.replace_target
   10ms     0.1%   tactic.interactive.simpa._lambda_1
   10ms     0.1%   tactic.replace_hyp
   10ms     0.1%   _private.3291397495.rw_goal._lambda_3
    9ms     0.0%   tactic.save_info_with_widgets
    8ms     0.0%   tactic.apply
    8ms     0.0%   tactic.mk_eq_mpr
    8ms     0.0%   tactic.save_widget
    7ms     0.0%   tactic.save_type_info
    6ms     0.0%   tactic.assert
    6ms     0.0%   tactic.interactive.apply._lambda_1
    5ms     0.0%   tactic.assumption._lambda_1
    5ms     0.0%   tactic.find_same_type
    5ms     0.0%   tactic.refine
    4ms     0.0%   tactic.apply_congr_core
    4ms     0.0%   tactic.mk_eq_mp
    3ms     0.0%   tactic.rintro
    3ms     0.0%   tactic.find_same_type._main._lambda_1
    3ms     0.0%   simp_lemmas.resolve_and_add
    3ms     0.0%   tactic.unify
    3ms     0.0%   tactic.assert_core
    3ms     0.0%   _private.3676539765.simp_lemmas.resolve_and_add._lambda_2
    3ms     0.0%   tactic.interactive.rintro
    3ms     0.0%   tactic.infer_type
    3ms     0.0%   tactic.rintro._lambda_3
    3ms     0.0%   simp_lemmas.add_simp
    2ms     0.0%   tactic.mk_app
    2ms     0.0%   simp_lemmas.erase
    2ms     0.0%   tactic.rcases_core._main._lambda_5
    2ms     0.0%   tactic.interactive.exact
    2ms     0.0%   tactic.rcases_core._main._lambda_9
    2ms     0.0%   tactic.rcases_core
    2ms     0.0%   tactic.mk_id_eq
    2ms     0.0%   tactic.rcases.continue
    2ms     0.0%   tactic.cases_core
    2ms     0.0%   _interaction._lambda_9
    1ms     0.0%   tactic.has_opt_auto_param._lambda_2
    1ms     0.0%   tactic.clear_goals._lambda_5
    1ms     0.0%   tactic.apply_congr_core._lambda_1
    1ms     0.0%   tactic.applyc
    1ms     0.0%   tactic.mk_const
    1ms     0.0%   tactic.has_opt_auto_param_for_apply._lambda_2
    1ms     0.0%   tactic.intro_core
    1ms     0.0%   tactic.get_eqn_lemmas_for
    1ms     0.0%   tactic.i_to_expr_for_apply
    1ms     0.0%   tactic.interactive.get_rule_eqn_lemmas
    1ms     0.0%   tactic.clear
    1ms     0.0%   format.join
    1ms     0.0%   environment.get_ext_eqn_lemmas_for
    1ms     0.0%   tactic.interactive.dsimp
    1ms     0.0%   tactic.save_info_thunk
    1ms     0.0%   tactic.dsimp_target
    1ms     0.0%   tactic.interactive.get_rule_eqn_lemmas._lambda_3
    1ms     0.0%   tactic.clear_goals
    1ms     0.0%   tactic.assertv
    1ms     0.0%   expr.get_app_fn
    1ms     0.0%   resolve_name'
    1ms     0.0%   tactic.clear'
    1ms     0.0%   expr.is_napp_of
    1ms     0.0%   list.foldl
    1ms     0.0%   tactic.mk_const._lambda_1
    1ms     0.0%   tactic.intro
    1ms     0.0%   expr.const
    1ms     0.0%   simp_lemmas.dsimplify
    1ms     0.0%   format.compose
    1ms     0.0%   expr.is_app_of
    1ms     0.0%   tactic.swap
    1ms     0.0%   tactic.instantiate_mvars
    1ms     0.0%   _interaction._lambda_15
    1ms     0.0%   tactic.assertv_core
    1ms     0.0%   tactic.congr'._main._lambda_2

Scott Morrison (Oct 06 2020 at 08:03):

I think that is saying simpa is the culprit.

Bryan Gin-ge Chen (Oct 06 2020 at 08:05):

Indeed, changing the two simpas to simpa only speeds it up to 2.64 seconds!

Johan Commelin (Oct 06 2020 at 08:07):

I almost have a PR ready

Johan Commelin (Oct 06 2020 at 08:07):

I'm restructuring it a bit more

Bryan Gin-ge Chen (Oct 06 2020 at 08:08):

Ah, OK. I was just about to submit one myself (with just the simpa only change).

Johan Commelin (Oct 06 2020 at 08:10):

#4469

Johan Commelin (Oct 06 2020 at 08:10):

It's below 2s on my box now

Scott Morrison (Oct 06 2020 at 08:21):

oops, I did it as well...

Scott Morrison (Oct 06 2020 at 08:22):

I'll close.

Bryan Gin-ge Chen (Oct 06 2020 at 08:34):

Hmm, coeff_mul just caused the most recent bors batch to fail again: https://github.com/leanprover-community/mathlib/runs/1213623252

So maybe we should see if we can try and merge that first.

Johan Commelin (Oct 06 2020 at 08:36):

Yup, we might want to cancel everything, and squeeze this in first.

Bryan Gin-ge Chen (Oct 06 2020 at 08:43):

I just added #4469 to the queue. Even if the currently running batch fails, #4469 should be in the next batch and then everything should be good, I think.

Eric Wieser (Oct 06 2020 at 09:14):

bors r+ p=10 will give it higher priority

Bryan Gin-ge Chen (Oct 06 2020 at 09:16):

I considered doing that but since #4469 won't conflict with any of the other PRs, it should be more efficient just to have it run in a batch.

Bryan Gin-ge Chen (Oct 06 2020 at 09:18):

Also, the batch including #4469 is now running, since the batch that was running failed again. I went ahead and reconsolidated those PRs along with #4444 into a new batch.

Johan Commelin (Oct 06 2020 at 09:36):

#4472 is not critical for bors, but it brings coeff_mul down into the 1.0s - 1.2s range, and shortens the proof.

Moritz Doll (Aug 09 2022 at 12:56):

@maintainers #15837 and #15836 should not be on the same batch, can someone jank one of them from the queue?

Johan Commelin (Aug 09 2022 at 12:57):

Done

Johan Commelin (Aug 09 2022 at 12:57):

I pulled #15837 off the queue

Moritz Doll (Aug 09 2022 at 12:58):

thanks

Oliver Nash (Aug 09 2022 at 12:58):

Bors handles this sort of this automatically though. It just tries again, dividing and conquering as it goes, right?

Moritz Doll (Aug 09 2022 at 13:05):

but since we know what the issue is (#15836 proves stuff with positive_cone and #15837 renames positive_cone to positive), it is easier to cancel on of them and not waste time for builds that will fail anyways.

Eric Wieser (Aug 09 2022 at 13:06):

It's a shame there's no way to tell bors to undo a divide and conquer when we know better

Johan Commelin (Aug 09 2022 at 13:22):

I wish bors priorities worked properly

Andrew Yang (Aug 09 2022 at 14:26):

By the way, I think anyone can bors cancel?
At least it work for me in #15436 where I was not delegated.

María Inés de Frutos Fernández (Aug 15 2022 at 11:02):

Could someone take a look at why #14115 failed? The check that failed was bors — Merge conflict, but I hadn't seen any merge conflicts before approving the PR.

Yaël Dillies (Aug 15 2022 at 11:04):

It's about to have a conflict with #15594.

Kevin Buzzard (Aug 15 2022 at 11:05):

That's not the answer to the question though.

Yaël Dillies (Aug 15 2022 at 11:06):

The answer is that bors detected the conflict.

Kevin Buzzard (Aug 15 2022 at 11:06):

With an open PR??

Kevin Buzzard (Aug 15 2022 at 11:07):

Oh I see! My mistake. It's also on the queue.

Eric Wieser (Aug 15 2022 at 11:16):

This question comes up over and over again; is this something we need to document, or that we need to ask bors to document better?

Kevin Buzzard (Aug 15 2022 at 11:20):

How does one fix this?

Kevin Buzzard (Aug 15 2022 at 11:20):

I mean the conflict, not the fact that it confused some people

Yaël Dillies (Aug 15 2022 at 11:21):

Wait for bors to be done with either PR, then you can merge the one that went through in the one that didn't.

Mario Carneiro (Aug 15 2022 at 11:22):

you can also head the process off by merging onto bors' staging commit

Yaël Dillies (Aug 15 2022 at 11:22):

Given that this batch already failed once, I'm not sure that's wise.

Yaël Dillies (Aug 15 2022 at 11:23):

A maintainer could check in the build logs whether #15594 was the faulty PR in that batch.

Arthur Paulino (Aug 15 2022 at 11:23):

I think this question comes up often because the way that bors operates is not widely known, especially if you're not a programmer

María Inés de Frutos Fernández (Aug 15 2022 at 11:24):

Eric Wieser said:

This question comes up over and over again; is this something we need to document, or that we need to ask bors to document better?

I think some documentation would be helpful.

Mario Carneiro (Aug 15 2022 at 11:24):

Once you have done the merge work on that staging commit, it shouldn't be too bad to merge on whatever ends up on master since that will probably also have the conflicting PR

Mario Carneiro (Aug 15 2022 at 11:24):

(when I say "merge" I mean "rebase")

Mario Carneiro (Aug 15 2022 at 11:25):

or you can just wait the 4 hours or so

Eric Wieser (Aug 15 2022 at 11:27):

Yaël Dillies said:

A maintainer could check in the build logs whether #15994 was the faulty PR in that batch.

So can a non-maintainer

Yaël Dillies (Aug 15 2022 at 11:29):

No

Yaël Dillies (Aug 15 2022 at 11:30):

Sometimes I can see the logs on the commit, but here https://github.com/leanprover-community/mathlib/commit/6eb146e873f32077111041472656aa3153f88b6f doesn't have them.

Eric Wieser (Aug 15 2022 at 11:34):

The logs are here: https://github.com/leanprover-community/mathlib/actions/workflows/bors.yml

Eric Wieser (Aug 15 2022 at 11:35):

Your page just links to the appropiate entry on that one

Yaël Dillies (Aug 15 2022 at 13:09):

Okay, #15594 was indeed the one failing. I just fixed the error (to_additive was now guessing the wrong name). Can someone put it back on the queue?

Anatole Dedecker (Aug 15 2022 at 16:26):

Now it's my turn to cause build failures with #14534

Anatole Dedecker (Aug 15 2022 at 16:27):

It is a quite old PRs so there are "implicit conflicts" e.g some rewrites no longer work

Anatole Dedecker (Aug 15 2022 at 16:27):

I'm fixing it, sorry for the inconvenience

Oliver Nash (Aug 15 2022 at 16:43):

No need at all to apologise; this sort of thing is a normal part of collaborative development.

Junyan Xu (Aug 15 2022 at 16:57):

strangely #bors is currently showing no jobs running

Junyan Xu (Aug 15 2022 at 16:58):

but at the bottom of my PR #16046 it shows bors is running

Johan Commelin (Aug 15 2022 at 18:40):

It looks like bors is still feeling ill :sad:

Joseph Myers (Aug 15 2022 at 18:43):

I don't know why it hasn't restarted automatically with a smaller batch, but the failure of the last batch shows a genuine conflict between #16011 (uses topological_group_is_uniform) and #16027 (renames topological_group_is_uniform), so one of those needs to be removed from the batch before it's restarted.

Johan Commelin (Aug 15 2022 at 18:55):

Thanks. I asked it to take #16011 of the queue

Johan Commelin (Aug 15 2022 at 18:57):

Ha!

Johan Commelin (Aug 15 2022 at 18:57):

I just kicked a PR on the queue, and bors started working again.

Johan Commelin (Aug 15 2022 at 18:58):

So it seems like it just completely lost track of all its queues :oops:

Junyan Xu (Aug 15 2022 at 19:00):

Does bors cancel vs. bors r- matter?
(Edit: apparently they're equivalent: https://bors.tech/documentation/ )

Thomas Browning (Aug 15 2022 at 19:01):

Johan Commelin said:

I just kicked a PR on the queue, and bors started working again.

But does #bors still remember the other PRs?

Johan Commelin (Aug 15 2022 at 19:08):

Nope. All the queues were gone.

Johan Commelin (Aug 15 2022 at 19:08):

I'm adding everything back now.

Johan Commelin (Aug 15 2022 at 19:09):

Doing an experiment with manually putting 10 PRs in a batch using priorities.

Johan Commelin (Aug 15 2022 at 19:15):

I think everything that had ready-to-merge should now be back on a #bors queue. Let's hope it doesn't forget the queues again.

Anatole Dedecker (Aug 15 2022 at 19:50):

#14534 is ready again, should I just bors r+ it again?

Johan Commelin (Aug 15 2022 at 19:52):

Feel free to try

Bryan Gin-ge Chen (Aug 15 2022 at 20:08):

It looks like all the queues were dropped because bors crashed (only maintainers can view this link). If it happens again it could be worth reporting.

Ruben Van de Velde (Aug 15 2022 at 21:34):

Hmm, I didn't know bors would combine PRs with different priorities
Screenshot-from-2022-08-15-23-33-33.png

Thomas Browning (Aug 15 2022 at 21:44):

If you go to the PR, I think it's because the priority was set twice

Joseph Myers (Aug 16 2022 at 00:05):

The next build failure says that #15065 should be removed from the queue until it's been updated for the has_scalar rename. (But bors seems to have carried on automatically with a different subset of the queue after that build failure.)

Bryan Gin-ge Chen (Aug 16 2022 at 00:26):

Thanks! It looks like #15065 has since been removed from the queue.

Johan Commelin (Aug 16 2022 at 05:27):

Crazy. Bors is now building a queue with prio 0, even though there are two queues with prio 4.

Johan Commelin (Aug 16 2022 at 05:28):

Thomas Browning said:

If you go to the PR, I think it's because the priority was set twice

But all queues had a unique prio ~ 8hrs ago. So there is no good reason for mixing them.

Johan Commelin (Aug 16 2022 at 05:28):

I completely don't understand how bors priorities work.

Junyan Xu (Aug 16 2022 at 05:36):

It seems that when bors bisects (due to failed build), the half that's not chosen forms a batch that goes to the end of the queue.


Last updated: Dec 20 2023 at 11:08 UTC