Zulip Chat Archive

Stream: general

Topic: slow proof

Sebastien Gouezel (Apr 17 2021 at 10:47):

I am in the process of speeding up proofs that fail when gsmul is made into data, on my branch gsmul_data. First incredible example:

def forget₂_AddCommGroup_preserves_limits_aux (F : J ⥤ Ring) :
  is_limit ((forget₂ Ring AddCommGroup).map_cone (limit_cone F)) :=
AddCommGroup.limit_cone_is_limit (F ⋙ forget₂ Ring AddCommGroup)

Takes 35s on my branch. Changing it to

def forget₂_AddCommGroup_preserves_limits_aux (F : J ⥤ Ring) :
  is_limit ((forget₂ Ring AddCommGroup).map_cone (limit_cone F)) :=
begin
  have := AddCommGroup.limit_cone_is_limit (F ⋙ forget₂ Ring AddCommGroup),
  exact this
end

goes down to 185ms. I haven't found a term-mode way to achieve this (even with (... : _) or id insertion or whatever).

Jason Rute (Apr 17 2021 at 11:11):

I have a silly question: What is the term proof you get by #print forget₂_AddCommGroup_preserves_limits_aux (tactic version), and how long does it take to run that term proof?

Scott Morrison (Apr 17 2021 at 11:22):

Wow. That's really terrible, and largely my fault. (I don't know a fix, I guess I just have too high tolerance for slow-to-compile proofs.)

Sebastien Gouezel (Apr 17 2021 at 11:22):

That's not your fault, that's Lean's fault, obviously.

Sebastien Gouezel (Apr 17 2021 at 11:23):

The question is whether there is some way to let it switch automatically to the other elaboration procedure that makes the proof essentially instant.

Sebastien Gouezel (Apr 17 2021 at 11:34):

Jason Rute said:

I have a silly question: What is the term proof you get by #print forget₂_AddCommGroup_preserves_limits_aux (tactic version), and how long does it take to run that term proof?

The proof term is the same for the tactic mode and the term mode version. Even using the pp.all output, like this:

noncomputable def yet_another_try {J : Type u} [_inst_1 : small_category J] (F : J ⥤ Ring) :
  is_limit ((forget₂ Ring AddCommGroup).map_cone (limit_cone F)) :=
@AddCommGroup.limit_cone_is_limit.{u} J _inst_1
    (@category_theory.functor.comp.{u u u u u+1 u+1} J _inst_1 Ring.{u} Ring.large_category.{u} AddCommGroup.{u}
       AddCommGroup.large_category.{u}
       F
       (@category_theory.forget₂.{u+1 u+1 u u u} Ring.{u} AddCommGroup.{u} Ring.large_category.{u}
          Ring.concrete_category.{u}
          AddCommGroup.large_category.{u}
          AddCommGroup.concrete_category.{u}
          Ring.has_forget_to_AddCommGroup.{u}))

This takes 21s to compile (so, a little bit faster than the original term mode, but still 150 times slower than the tactic mode version, that gives exactly the same proof). I don't understand how this is possible.

Kevin Buzzard (Apr 17 2021 at 11:42):

Does the @ trick work? (:= @AddCommGroup.limit_cone_is_limit _ _ ...)This changes default elaboration procedure

Sebastien Gouezel (Apr 17 2021 at 11:43):

The pp.all version I gave uses @ versions for everything, and it is super-slow.

Kevin Buzzard (Apr 17 2021 at 11:48):

This is very strange. Can I see it in action on some branch?

Sebastien Gouezel (Apr 17 2021 at 11:50):

Yes, on branch#gsmul_data, in the file algebra/category/CommRing/limits.lean, line 282. Should be the same in master, by the way (although a little bit faster, so a little bit easier to investigate).

Sebastien Gouezel (Apr 17 2021 at 12:10):

There is a motive here. Next super-slow proof in the file is

/-- Auxiliary construction for the `creates_limit` instance below. -/
def is_limit_lifted_cone (F : J ⥤ CommRing) : is_limit (lifted_cone F) :=
is_limit.of_faithful (forget₂ _ Ring.{u}) (Ring.limit_cone_is_limit _)
  (λ s, (Ring.limit_cone_is_limit _).lift ((forget₂ _ Ring.{u}).map_cone s))
  (λ s, rfl)

taking 38s seconds. Changing it to

/-- Auxiliary construction for the `creates_limit` instance below. -/
def is_limit_lifted_cone (F : J ⥤ CommRing) : is_limit (lifted_cone F) :=
begin
  let := Ring.limit_cone_is_limit (F ⋙ forget₂ CommRing Ring),
  exact is_limit.of_faithful (forget₂ _ Ring.{u}) this
    (λ s, (Ring.limit_cone_is_limit _).lift ((forget₂ _ Ring.{u}).map_cone s)) (λ s, rfl),
end

it takes 250 ms.

Sebastien Gouezel (Apr 17 2021 at 12:11):

Just understanding what is going on here and giving some good practice rules could be extremely helpful (I just found this by random trial and error!)

Oliver Nash (Apr 17 2021 at 14:00):

I wonder could we have a job that maintains a list of the slowest proofs (and defs) in Mathlib and publishes the slowest 100 somewhere, not to apply pressure to anyone, but just to give some visibility.

Floris van Doorn (Apr 17 2021 at 17:11):

I'm starting to investigate. The problem seems to be that the type of the def is not fully elaborated before the term gets elaborated.
Just adding by exact at the start of the proof speeds it up (so far I'm only investigating yet_another_try)

Floris van Doorn (Apr 17 2021 at 17:24):

Oh, but that doesn't necessarily help in other examples.

Johan Commelin (Apr 17 2021 at 17:27):

Fwiw, in https://leanprover.zulipchat.com/#narrow/stream/267928-condensed-mathematics/topic/thm95.2Ehomotopy/near/234983175 we needed by convert

Floris van Doorn (Apr 17 2021 at 17:44):

Yeah by apply or by convert work better. They have basically the same effect as what Sebastien did in the first post (by { have := foo, exact foo })

Floris van Doorn (Apr 17 2021 at 18:12):

I don't have much to report after investigating. The elaborator has to do a lot of definitional unfolding to make sense of these proofs, and presumably it starts unfolding the wrong things to try to make two types definitionally equal.

Sebastien Gouezel (Apr 17 2021 at 18:51):

I really like the by convert strategy. It really means: build this term as you like, and afterwards try to match it up with the required type. And it works more efficiently than putting ( : _) (which is supposed to do the same) or adding an id somewhere. I just used it to speed up another proof, by the way, from

  refine jacobson_bot_of_integral_localization (quotient_map P C le_rfl) quotient_map_injective
      _ _ (localization.of (submonoid.powers (p.map (quotient.mk (P.comap C))).leading_coeff))
      (localization.of _) (is_integral_localization_map_polynomial_quotient P _ pP _ _),

  refine jacobson_bot_of_integral_localization (quotient_map P C le_rfl) quotient_map_injective
      _ _ (localization.of (submonoid.powers (p.map (quotient.mk (P.comap C))).leading_coeff))
      (localization.of _)
      (by convert (is_integral_localization_map_polynomial_quotient P _ pP _ _)),

(goes down from 20s to 1s).

Sebastien Gouezel (Apr 17 2021 at 18:52):

I just notice and fix these along the way because the build fails otherwise, but I think it's great to force us to do this, because otherwise these would get unnoticed.

Sebastien Gouezel (Apr 17 2021 at 18:53):

(And I am afraid this PR will globally add a few minutes again, because all the basic structures we use all the time get more complicated).

Sebastien Gouezel (Apr 18 2021 at 16:48):

@Eric Wieser has (rightly!) raised in #7253 the concern that by convert proofs are not suitable for defs, because they introduce additional eq.mpr. On the other hand, the pattern

begin
  have := ...
  exact this
end

produces exactly the same pp.all output as ... (so in particular have does not forget the data it is given, which is surprising to me), and in the situations of interest it is much faster than just ... (ratio 1 to 100)! Its main problem is that it is too verbose compared to by convert .... A tactic speedup (or unusual_elaboration or whatever) doing this would be better, but I am completely unable to write this (which should normally be a one-liner). I tried something like

import ring_theory.subring

open tactic
setup_tactic_parser

namespace tactic
namespace interactive

/-- Try another elaboration tactic -/
meta def speedup (q : parse texpr) : tactic unit :=
`[ have := q, exact this]

end interactive
end tactic

but of course it doesn't work since have := q is not referring to parse texpr. Anyone know how to do this?

Patrick Massot (Apr 18 2021 at 16:51):

Do you have an example lemma where you want to apply this tactic?

Patrick Massot (Apr 18 2021 at 16:52):

I mean a MWE, not the snippets from this thread

Sebastien Gouezel (Apr 18 2021 at 16:59):

import algebra.category.CommRing.limits

open category_theory
open category_theory.limits

universe u

namespace SemiRing
open has_limits

variables {J : Type u} [small_category J]

set_option profiler true

noncomputable def forget₂_AddCommMon_preserves_limits_aux' (F : J ⥤ SemiRing) :
  is_limit ((forget₂ SemiRing AddCommMon).map_cone (limit_cone F)) :=
AddCommMon.limit_cone_is_limit (F ⋙ forget₂ SemiRing AddCommMon)

end SemiRing

takes 2s on my computer. With by convert or the begin have := ..., exact this end, it goes down to 100ms.

Sebastien Gouezel (Apr 18 2021 at 16:59):

(on mathlib master)

Patrick Massot (Apr 18 2021 at 17:01):

Could you try the naive

meta def speedup (q : parse texpr) : tactic unit :=
to_expr q >>= tactic.exact

Patrick Massot (Apr 18 2021 at 17:04):

I see

elaboration: tactic compilation took 2.09ms
scratch.lean:30:3
elaboration: tactic execution took 38ms
num. allocated objects:  76
num. allocated closures: 270
   38ms   100.0%   tactic.interactive.speedup
   38ms   100.0%   _interaction._lambda_2
   38ms   100.0%   tactic.step
   38ms   100.0%   _interaction
   38ms   100.0%   tactic.istep
   38ms   100.0%   tactic.istep._lambda_1
   38ms   100.0%   scope_trace
   35ms    92.1%   tactic.exact
    3ms     7.9%   tactic.to_expr

Sebastien Gouezel (Apr 18 2021 at 17:04):

Works like a charm! Thanks!

Sebastien Gouezel (Apr 18 2021 at 17:05):

Can you give us the figures you get with by exact, to compare?

Patrick Massot (Apr 18 2021 at 17:06):

I know you're busy, but one day you should take one hour to read https://leanprover-community.github.io/extras/tactic_writing.html. I'm sure you'll enjoy being able to write such a basic tactic yourself (and that one is totally covered by this simple minded tutorial).

Patrick Massot (Apr 18 2021 at 17:06):

Without any tactic I read

parsing took 0.337ms
scratch.lean:28:18
elaboration of forget₂_AddCommMon_preserves_limits_aux' took 1.53s
scratch.lean:28:18
type checking of forget₂_AddCommMon_preserves_limits_aux' took 23.3ms
scratch.lean:28:18
decl post-processing of forget₂_AddCommMon_preserves_limits_aux' took 21.2ms

Patrick Massot (Apr 18 2021 at 17:07):

Sorry, this is not quite the same trace.

Patrick Massot (Apr 18 2021 at 17:07):

With the tactic I get:

parsing took 1.09ms
scratch.lean:28:18
elaboration of forget₂_AddCommMon_preserves_limits_aux' took 48.1ms
scratch.lean:28:18
type checking of forget₂_AddCommMon_preserves_limits_aux' took 22.3ms
scratch.lean:28:18
decl post-processing of forget₂_AddCommMon_preserves_limits_aux' took 22.6ms

Patrick Massot (Apr 18 2021 at 17:08):

So I guess the comparison you are interested in is the 1.53s vs 48.1ms

Patrick Massot (Apr 18 2021 at 17:08):

But of course parsing takes away a tiny bit of this gain.

Patrick Massot (Apr 18 2021 at 17:09):

And with by exact

parsing took 1.13ms
scratch.lean:28:18
elaboration of forget₂_AddCommMon_preserves_limits_aux' took 1.37s
scratch.lean:28:18
type checking of forget₂_AddCommMon_preserves_limits_aux' took 22ms
scratch.lean:28:18
decl post-processing of forget₂_AddCommMon_preserves_limits_aux' took 20.8ms

Floris van Doorn (Apr 18 2021 at 17:10):

@Sebastien Gouezel apply q does almost the same as have := q, exact this. Does that work instead?

Patrick Massot (Apr 18 2021 at 17:11):

And yet another one, with (stuff : _):

parsing took 0.481ms
scratch.lean:28:18
elaboration of forget₂_AddCommMon_preserves_limits_aux' took 847ms
scratch.lean:28:18
type checking of forget₂_AddCommMon_preserves_limits_aux' took 20.3ms
scratch.lean:28:18
decl post-processing of forget₂_AddCommMon_preserves_limits_aux' took 20ms

Patrick Massot (Apr 18 2021 at 17:12):

And another for Floris, with by apply stuff:

parsing took 1.32ms
scratch.lean:28:18
elaboration of forget₂_AddCommMon_preserves_limits_aux' took 50.3ms
scratch.lean:28:18
type checking of forget₂_AddCommMon_preserves_limits_aux' took 21.7ms
scratch.lean:28:18
decl post-processing of forget₂_AddCommMon_preserves_limits_aux' took 21.4ms

Sebastien Gouezel (Apr 18 2021 at 17:15):

Yes, by apply is perfect, thanks (and it takes less characters to type than by speedup).

Sebastien Gouezel (Apr 18 2021 at 17:17):

If someone could explain to me what is going on under the hood, I'd be extremely happy. My impression was that a tactic proof was creating a term proof, and that at the end of the proof this term proof was checked by the kernel. So I don't get how a tactic proof can be so much faster than the pp.all term it generates, for which there is no elaboration do be done.

Patrick Massot (Apr 18 2021 at 17:19):

I know nothing about how Lean really works, but my only guesses are some kind of cache or subterm sharing is involved.

Floris van Doorn (Apr 18 2021 at 17:22):

If you give the pp.all term for the type also, then all examples (that I've tried) are really quick. The problem with yet_another_try above is that there are still implicit arguments in the type of the definition, which is being elaborated simultaneously with the term. Even though you give the term explicitly, because of all the definitional unfolding required to unify the term with the type (which still has implicit information), presumably the Lean elaborator goes into a wrong rabbit hole of unfolding the wrong thing and getting huge terms.
Note that for example making yet_another_try a lemma forces Lean to first elaborate the type before even looking at the term. That is also quick.

Sebastien Gouezel (Apr 18 2021 at 17:26):

Ah, I had not imagined it could come from the type of the definition, thanks!

Sebastien Gouezel (Apr 18 2021 at 17:28):

Unrelated question: it looks like fixing the nsmul diamond has been a huge performance hit for leanchecker (doubling its time, roughly from 30 minutes to 1 hour). Is this something we should worry about, or that should not surprise us?

Eric Wieser (Apr 18 2021 at 17:34):

Does fixing the gsmul diamond make things even worse?

Sebastien Gouezel (Apr 18 2021 at 17:36):

I don't know, as this PR hasn't reached the leanchecker step yet.

Mario Carneiro (Apr 18 2021 at 23:30):

Sebastien Gouezel said:

Unrelated question: it looks like fixing the nsmul diamond has been a huge performance hit for leanchecker (doubling its time, roughly from 30 minutes to 1 hour). Is this something we should worry about, or that should not surprise us?

I think there is a serious bug here, that I posted about on the PR about switching nat mul recursion order, but might be related to this as well. The issue that came up there is that the kernel is unfolding numerals, like int.add 23 45 = 23 + 45 will go and unfold 23 instead of the typeclass for +.

Sebastien Gouezel (Apr 19 2021 at 07:14):

The leanchecker step on #7255 fixing the gsmul diamonds is indeed taking 1h27...

Last updated: May 02 2025 at 03:31 UTC