Zulip Chat Archive

Stream: lean4

Topic: Performance issues with big case split


Jannis Limperg (Apr 08 2022 at 10:12):

A colleague who wanted to try Lean gave me this test case: automaton.lean (the file is too big for a Zulip message). It defines an automaton as a set of states and a transition function, then tries to prove a certain lemma. This proof involves three nested case splits with 35 possibilities each, so 35³ cases in total. Each case is proven by a trivial rfl.

Coq compiles the file (automaton.v) in 16s or so on my machine. Lean 4 (latest nightly) still wasn't finished after 1h. According to my mental model of Lean 4, it should be fast in this case, so I'd like to understand why it isn't.

Henrik Böving (Apr 08 2022 at 10:37):

It seems to still deal fine with cases x <;> cases y <;> sorry but if you add the additional split on z stuff starts blowing up massively, my system almost OOMed while trying to compute that.

Henrik Böving (Apr 08 2022 at 11:19):

I also have absolutely 0 knowledge in benching Lean but if I'm wrong on this someone will correct me sooner or later and I'll have learned something^^

From running perf mem record it seems like Level.collectMVars is storing tons of meta variables (which makes sense considering all of the 35^3 goals are mvars) away which is consuming a ton of memory and also among the parts that is consuming the most CPU time together with (of course) the garbage collection. So my bet is that it's related to this^^

Leonardo de Moura (Apr 08 2022 at 11:34):

@Jannis Limperg Do you have a generator for this file? A script that given the number of states, returns the .lean file.
When fixing the performance bottleneck, it would be great to have a smaller n < 35 that we can solve in a few seconds.

Jannis Limperg (Apr 08 2022 at 12:47):

This is a real proof they wanted to do, so I don't have a generator, but I can write one.

Leonardo de Moura (Apr 08 2022 at 12:51):

Jannis Limperg said:

This is a real proof they wanted to do, so I don't have a generator, but I can write one.

That would be very useful. Thanks.

Patrick Johnson (Apr 08 2022 at 13:43):

A simple generator written in Haskell:

import Data.List

states_num = 10
out_file = "test.lean"

main = writeFile out_file $ init $ unlines mk_lines
states = (\n -> 's' : show n) <$> [0 .. states_num - 1]
mk_lines =
  [ ("inductive States"++) $ states >>= \s -> " | " ++ s,
  "open States",
  "def f : States → States → States",
  tail $ states >>= \s1 -> states >>= \s2 -> concat ["\n| ", s1, ", ", s2, " => s0"],
  "set_option maxHeartbeats 0",
  "example : ∀ x y z, f (f (f s0 x) y) z = f (f x z) (f y z) := by",
  " intros x y z",
  " cases x <;> cases y <;> cases z <;> rfl" ]

Jannis Limperg (Apr 08 2022 at 18:11):

Thank you very much Patrick! For me, the test files scale like this:

states | time (s)
7      | 1.6
8      | 3.1
9      | 5.5
10     | 9.2
11     | 15.1
12     | 23.6

Jannis Limperg (Apr 08 2022 at 18:13):

For the benefit of people without a GHC handy, the Lean file for 8 states:

inductive States | s0 | s1 | s2 | s3 | s4 | s5 | s6 | s7
open States
def f : States  States  States
| s0, s0 => s0
| s0, s1 => s0
| s0, s2 => s0
| s0, s3 => s0
| s0, s4 => s0
| s0, s5 => s0
| s0, s6 => s0
| s0, s7 => s0
| s1, s0 => s0
| s1, s1 => s0
| s1, s2 => s0
| s1, s3 => s0
| s1, s4 => s0
| s1, s5 => s0
| s1, s6 => s0
| s1, s7 => s0
| s2, s0 => s0
| s2, s1 => s0
| s2, s2 => s0
| s2, s3 => s0
| s2, s4 => s0
| s2, s5 => s0
| s2, s6 => s0
| s2, s7 => s0
| s3, s0 => s0
| s3, s1 => s0
| s3, s2 => s0
| s3, s3 => s0
| s3, s4 => s0
| s3, s5 => s0
| s3, s6 => s0
| s3, s7 => s0
| s4, s0 => s0
| s4, s1 => s0
| s4, s2 => s0
| s4, s3 => s0
| s4, s4 => s0
| s4, s5 => s0
| s4, s6 => s0
| s4, s7 => s0
| s5, s0 => s0
| s5, s1 => s0
| s5, s2 => s0
| s5, s3 => s0
| s5, s4 => s0
| s5, s5 => s0
| s5, s6 => s0
| s5, s7 => s0
| s6, s0 => s0
| s6, s1 => s0
| s6, s2 => s0
| s6, s3 => s0
| s6, s4 => s0
| s6, s5 => s0
| s6, s6 => s0
| s6, s7 => s0
| s7, s0 => s0
| s7, s1 => s0
| s7, s2 => s0
| s7, s3 => s0
| s7, s4 => s0
| s7, s5 => s0
| s7, s6 => s0
| s7, s7 => s0
set_option maxHeartbeats 0
example :  x y z, f (f (f s0 x) y) z = f (f x z) (f y z) := by
 intros x y z
 cases x <;> cases y <;> cases z <;> rfl

Leonardo de Moura (Apr 09 2022 at 19:33):

@Jannis Limperg Thanks for posting the Lean file. It was quite handy for me since I don't have GHC installed on my machine.
I pushed some perf improvements. The 8 states file was taking 2.0s on my machine, and it now takes around 0.3s.
Could you please post the file for 12 states?

Henrik Böving (Apr 09 2022 at 19:35):

test.lean @Leonardo de Moura

Leonardo de Moura (Apr 09 2022 at 19:35):

Thanks!

Henrik Böving (Apr 09 2022 at 19:48):

Leonardo de Moura said:

Jannis Limperg Thanks for posting the Lean file. It was quite handy for me since I don't have GHC installed on my machine.
I pushed some perf improvements. The 8 states file was taking 2.0s on my machine, and it now takes around 0.3s.
Could you please post the file for 12 states?

Out of curiosity, I checked the last 3 perf commits to see what you did this one being the biggest performance increase https://github.com/leanprover/lean4/commit/7d99f6f555fd80a0dece420be9618f6d9b83f071.

And if I'm understanding it correctly there is basically a isClassQuick that says for sure if its not a class and undefined if its not sure which was acting overeager, causing too much calls to the expensive variant to happen which does unfoldings right? How did you figure that this was actually at fault for issues with state8.lean? Or was that just luck?

Leonardo de Moura (Apr 09 2022 at 19:55):

@Henrik Böving

causing too much calls to the expensive variant to happen which does unfoldings right?

Correct.

How did you figure that this was actually at fault for issues with state8.lean?

I used

perf record --call-graph dwarf stage1/bin/lean state8.lean
hotspot

perf is amazing. It is the most precise profiler I have ever used. Other profilers would often put me in the "wrong direction".

Leonardo de Moura (Apr 09 2022 at 19:57):

I took a quick look at state12.lean, and as far as I can tell we need to improve the kernel reduction engine to substantially reduce the runtime now. I created an issue for this https://github.com/leanprover/lean4/issues/1101.
We want to do it anyway to improve the "proof by reflection" support.

Patrick Johnson (Apr 09 2022 at 20:27):

Online test case generator: https://user7230724.github.io/lean/lean4-perf-test-gen.htm

Henrik Böving (Apr 09 2022 at 20:30):

I was about to suggest that as a joke when Leo asked for the next one xd

Leonardo de Moura (Apr 09 2022 at 21:52):

If you are willing to trust the Lean compiler and interpreter, then you can check the 35 states instance in approx. 1s (on my machine) using native_decide.
https://github.com/leanprover/lean4/blob/1db11ed5c750c7f5e782a31ad12e202d12e41832/tests/lean/run/states35.lean
This is may be a reasonable compromise until we have a better reduction engine in the kernel.


Last updated: Dec 20 2023 at 11:08 UTC