Zulip Chat Archive
Stream: lean4
Topic: time to execute `lake exe cache`
Scott Morrison (Jul 27 2023 at 23:52):
I'm not sure if this is a regression, a problem on my computer, or just how it's always been, but could I ask how long
time lake exe cache
reports on Mathlib4? (I'm interested in the second run of this command, to make sure no compilation is happening.) (This is calling cache
with no commands, so it shouldn't be doing anything except starting up and printing a message.)
Scott Morrison (Jul 27 2023 at 23:53):
I am seeing 3.2s
, which feels both too long, and slower than I remember previously.
Scott Morrison (Jul 27 2023 at 23:53):
I've jumped back through Mathlib commits over the last two weeks and don't see any difference.
Scott Morrison (Jul 28 2023 at 00:05):
I guess this is just how long it takes to hash all the file contents.
Gabriel Ebner (Jul 28 2023 at 00:09):
(deleted)
Gabriel Ebner (Jul 28 2023 at 00:11):
2.35s on my laptop, so yeah that's pretty slow.
Scott Morrison (Jul 28 2023 at 00:23):
What about lake exe cache get
(again, on a second run?)
Today I see:
$ time lake exe cache get
No files to download -- takes about 90 seconds before printing this line
Decompressing 3604 file(s). -- takes about 80 seconds before printing this line
unpacked in 5502 ms -- as claimed, takes about 5 seconds
lake exe cache get 9.64s user 62.29s system 40% cpu 2:55.85 total
which is completely unusable.
Gabriel Ebner (Jul 28 2023 at 00:25):
2.96s for me.
Scott Morrison (Jul 28 2023 at 00:29):
Indeed, my laptop is taking about 5s, so something on this computer is badly borked.
Mario Carneiro (Jul 28 2023 at 00:45):
@Scott Morrison I would be interested if you could get performance tests for https://github.com/digama0/lean-cache
Gabriel Ebner (Jul 28 2023 at 00:48):
I've jumped back through Mathlib commits over the last two weeks and don't see any difference.
According to perf, 40% of the runtime is spent in this line:
let contents := contents.replace "\r\n" "\n"
I think this is a recent change.
Scott Morrison (Jul 28 2023 at 01:13):
Deleting ~/.cache/mathlib
has restored normal performance.
Scott Morrison (Jul 28 2023 at 01:14):
Could something in cache
be traversing everything in the cache directory, even if not relevant?
Mario Carneiro (Jul 28 2023 at 01:14):
what OS are you on?
Mario Carneiro (Jul 28 2023 at 01:15):
and what filesystem
Scott Morrison (Jul 28 2023 at 01:15):
macos big sur (yes, I realise that is old).
Scott Morrison (Jul 28 2023 at 01:15):
APFS
Scott Morrison (Jul 28 2023 at 01:15):
I've just run smartmontools
, and it says that my SSD is perfectly fine.
Mario Carneiro (Jul 28 2023 at 01:16):
I don't think lake exe cache get
should do any directory traversal
Scott Morrison (Jul 28 2023 at 01:18):
It really seems that deleting ~/.cache/mathlib
was the solution. I now see
% time lake exe cache get
No files to download
Decompressing 3605 file(s)
unpacked in 295 ms
lake exe cache get 3.93s user 7.99s system 257% cpu 4.634 total
Scott Morrison (Jul 28 2023 at 01:19):
I should have just moved the cache to another directory so I could test putting it back. :-(
Mario Carneiro (Jul 28 2023 at 01:19):
I stand corrected, lake exe cache get
uses a directory traversal to filter the needed files by the existing files
Mario Carneiro (Jul 28 2023 at 01:19):
that seems really unnecessary
Scott Morrison (Jul 28 2023 at 01:21):
Ahha! Shall we bother fixing that, or is the timeline for lean-cache
short enough? :-)
Scott Morrison (Jul 28 2023 at 01:21):
(I'm getting it running now.)
Mario Carneiro (Jul 28 2023 at 01:23):
The status of lean-cache
is that the core implementation is done but distribution still needs to be worked out, as well as how to approach lake_ext
and possible integration into lake
.
Scott Morrison (Jul 28 2023 at 01:45):
@Mario Carneiro, #6197 changes the current cache
to avoid traversing the CACHEDIR
(and building an expensive RBSet out of the file names!)
Mac Malone (Jul 28 2023 at 09:23):
Gabriel Ebner said:
According to perf, 40% of the runtime is spent in this line:
let contents := contents.replace "\r\n" "\n"
I think this is a recent change.
If you are talking about this line in Lake, that comes from lake#170 (a May/June change), which was an attempt (by Sebastian) to make .lean
file hashes OS independent for mathlib.
Mario Carneiro (Jul 28 2023 at 09:38):
based on the variable name I think it is a reference to https://github.com/leanprover-community/mathlib4/blob/c84172212bad5a41d3f0a6cee27c049e0484c04a/Cache/Hashing.lean#L55, not lake
Sebastian Ullrich (Jul 28 2023 at 10:08):
Which, to be fair, is also by me, for the same reasons :big_smile:
Mario Carneiro (Jul 28 2023 at 10:17):
I think the semantics of this operation is good, but maybe there are issues in the implementation of replace
if it's that expensive?
Jannis Limperg (Jul 28 2023 at 10:21):
If this is a packed UTF8 or UTF16 string, replace
is necessarily expensive. Perhaps the hash
function could be changed to normalise line endings instead?
Mario Carneiro (Jul 28 2023 at 10:22):
not easily, it is written in C and you can't really make the necessary changes from lean without allocating a new string
Mario Carneiro (Jul 28 2023 at 10:23):
you could write the replace
in C as well, of course
Jannis Limperg (Jul 28 2023 at 10:26):
Ah, unfortunate.
Sebastian Ullrich (Jul 28 2023 at 10:26):
The issue is probably that replace
(or rather substrEq
) does not use get'
yet https://github.com/leanprover/lean4/commit/20eeb4202f1aa95a789c85d5822700503c80d007. So I'd suggest writing a custom specialization of replace
Mario Carneiro (Jul 28 2023 at 10:38):
SIMD memchr when? :sunglasses:
Henrik Böving (Jul 28 2023 at 14:42):
After zenbleed is patched!
Mario Carneiro (Jul 29 2023 at 02:52):
Last updated: Dec 20 2023 at 11:08 UTC