Zulip Chat Archive
Stream: lean4
Topic: curl version on Ubuntu 20.04
Tomas Skrivan (Mar 23 2023 at 16:11):
Today I wanted to get a cached build of mathlib4 and got stuck on the curl
version. On Ubuntu 20.04 the version of curl is 7.68, but lake exe cache
needs 7.69 and above.
Googling for "update curl ubuntu" gives you this guide, which downloads the sources, compiles, and installs them. Other guides on how to update curl compile the code too! This is not beginner-friendly at all.
I talked briefly to Mario about this. Apparently this is because the new version of curl supports parallelization. I would suggest falling back to the old version of curl; a non-parallelized curl would still be faster than compiling mathlib.
I think Ubuntu 20.04 is relatively recent and popular, so fixing this would be nice.
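For anyone hitting this, a quick sanity check is to compare the installed curl version against the 7.69 floor mentioned above. A minimal sketch using sort -V for version comparison; the version string here is a hard-coded example standing in for the output of curl --version, not a live query:

```shell
# Compare a curl version string against the 7.69 minimum mentioned above.
# "7.68.0" is a hard-coded example of what `curl --version` might report.
v="7.68.0"
min="7.69.0"
# sort -V orders version strings numerically; if the minimum sorts first,
# the installed version is at least that new.
if [ "$(printf '%s\n%s\n' "$v" "$min" | sort -V | head -n1)" = "$min" ]; then
  echo "curl $v is new enough for lake exe cache"
else
  echo "curl $v is too old for lake exe cache"
fi
```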
Sebastian Ullrich (Mar 23 2023 at 16:15):
AFAIR it is extremely slow without parallelization, to the point where I wondered whether building from scratch wouldn't be faster
Yaël Dillies (Mar 23 2023 at 16:15):
Relevant discussion: https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/.60lake.20exe.20cache.60.20curl.20version
Shreyas Srinivas (Mar 23 2023 at 19:07):
Sebastian Ullrich said:
AFAIR it is extremely slow without parallelization, to the point where I wondered whether building from scratch wouldn't be faster
Is it possible to parallelize inside Lean rather than letting curl do its parallelization? Maybe that would work around the issues with curl's parallelization in version 7.68 and below?
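The idea can be sketched in shell rather than Lean: with an old curl you can still get concurrency by launching one single-file download per background job and waiting for all of them. This is only an illustration of the control flow (the echo stands in for a real curl invocation, and the file names are placeholders); the actual cache tool would do the equivalent with IO.asTask inside Lean.

```shell
# Hypothetical fallback: one background job per file, then wait for all.
# The echo is a stand-in for e.g.: curl -fsS -o "$f" "$BASE_URL/$f" &
files="aaa.tar.gz bbb.tar.gz ccc.tar.gz"   # placeholder file list
for f in $files; do
  echo "fetching $f" &
done
wait   # block until every background download has finished
echo "all downloads finished"
```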
Arthur Paulino (Mar 23 2023 at 19:13):
Only experimentation can tell... I wouldn't know beforehand
Eric Wieser (Mar 23 2023 at 19:27):
I would imagine that curl will do a much better job of parallelism than what we can do in Lean
Mario Carneiro (Mar 23 2023 at 19:27):
yes, but only if the flag actually exists!
Mario Carneiro (Mar 23 2023 at 19:28):
the issue here is that many people don't have a curl that supports the option we want to use
Mario Carneiro (Mar 23 2023 at 19:29):
If we have a reliable way to get new curl installed on machines that's fine, but we clearly don't want end users to have to install curl from source
Eric Wieser (Mar 23 2023 at 19:33):
Kevin Buzzard said:
https://www.linuxcapable.com/how-to-install-upgrade-curl-on-ubuntu-20-04-lts/ fixed the problem for me, although I'm now running curl as modified by some random person on the internet
was one option
Kevin Buzzard (Mar 23 2023 at 21:34):
Clicking on the "upgrade to 22.04?" button is another one
Arthur Paulino (Mar 23 2023 at 21:39):
University computers are a concern though, since students don't have SU privileges
Shreyas Srinivas (Mar 23 2023 at 22:02):
Not to mention, upgrading a distro can break a lot of other things
Shreyas Srinivas (Mar 23 2023 at 22:03):
Also, it doesn't help all the Ubuntu derived distributions
Sebastian Ullrich (Mar 24 2023 at 07:42):
I think we could do worse than providing a statically linked recent version of curl per platform and downloading it (for which an older curl is sufficient).
$ nix build github:NixOS/nixpkgs/nixos-unstable#pkgsStatic.curl
$ file ./result-bin/bin/curl
./result-bin/bin/curl: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=ee11a4cbde03a128ffa2c03b810d7c9d287f0506, not stripped
$ ./result-bin/bin/curl --version
curl 7.88.1 (x86_64-unknown-linux-musl) libcurl/7.88.1 OpenSSL/3.0.8 zlib/1.2.13 zstd/1.5.4 libidn2/2.3.2 libssh2/1.10.0 nghttp2/1.51.0
Release-Date: 2023-02-20
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL threadsafe TLS-SRP UnixSockets zstd
Sebastian Ullrich (Mar 25 2023 at 12:07):
If your Linux curl is too old, please give https://github.com/leanprover-community/mathlib4/pull/3097 a try
Scott Morrison (Mar 28 2023 at 10:19):
Arthur Paulino said:
Only experimentation can tell... I wouldn't know beforehand
Using IO.asTask to parallelize in Lean, rather than in curl, takes:
% time lake exe cache get!
Attempting to download 1876 file(s)
Decompressing 1875 file(s)
lake exe cache get! 57.12s user 97.32s system 85% cpu 3:00.54 total
which for me is barely slower than the current implementation using --parallel:
% time lake exe cache get!
Attempting to download 1876 file(s)
Decompressing 1876 file(s)
lake exe cache get! 28.48s user 48.23s system 47% cpu 2:41.21 total
Scott Morrison (Mar 28 2023 at 10:26):
I can make a PR with my code as a fallback if we haven't obtained a suitable curl, but I'd want someone with more curl-fu to check my flags.
Sebastian Ullrich (Mar 28 2023 at 10:29):
I think we've effectively solved the problem, at least on Linux. I'm surprised though that it's so slow for you; is it simply your downstream bandwidth?
Scott Morrison (Mar 28 2023 at 10:38):
I had to upgrade curl on my macOS system to satisfy cache, so I think more general solutions are still worthwhile.
Scott Morrison (Mar 28 2023 at 10:38):
I'm not sure why it is so slow, but it is certainly painful. :-) Not sure whether to blame azure in australia, my ISP, or something else.
Scott Morrison (Mar 28 2023 at 10:47):
!4#3146, no problem if it is discarded.
Arthur Paulino (Mar 28 2023 at 12:42):
What about reliability? Users might end up in cycles of cache get! if downloads fail too often
Scott Morrison (Mar 28 2023 at 20:43):
So far in testing I don't seem to have had any failures. What could be done to improve reliability? Why would we expect it to be worse with Task parallelization rather than curl parallelization?
Arthur Paulino (Mar 28 2023 at 20:46):
Unfortunately I don't know curl
well enough to give satisfying answers. I can only raise concerns
Henrik Böving (Mar 28 2023 at 20:52):
Scott Morrison said:
So far in testing I don't seem to have had any failures. What could be done to improve reliability? Why would we expect it to be worse with Task parallelization rather than curl parallelization?
So, just from a protocol point of view, HTTP/2 is actually fully capable of multiplexing multiple streams over the same TCP connection, which I would expect curl exploits for its parallelization. If you instead have multiple curl processes, they have to open separate TCP connections for separate HTTP/2 connections, and each of those connections needs to carry its metadata with it. Now, the metadata should (hopefully) not be too significant, but I would expect a measurable difference between multiple curls and one curl multitasking on its own, assuming an HTTP/2 connection.
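To put the metadata concern in perspective, a rough back-of-the-envelope calculation (the per-request header size is an assumed figure, not a measurement): even if every one of the ~1,876 files re-sent its request headers in full on a separate connection, the overhead stays well under a megabyte, tiny next to the ~445 MB payload reported later in this thread.

```shell
# Assume ~500 bytes of uncompressed request headers per file; with HTTP/2
# multiplexing HPACK compresses these away, with separate connections they
# are repeated in full for every request.
N=1876     # files in a typical `cache get!` run (from the timings below)
HDR=500    # assumed header bytes per request
echo "worst-case extra header bytes: $(( N * HDR ))"
```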
Arthur Paulino (Mar 28 2023 at 21:23):
Right, the difference in time was noticeable. But we're wondering if parallel tasks would be any riskier in terms of failing to download (aside from the fact that a longer download is intrinsically riskier in terms of failure)
Henrik Böving (Mar 28 2023 at 21:27):
It's all TCP-based anyway, so there is no reason to fear failures from a protocol point of view. The only reason I could see for your downloads to fail would be if Azure decided to rate-limit you because having so many connections open looks like a DoS attack, but I would hope that rate limiting on Azure blobs is configurable to the point where this is not the case
Eric Wieser (Mar 29 2023 at 02:08):
This presumes that old curl doesn't have other bugs that cause a download to fail; given that the problem in the first place was that --parallel had a bug, this may or may not be a safe assumption.
Gabriel Ebner (Mar 29 2023 at 02:28):
I think Sebastian noticed that the Azure blob storage doesn't do HTTP/2. I think he got a nice speedup when trying out HTTP/2 with the CDN.
Gabriel Ebner (Mar 29 2023 at 02:29):
Here's the URL in case somebody wants to make a PR: https://lakecache-gjhdfybgcpf9dpfr.z01.azurefd.net/mathlib4/f/18296616905743833141.tar.gz
Scott Morrison (Mar 29 2023 at 03:41):
I tried the CDN earlier today and didn't see any speed-up. :-(
Scott Morrison (Mar 29 2023 at 03:42):
In fact, it was slightly slower. (For me a cache get! takes about three minutes.)
Gabriel Ebner (Mar 29 2023 at 03:45):
I should've put the cache in Australia instead of Ohio. :rofl:
Gabriel Ebner (Mar 29 2023 at 03:46):
Was it just slower the first time though? Did it get faster if you tried cache get! a second time?
Gabriel Ebner (Mar 29 2023 at 03:46):
That was my impression with the mathlib3 cache at least.
Sebastian Ullrich (Mar 29 2023 at 08:10):
Henrik Böving said:
It's all TCP-based anyway
Until Azure CDN supports HTTP/3 :) ... but yes, I was surprised to find that the protocol overhead apparently is negligible (when querying files that actually exist), and --parallel still seems to help even when using HTTP/2. It's all about the raw speed of the connection on both ends, and everywhere in between
Scott Morrison (Mar 29 2023 at 09:58):
Unfortunately the CDN is still just as slow as the original, for me, even after repeat invocations.
Scott Morrison (Mar 29 2023 at 09:59):
How long is lake exe cache get! taking for others?
Mario Carneiro (Mar 29 2023 at 10:21):
$ time lake exe cache get!
Attempting to download 1895 file(s)
Decompressing 1895 file(s)
________________________________________________________
Executed in 178.09 secs fish external
usr time 20.99 secs 0.00 micros 20.99 secs
sys time 19.61 secs 374.00 micros 19.60 secs
Mario Carneiro (Mar 29 2023 at 10:22):
that is for downloading today's mathlib master
Mario Carneiro (Mar 29 2023 at 10:26):
run no. 2 is indeed a bit faster:
Executed in 133.16 secs fish external
usr time 20.35 secs 609.00 micros 20.35 secs
sys time 17.46 secs 0.00 micros 17.46 secs
Mauricio Collares (Mar 29 2023 at 12:38):
Would it be crazy to use a protocol designed for synchronization (e.g., rsync or whatever azcopy sync uses)?
Henrik Böving (Mar 29 2023 at 13:22):
Does windows have rsync type of stuff nowadays?
Gabriel Ebner (Mar 29 2023 at 17:17):
Oh, wow. I didn't realize it was that slow for other people. (It would be great though if we had numbers for just the downloading part instead of downloading+unpacking combined.)
$ time lake exe cache get!
Attempting to download 1898 file(s)
Decompressing 1898 file(s)
________________________________________________________
Executed in 13.72 secs fish external
usr time 10.96 secs 0.00 micros 10.96 secs
sys time 24.94 secs 351.00 micros 24.94 secs
Gabriel Ebner (Mar 29 2023 at 17:19):
Would it be crazy to use a protocol designed for synchronization (e.g., rsync or whatever azcopy sync uses)?
I don't think rsync works that well if you have a huge list of small files you want to download (instead of a whole directory). And I believe azcopy sync uses the same HTTP endpoints that we access via curl.
Gabriel Ebner (Mar 29 2023 at 17:25):
I'd also caution you not to expect any miracles. All olean caches for mathlib combined are 445 megabytes right now (which is what you test with lake exe cache get!). Divide this by your connection speed, and you get a lower bound on how quickly you can get them.
Gabriel Ebner (Mar 29 2023 at 17:40):
Doing this back-of-the-envelope calculation, all the results posted here give connection speeds between 30 and 500 mbit/s. Which is perfectly reasonable, and I don't think we can improve it much using only protocol changes. The unfortunate truth is that Lean 4 oleans are an order of magnitude more voluminous than their Lean 3 counterparts.
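That back-of-the-envelope calculation, spelled out as pure arithmetic on the 445 MB figure above (the link speeds are just sample points in the range Gabriel mentions):

```shell
# Time lower bound = payload size / link speed, ignoring all protocol overhead.
SIZE_MB=445                      # compressed olean cache size from above
for MBIT in 30 100 500; do       # sample connection speeds in Mbit/s
  # MB * 8 converts to Mbit; integer division is fine for a rough bound
  echo "${MBIT} Mbit/s: at least $(( SIZE_MB * 8 / MBIT )) s"
done
```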
Ruben Van de Velde (Mar 29 2023 at 17:44):
I've heard this a few times (oleans getting much bigger) - is that considered an issue that might see improvements in the future or should we assume it's going to remain the case?
Ruben Van de Velde (Mar 29 2023 at 17:44):
Also, are they amenable to compression, either individually or as a group?
Floris van Doorn (Mar 29 2023 at 17:45):
Gabriel Ebner said:
Doing this back-of-the-envelope calculation, all the results posted here give connection speeds between 30 and 500 mbit/s. Which is perfectly reasonable, and I don't think we can improve it much using only protocol changes. The unfortunate truth is that Lean 4 oleans are an order of magnitude more voluminous than their Lean 3 counterparts.
This is a really big problem when teaching, giving tutorials, or any other time that you ask new users to install Lean 4 and mathlib4.
Gabriel Ebner (Mar 29 2023 at 17:48):
Also, are they amenable to compression, either individually or as a group?
We're already compressing them, they'd be 1.4 gigabytes otherwise.
Eric Wieser (Mar 29 2023 at 18:00):
This is a really big problem when teaching, giving tutorials, or any other time that you ask new users to install Lean 4 and mathlib4.
I can imagine this is especially bad if a room full of 25 people, all on mediocre wifi, needs to download 500 MB each at the start of a tutorial
Eric Wieser (Mar 29 2023 at 18:01):
Perhaps this is an argument for leaning more heavily on VS Code remote development for teaching, be it via Codespaces, Gitpod, or remote access to university servers
Scott Morrison (Mar 29 2023 at 21:44):
Maybe I'll have to write a script that pre-emptively checks out modified branches from GitHub and pulls the oleans, and runs while I'm getting my breakfast. :-)
Scott Morrison (Mar 29 2023 at 21:44):
(I realise this is probably a bad idea, misusing our azure resources.)
Gabriel Ebner (Mar 29 2023 at 22:28):
You have my official permission to do that. :smile:. Just don't do get!. Your productivity is worth more than our Azure bills. Besides, you're not downloading anything twice. The branches probably share a lot of common oleans.
Sebastian Ullrich (Mar 30 2023 at 07:32):
How much of a concern is storing and re-downloading identical but differently named olean tars yet? I'm assuming not all that much at this point, because people mostly work on mathlib4 leaves, invalidating few existing modules, and we do not update Lean or other dependencies that often either. But as soon as either of these changes, I believe we should take a serious look at moving towards a content-addressed storage model to potentially save massive amounts of bandwidth. This could be as simple as replacing the current <input-hash>.tar.gz with a symlink-like text file <input-hash> that contains <output-hash>.tar.gz.
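The indirection described there can be sketched as two tiny files on disk; every name and hash below is made up for illustration. A lookup costs one extra tiny download, but identical outputs produced under different input hashes then share a single stored tarball.

```shell
# Content-addressed layout sketch: the per-input entry is a one-line text
# file naming a tarball by its output hash. Hashes here are fabricated.
mkdir -p store
echo "cafe1234.tar.gz" > store/deadbeef01        # <input-hash> -> <output-hash>.tar.gz
target=$(cat store/deadbeef01)                   # resolve the indirection
echo "cache would download: $target"             # then fetch store/$target if absent
```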
Sebastian Ullrich (Mar 30 2023 at 07:51):
Just to state the maybe-obvious, no work has been done yet on optimizing the size of Lean 4 .oleans, or on analyzing the size difference to Lean 3 .oleans. Similarly, we might not yet be doing a good job of avoiding perturbing a .olean's contents on upstream changes, which would affect the efficiency of such a content-addressed model.
Reid Barton (Mar 30 2023 at 07:56):
The Lean 4 oleans are basically mmapped to build the in-memory environment, right?
So if we want to keep doing that, there would be some constraints on the format/size?
Sebastian Ullrich (Mar 30 2023 at 08:04):
Yes, the overhead of that is one of the unclear parts. But in theory, the cache could use a completely different representation that we transform from after unpacking, so it would only be disk space and unpacking overhead.
Sebastian Ullrich (Apr 01 2023 at 16:59):
Sebastian Ullrich said:
Just to state the maybe-obvious, no work has been done on optimizing the size of Lean 4 .oleans yet, or analyses on the size
Now we know a little bit more:
array 526.661 ( 1%) 42.704.504B ( 3%)
string 1.148.716 ( 3%) 53.699.609B ( 4%)
bignum 1 ( 0%) 40B ( 0%)
ctor 35.000.295 ( 95%) 1.191.012.032B ( 92%)
tag 0 3.962.740 ( 10%) 128.823.376B ( 10%)
tag 1 2.658.974 ( 7%) 76.023.200B ( 5%)
tag 2 943.200 ( 2%) 28.268.920B ( 2%)
tag 3 237.476 ( 0%) 7.211.992B ( 0%)
tag 4 1.019.304 ( 2%) 31.936.928B ( 2%)
tag 5 20.710.716 ( 56%) 662.427.408B ( 51%)
tag 6 2.717.678 ( 7%) 125.459.136B ( 9%)
tag 7 1.964.034 ( 5%) 91.803.944B ( 7%)
tag 8 604.649 ( 1%) 33.537.784B ( 2%)
tag 9 17.037 ( 0%) 408.888B ( 0%)
tag 10 81.597 ( 0%) 2.823.552B ( 0%)
tag 11 77.122 ( 0%) 2.182.864B ( 0%)
tag 12 5.768 ( 0%) 104.040B ( 0%)
total 36.675.673 ( 100%) 1.287.416.185B ( 100%)
shared pointers 39.796.971 ( 52%)
lake exe oleanparser ~/lean/mathlib4/build/**/*.olean --stats 46.29s user 0.62s system 99% cpu 47.127 total
No low-hanging fruit like "we included 10MB of strings or empty arrays", unfortunately. For constructors, the vast majority of objects and bytes, there is no direct type information available, but given that Expr.app has constructor index 5, it's fair to guess that these are mostly Expr objects, which of course is information we usually cannot avoid storing in some form.
Sebastian Ullrich (Apr 01 2023 at 17:03):
Not that I'm saying the .olean format is particularly efficient for storing Expr.apps: we have 8 bytes of object header, followed by two 8-byte pointers, followed by another 8 bytes of cached Expr.Data. In principle, this information could be packed much more tightly for transport (which would not even have to be part of core).
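Checking the arithmetic of that layout against the tag-5 row of the stats above: 8 + 16 + 8 gives 32 bytes per node, and 20,710,716 tag-5 objects at 32 bytes each come to roughly 662.7 MB, in line with the 662,427,408 bytes reported. The small discrepancy is expected, since the tag-5 bucket aggregates constructors from all types, not just Expr.app.

```shell
# Per-node size as described: 8B header + two 8B pointers + 8B cached Expr.Data
echo "bytes per Expr.app node: $(( 8 + 2*8 + 8 ))"
# If all 20,710,716 tag-5 objects were Expr.app nodes of that size:
echo "predicted tag-5 bytes: $(( 20710716 * 32 ))"   # vs 662,427,408B reported
```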
Reid Barton (Apr 01 2023 at 17:04):
Compression achieves some of that space savings automatically, right?
Reid Barton (Apr 01 2023 at 17:05):
I assume there should be quite a lot of sharing among Exprs?
Sebastian Ullrich (Apr 01 2023 at 17:08):
We know compression helps quite a bit, but I couldn't say how it would affect such manual optimizations. The redundant headers are probably great for compression, but the cached metadata is more variable.
Sebastian Ullrich (Apr 01 2023 at 17:08):
There are about twice as many pointers as there are objects because of sharing, yes, that's the 52%
Sebastian Ullrich (Apr 01 2023 at 20:52):
It is unclear how to make a more detailed analysis; here is a breakdown of the environment extensions and the constants used by the old compiler for the two largest mathlib4 files:
$ oleanparser --env-stats build/lib/Mathlib/LinearAlgebra/TensorProduct.olean
objects reachable from
282.073 ( 72%) old compiler constants
36.710 ( 9%) Lean.IR.declMapExt
9.863 ( 2%) Lean.Compiler.LCNF.baseExt
6.484 ( 1%) Lean.IR.UnreachableBranches.functionSummariesExt
5.218 ( 1%) Lean.Meta.simpExtension
3.786 ( 0%) Lean.declRangeExt
2.901 ( 0%) Lean.Compiler.LCNF.monoExt
2.731 ( 0%) Lean.namespacesExt
649 ( 0%) Mathlib.Prelude.Rename.renameExtension
588 ( 0%) Lean.Compiler.specExtension
394 ( 0%) Lean.Meta.instanceExtension
347 ( 0%) Lean.Compiler.LCNF.specExtension
169 ( 0%) Lean.Compiler.LCNF.UnreachableBranches.functionSummariesExt
169 ( 0%) Lean.docStringExt
67 ( 0%) Std.Tactic.Lint.stdLinterExt
61 ( 0%) Lean.Meta.globalInstanceExtension
50 ( 0%) Lean.Parser.parserExtension
49 ( 0%) Lean.Meta.Match.Extension.extension
41 ( 0%) Lean.protectedExt
29 ( 0%) Lean.reducibilityAttrs
27 ( 0%) Lean.Elab.macroAttribute
24 ( 0%) Lean.Compiler.inlineAttrs
15 ( 0%) Lean.structureExt
13 ( 0%) Lean.projectionFnInfoExt
13 ( 0%) Lean.PrettyPrinter.Delaborator.appUnexpanderAttribute
13 ( 0%) Lean.auxRecExt
11 ( 0%) Lean.moduleDocExt
7 ( 0%) Lean.classExtension
7 ( 0%) Lean.noConfusionExt
7 ( 0%) Lean.Server.Completion.completionBlackListExt
5 ( 0%) Lean.Elab.Term.elabAsElim
$ oleanparser --env-stats build/lib/Mathlib/Tactic/Ring/Basic.olean
objects reachable from
200.570 ( 62%) Lean.IR.declMapExt
117.781 ( 36%) old compiler constants
49.448 ( 15%) Lean.Compiler.LCNF.baseExt
41.720 ( 13%) Lean.Compiler.LCNF.monoExt
7.280 ( 2%) Lean.Elab.Structural.eqnInfoExt
3.238 ( 1%) Lean.declRangeExt
2.793 ( 0%) Lean.IR.UnreachableBranches.functionSummariesExt
1.036 ( 0%) Lean.namespacesExt
940 ( 0%) Lean.Compiler.specExtension
842 ( 0%) Lean.Meta.simpExtension
790 ( 0%) Lean.Compiler.LCNF.UnreachableBranches.functionSummariesExt
693 ( 0%) Lean.Compiler.LCNF.Specialize.specCacheExt
494 ( 0%) Lean.Meta.Match.Extension.extension
329 ( 0%) Lean.Compiler.inlineAttrs
304 ( 0%) Lean.docStringExt
196 ( 0%) Lean.Meta.instanceExtension
189 ( 0%) Lean.reducibilityAttrs
92 ( 0%) Lean.structureExt
77 ( 0%) Lean.projectionFnInfoExt
69 ( 0%) Lean.protectedExt
67 ( 0%) Std.Tactic.Lint.stdLinterExt
51 ( 0%) Lean.Parser.parserExtension
48 ( 0%) Lean.Meta.globalInstanceExtension
45 ( 0%) Lean.auxRecExt
37 ( 0%) Lean.Server.Completion.completionBlackListExt
29 ( 0%) Lean.noConfusionExt
23 ( 0%) Lean.regularInitAttr
13 ( 0%) Lean.Elab.macroAttribute
13 ( 0%) Lean.Elab.Tactic.tacticElabAttribute
11 ( 0%) Lean.moduleDocExt
(note that these categories are not necessarily disjoint from each other or from the rest of the environment because of sharing)
For TensorProduct there is some unfortunate code duplication from specialization in the old compiler, but the new compiler looks much better.
Curiously, there are some smaller files where the analysis takes much longer despite the reachable-objects closures being smaller, so I don't have numbers for all of mathlib4 yet.
Reid Barton (Apr 01 2023 at 21:08):
Do you try somehow (maybe by a collection step before writing out the data) to ensure some degree of locality for the pointers?
Sebastian Ullrich (Apr 01 2023 at 21:15):
Yes of course, there are no gaps between objects except for 8-byte alignment. And everything is maximally shared.
Reid Barton (Apr 01 2023 at 21:26):
I mean, if you have part of the data that is structured like a linked list, then the consecutive elements in the list should appear consecutively in memory, as far as is possible
Sebastian Ullrich (Apr 01 2023 at 21:38):
Ah, the post-order traversal with sharing naturally leads to some locality, but otherwise the system has no idea what kind of data it is writing to disk
Sebastian Ullrich (Apr 01 2023 at 21:40):
In a custom transport format it might even make sense to implement n-ary applications even if it destroys some sharing
Last updated: Dec 20 2023 at 11:08 UTC