Zulip Chat Archive

Stream: lean4

Topic: curl version on Ubuntu 20.04


Tomas Skrivan (Mar 23 2023 at 16:11):

Today I wanted to get a cached build of mathlib4 and got stuck on the curl version. On Ubuntu 20.04, the version of curl is 7.68, but lake exe cache needs 7.69 and above.

Googling for "update curl ubuntu" gives you this guide, which downloads the sources, compiles them, and installs them. Other guides on how to update curl compile the code too! This is not beginner-friendly at all.

I talked briefly to Mario about this. Apparently this is because the new version of curl supports parallelization. I would suggest falling back to the old version of curl. Non-parallelized curl would still be faster than compiling mathlib.

I think Ubuntu 20.04 is relatively new and popular, so fixing this would be nice.
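A minimal Lean sketch of how a tool could check the installed curl version before relying on newer features (hypothetical names, not the actual cache code):

```lean
/-- Parse the version from `curl --version` output, whose first line looks
like `curl 7.68.0 (x86_64-pc-linux-gnu) ...` (hypothetical helper). -/
def curlVersion : IO (Nat × Nat) := do
  let out ← IO.Process.output { cmd := "curl", args := #["--version"] }
  let parts := out.stdout.splitOn " "
  let some ver := parts.get? 1
    | throw <| IO.userError "unexpected `curl --version` output"
  match ver.splitOn "." with
  | major :: minor :: _ => pure (major.toNat!, minor.toNat!)
  | _ => throw <| IO.userError "unexpected curl version string"

/-- The thread says `lake exe cache` wants curl ≥ 7.69. -/
def curlIsRecentEnough : IO Bool := do
  let (major, minor) ← curlVersion
  pure (major > 7 || (major == 7 && minor ≥ 69))
```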

Sebastian Ullrich (Mar 23 2023 at 16:15):

AFAIR it is extremely slow without parallelization, to the point where I wondered whether building from scratch wouldn't be faster

Yaël Dillies (Mar 23 2023 at 16:15):

Relevant discussion: https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/.60lake.20exe.20cache.60.20curl.20version

Shreyas Srinivas (Mar 23 2023 at 19:07):

Sebastian Ullrich said:

AFAIR it is extremely slow without parallelization, to the point where I wondered whether building from scratch wouldn't be faster

Is it possible to parallelize inside Lean rather than letting curl do its parallelization? Maybe this would help overcome or work around the issues with curl's parallelization in version 7.68 and below?

Arthur Paulino (Mar 23 2023 at 19:13):

Only experimentation can tell... I wouldn't know beforehand

Eric Wieser (Mar 23 2023 at 19:27):

I would imagine that curl will do a much better job of parallelism than what we can do in Lean

Mario Carneiro (Mar 23 2023 at 19:27):

yes, but only if the flag actually exists!

Mario Carneiro (Mar 23 2023 at 19:28):

the issue here is that many people don't have a curl that supports the option we want to use

Mario Carneiro (Mar 23 2023 at 19:29):

If we have a reliable way to get new curl installed on machines that's fine, but we clearly don't want end users to have to install curl from source

Eric Wieser (Mar 23 2023 at 19:33):

Kevin Buzzard said:

https://www.linuxcapable.com/how-to-install-upgrade-curl-on-ubuntu-20-04-lts/ fixed the problem for me, although I'm now running curl as modified by some random person on the internet

was one option

Kevin Buzzard (Mar 23 2023 at 21:34):

Clicking on the "upgrade to 22.04?" button is another one

Arthur Paulino (Mar 23 2023 at 21:39):

University computers are a concern though, since students don't have superuser privileges

Shreyas Srinivas (Mar 23 2023 at 22:02):

Not to mention, upgrading a distro can break a lot of other things

Shreyas Srinivas (Mar 23 2023 at 22:03):

Also, it doesn't help all the Ubuntu derived distributions

Sebastian Ullrich (Mar 24 2023 at 07:42):

I think we could do worse than providing a statically linked recent version of curl per platform and downloading that (for which the older curl is sufficient).

$ nix build github:NixOS/nixpkgs/nixos-unstable#pkgsStatic.curl
$ file ./result-bin/bin/curl
./result-bin/bin/curl: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=ee11a4cbde03a128ffa2c03b810d7c9d287f0506, not stripped
$ ./result-bin/bin/curl --version
curl 7.88.1 (x86_64-unknown-linux-musl) libcurl/7.88.1 OpenSSL/3.0.8 zlib/1.2.13 zstd/1.5.4 libidn2/2.3.2 libssh2/1.10.0 nghttp2/1.51.0
Release-Date: 2023-02-20
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB SSL threadsafe TLS-SRP UnixSockets zstd

Sebastian Ullrich (Mar 25 2023 at 12:07):

If your Linux curl is too old, please give https://github.com/leanprover-community/mathlib4/pull/3097 a try

Scott Morrison (Mar 28 2023 at 10:19):

Arthur Paulino said:

Only experimentation can tell... I wouldn't know beforehand

Using IO.asTask to parallelize in Lean, rather than in curl, takes:

% time lake exe cache get!
Attempting to download 1876 file(s)
Decompressing 1875 file(s)
lake exe cache get!  57.12s user 97.32s system 85% cpu 3:00.54 total

which for me is barely slower than the current implementation using --parallel:

% time lake exe cache get!
Attempting to download 1876 file(s)
Decompressing 1876 file(s)
lake exe cache get!  28.48s user 48.23s system 47% cpu 2:41.21 total
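For reference, the IO.asTask approach can be sketched roughly like this (hypothetical downloadFile/downloadAll names, not the actual PR code):

```lean
/-- Download one file by shelling out to curl (illustrative flags only). -/
def downloadFile (url outFile : String) : IO Unit := do
  let out ← IO.Process.output {
    cmd := "curl", args := #["--silent", "--fail", "-o", outFile, url] }
  unless out.exitCode == 0 do
    throw <| IO.userError s!"download of {url} failed"

/-- Spawn one task per (url, file) pair, then wait for all of them,
surfacing the first error encountered. -/
def downloadAll (jobs : List (String × String)) : IO Unit := do
  let tasks ← jobs.mapM fun (url, file) =>
    IO.asTask (downloadFile url file)
  for t in tasks do
    if let .error e := t.get then throw e
```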

Scott Morrison (Mar 28 2023 at 10:26):

I can make a PR with my code as a fallback if we haven't obtained a suitable curl, but I'd want someone with more curl-fu to check my flags.

Sebastian Ullrich (Mar 28 2023 at 10:29):

I think we've effectively solved the problem, at least on Linux. I'm surprised, though, that it's so slow for you; is it simply your downstream bandwidth?

Scott Morrison (Mar 28 2023 at 10:38):

I had to upgrade curl on my macos system to satisfy cache, so I think more general solutions are still worthwhile.

Scott Morrison (Mar 28 2023 at 10:38):

I'm not sure why it is so slow, but it is certainly painful. :-) Not sure whether to blame azure in australia, my ISP, or something else.

Scott Morrison (Mar 28 2023 at 10:47):

!4#3146, no problem if it is discarded.

Arthur Paulino (Mar 28 2023 at 12:42):

What about reliability? Users might end up in cycles of cache get! if downloads fail too often

Scott Morrison (Mar 28 2023 at 20:43):

So far in testing I don't seem to have had any failures. What could be done to improve reliability? Why would we expect it to be worse with Task parallelization rather than curl parallelization?

Arthur Paulino (Mar 28 2023 at 20:46):

Unfortunately I don't know curl well enough to give satisfying answers. I can only raise concerns

Henrik Böving (Mar 28 2023 at 20:52):

Scott Morrison said:

So far in testing I don't seem to have had any failures. What could be done to improve reliability? Why would we expect it to be worse with Task parallelization rather than curl parallelization?

So, just from a protocol POV, HTTP/2 is actually fully capable of having multiple streams encoded in the same TCP connection, which I would expect curl exploits for task parallelization. If you instead have multiple curl tasks, they have to open separate TCP connections for separate HTTP/2 connections, and each of those connections needs to carry its metadata with it. Now, the metadata should (hopefully) not be too significant, but I would expect that there is a measurable difference between multiple curls and one curl doing the multitasking on its own, assuming an HTTP/2 connection.

Arthur Paulino (Mar 28 2023 at 21:23):

Right, the difference in time was noticeable. But we're wondering if parallel tasks would be any riskier in terms of failing to download (aside from the fact that a longer download is intrinsically riskier in terms of failure)

Henrik Böving (Mar 28 2023 at 21:27):

It's all TCP-based anyway, so there is no reason to fear failures from a protocol POV. The only reason I could see for your downloads to fail would be Azure deciding to rate-limit you because having so many connections open might look like a DoS attack, but I would hope that rate limiting on Azure blobs is configurable to a point where this is not the case

Eric Wieser (Mar 29 2023 at 02:08):

This presumes that old curl doesn't have other bugs that cause a download to fail; given the problem in the first place was that --parallel had a bug, this may or may not be a safe assumption.

Gabriel Ebner (Mar 29 2023 at 02:28):

I think Sebastian noticed that the Azure blob storage doesn't do HTTP/2. I think he got a nice speedup when trying out HTTP/2 with the CDN.

Gabriel Ebner (Mar 29 2023 at 02:29):

Here's the URL in case somebody wants to make a PR: https://lakecache-gjhdfybgcpf9dpfr.z01.azurefd.net/mathlib4/f/18296616905743833141.tar.gz

Scott Morrison (Mar 29 2023 at 03:41):

I tried the CDN earlier today and didn't see any speed-up. :-(

Scott Morrison (Mar 29 2023 at 03:42):

In fact, it was slightly slower. (For me a cache get! takes about three minutes.)

Gabriel Ebner (Mar 29 2023 at 03:45):

I should've put the cache in Australia instead of Ohio. :rofl:

Gabriel Ebner (Mar 29 2023 at 03:46):

Was it just slower the first time though? Did it get faster if you tried cache get! a second time?

Gabriel Ebner (Mar 29 2023 at 03:46):

That was my impression with the mathlib3 cache at least.

Sebastian Ullrich (Mar 29 2023 at 08:10):

Henrik Böving said:

It's all TCP-based anyway

Up until Azure CDN supports HTTP/3 :) ... but yes, I was surprised to find that the protocol overhead is apparently negligible (when querying actually existing files) (and --parallel still seems to help even when using HTTP/2). It's all about the raw speed of the connection on both ends, and everywhere in between

Scott Morrison (Mar 29 2023 at 09:58):

Unfortunately the CDN is still just as slow as the original, for me, even after repeat invocations.

Scott Morrison (Mar 29 2023 at 09:59):

How long is lake exe cache get! taking for others?

Mario Carneiro (Mar 29 2023 at 10:21):

$ time lake exe cache get!
Attempting to download 1895 file(s)
Decompressing 1895 file(s)

________________________________________________________
Executed in  178.09 secs    fish           external
   usr time   20.99 secs    0.00 micros   20.99 secs
   sys time   19.61 secs  374.00 micros   19.60 secs

Mario Carneiro (Mar 29 2023 at 10:22):

that is for downloading today's mathlib master

Mario Carneiro (Mar 29 2023 at 10:26):

run no. 2 is indeed a bit faster:

Executed in  133.16 secs    fish           external
   usr time   20.35 secs  609.00 micros   20.35 secs
   sys time   17.46 secs    0.00 micros   17.46 secs

Mauricio Collares (Mar 29 2023 at 12:38):

Would it be crazy to use a protocol designed for synchronization (e.g., rsync or whatever azcopy sync uses)?

Henrik Böving (Mar 29 2023 at 13:22):

Does Windows have rsync-type stuff nowadays?

Gabriel Ebner (Mar 29 2023 at 17:17):

Oh, wow. I didn't realize it was that slow for other people. (It would be great though if we had numbers for just the downloading part instead of downloading+unpacking combined.)

$ time lake exe cache get!
Attempting to download 1898 file(s)
Decompressing 1898 file(s)

________________________________________________________
Executed in   13.72 secs    fish           external
   usr time   10.96 secs    0.00 micros   10.96 secs
   sys time   24.94 secs  351.00 micros   24.94 secs

Gabriel Ebner (Mar 29 2023 at 17:19):

Would it be crazy to use a protocol designed for synchronization (e.g., rsync or whatever azcopy sync uses)?

I don't think rsync works that well if you have a huge list of small files you want to download (instead of a whole directory). And I believe azcopy sync uses the same HTTP endpoints that we access via curl.

Gabriel Ebner (Mar 29 2023 at 17:25):

I'd also caution you not to expect any miracles. All olean caches for mathlib combined are 445 megabytes right now (which is what you test with lake exe cache get!). Divide this by your connection speed, and you get a lower bound on how quickly you can get them.
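Spelling out that lower bound, assuming for illustration a 100 Mbit/s connection:

```lean
-- 445 MB of caches ≈ 445 * 8 = 3560 Mbit; at 100 Mbit/s that is
-- roughly 36 seconds as an absolute floor, before decompression.
#eval (445 * 8) / 100  -- 35
```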

Gabriel Ebner (Mar 29 2023 at 17:40):

Doing this back-of-the-envelope calculation, all the results posted here give connection speeds between 30 and 500 Mbit/s. Which is perfectly reasonable, and I don't think we can improve it much using only protocol changes. The unfortunate truth is that Lean 4 oleans are an order of magnitude more voluminous than their Lean 3 counterparts.

Ruben Van de Velde (Mar 29 2023 at 17:44):

I've heard this a few times (oleans getting much bigger) - is that considered an issue that might see improvements in the future or should we assume it's going to remain the case?

Ruben Van de Velde (Mar 29 2023 at 17:44):

Also, are they amenable to compression, either individually or as a group?

Floris van Doorn (Mar 29 2023 at 17:45):

Gabriel Ebner said:

Doing this back-of-the-envelope calculation, all the results posted here give connection speeds between 30 and 500 mbit/s. Which is perfectly reasonable, and I don't think we can improve it much using only protocol changes. The unfortunate truth is that Lean 4 oleans are an order of magnitude more voluminous than their Lean 3 counterparts.

This is a really big problem when teaching, giving tutorials, or any other time that you ask new users to install Lean 4 and mathlib4.

Gabriel Ebner (Mar 29 2023 at 17:48):

Also, are they amenable to compression, either individually or as a group?

We're already compressing them; they'd be 1.4 gigabytes otherwise.

Eric Wieser (Mar 29 2023 at 18:00):

This is a really big problem when teaching, giving tutorials, or any other time that you ask new users to install Lean 4 and mathlib4.

I can imagine this is especially bad if a room full of 25 people, with mediocre wifi, all need to download 500 MB each at the start of a tutorial

Eric Wieser (Mar 29 2023 at 18:01):

Perhaps this is an argument for leaning more heavily on VS Code remote development for teaching, be it via Codespaces, Gitpod, or remote access to university servers

Scott Morrison (Mar 29 2023 at 21:44):

Maybe I'll have to write a script that pre-emptively checks out modified branches from GitHub and pulls the oleans, and runs while I'm getting my breakfast. :-)

Scott Morrison (Mar 29 2023 at 21:44):

(I realise this is probably a bad idea, misusing our azure resources.)

Gabriel Ebner (Mar 29 2023 at 22:28):

You have my official permission to do that. :smile: Just don't do get!. Your productivity is worth more than our Azure bills. Besides, you're not downloading anything twice. The branches probably share a lot of common oleans.

Sebastian Ullrich (Mar 30 2023 at 07:32):

How much of a concern is storing and re-downloading identical but differently named olean tars at this point? I'm assuming not all that much, because people mostly work on mathlib4 leaves, invalidating few existing modules, and we do not update Lean or other dependencies that often either. But as soon as either of these changes, I believe we should take a serious look at moving towards a content-addressed storage model to potentially save massive amounts of bandwidth. This could be as simple as replacing the current <input-hash>.tar.gz with a symlink-like text file <input-hash> that contains <output-hash>.tar.gz.
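The symlink-like indirection could be sketched as follows (hypothetical fetch helper, purely illustrative):

```lean
/-- Resolve a cache artifact through a content-addressed pointer file:
the tiny file named `<input-hash>` contains the name of the actual
tarball `<output-hash>.tar.gz`, which is shared by all builds that
produced byte-identical oleans. `fetch` is a hypothetical downloader. -/
def resolveArtifact (fetch : String → IO String) (inputHash : String) :
    IO String := do
  let pointer ← fetch inputHash   -- contents: "<output-hash>.tar.gz"
  fetch pointer.trim              -- the content-addressed tarball itself
```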

Sebastian Ullrich (Mar 30 2023 at 07:51):

Just to state the maybe-obvious, no work has been done yet on optimizing the size of Lean 4 .oleans, nor on analyzing the size difference to Lean 3 .oleans. Similarly, we might not yet be doing a good job of avoiding perturbing a .olean's contents on upstream changes, which would affect the efficiency of such a content-addressed model.

Reid Barton (Mar 30 2023 at 07:56):

The Lean 4 oleans are basically mmapped to build the in-memory environment, right?
So if we want to keep doing that, there would be some constraints on the format/size?

Sebastian Ullrich (Mar 30 2023 at 08:04):

Yes, the overhead of that is one of the unclear parts. But in theory, the cache could use a completely different representation that we transform from after unpacking, so it would only be disk space and unpacking overhead.

Sebastian Ullrich (Apr 01 2023 at 16:59):

Sebastian Ullrich said:

Just to state the maybe-obvious, no work has been done on optimizing the size of Lean 4 .oleans yet, or analyses on the size

Now we know a little bit more

array        526.661 (   1%)      42.704.504B (   3%)
string     1.148.716 (   3%)      53.699.609B (   4%)
bignum             1 (   0%)              40B (   0%)
ctor      35.000.295 (  95%)    1.191.012.032B (  92%)
    tag 0      3.962.740 (  10%)     128.823.376B (  10%)
    tag 1      2.658.974 (   7%)      76.023.200B (   5%)
    tag 2        943.200 (   2%)      28.268.920B (   2%)
    tag 3        237.476 (   0%)       7.211.992B (   0%)
    tag 4      1.019.304 (   2%)      31.936.928B (   2%)
    tag 5     20.710.716 (  56%)     662.427.408B (  51%)
    tag 6      2.717.678 (   7%)     125.459.136B (   9%)
    tag 7      1.964.034 (   5%)      91.803.944B (   7%)
    tag 8        604.649 (   1%)      33.537.784B (   2%)
    tag 9         17.037 (   0%)         408.888B (   0%)
    tag 10        81.597 (   0%)       2.823.552B (   0%)
    tag 11        77.122 (   0%)       2.182.864B (   0%)
    tag 12         5.768 (   0%)         104.040B (   0%)
total     36.675.673 ( 100%)    1.287.416.185B ( 100%)

shared pointers   39.796.971 (  52%)
lake exe oleanparser ~/lean/mathlib4/build/**/*.olean --stats  46.29s user 0.62s system 99% cpu 47.127 total

No low-hanging fruit like "we included 10MB of strings or empty arrays", unfortunately. For constructors, which account for the vast majority of objects and bytes, there is no direct type information available, but given that Expr.app has constructor index 5, it's fair to guess that these are mostly Expr objects, which of course is information we usually cannot avoid storing in some form.

Sebastian Ullrich (Apr 01 2023 at 17:03):

Not that I'm saying that the .olean format is particularly efficient for storing Expr.apps: we have 8 bytes of object header, followed by two 8-byte pointers, followed by another 8 bytes of Expr.Data cache. In principle, this information could be packed much more tightly for transport (which would not even have to be part of core).
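A quick sanity check against the stats above, under the assumption that tag 5 is indeed Expr.app:

```lean
-- tag-5 objects average 662.427.408 B / 20.710.716 objects ≈ 32 bytes,
-- consistent with 8 (header) + 8 + 8 (pointers) + 8 (`Expr.Data`) bytes.
#eval 662427408 / 20710716  -- 31 (i.e. just under 32, by integer division)
```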

Reid Barton (Apr 01 2023 at 17:04):

Compression achieves some of that space savings automatically, right?

Reid Barton (Apr 01 2023 at 17:05):

I assume there should be quite a lot of sharing among Exprs?

Sebastian Ullrich (Apr 01 2023 at 17:08):

We know compression helps quite a bit, but I couldn't say how it would affect such manual optimizations. The redundant headers are probably great for compression, but the cached metadata is more variable.

Sebastian Ullrich (Apr 01 2023 at 17:08):

There are about twice as many pointers as there are objects because of sharing, yes, that's the 52%

Sebastian Ullrich (Apr 01 2023 at 20:52):

It is unclear how to make a more detailed analysis; here is a breakdown of the environment extensions and the constants used by the old compiler for the largest two mathlib4 files

$ oleanparser --env-stats build/lib/Mathlib/LinearAlgebra/TensorProduct.olean
objects reachable from
     282.073 (  72%) old compiler constants
      36.710 (   9%) Lean.IR.declMapExt
       9.863 (   2%) Lean.Compiler.LCNF.baseExt
       6.484 (   1%) Lean.IR.UnreachableBranches.functionSummariesExt
       5.218 (   1%) Lean.Meta.simpExtension
       3.786 (   0%) Lean.declRangeExt
       2.901 (   0%) Lean.Compiler.LCNF.monoExt
       2.731 (   0%) Lean.namespacesExt
         649 (   0%) Mathlib.Prelude.Rename.renameExtension
         588 (   0%) Lean.Compiler.specExtension
         394 (   0%) Lean.Meta.instanceExtension
         347 (   0%) Lean.Compiler.LCNF.specExtension
         169 (   0%) Lean.Compiler.LCNF.UnreachableBranches.functionSummariesExt
         169 (   0%) Lean.docStringExt
          67 (   0%) Std.Tactic.Lint.stdLinterExt
          61 (   0%) Lean.Meta.globalInstanceExtension
          50 (   0%) Lean.Parser.parserExtension
          49 (   0%) Lean.Meta.Match.Extension.extension
          41 (   0%) Lean.protectedExt
          29 (   0%) Lean.reducibilityAttrs
          27 (   0%) Lean.Elab.macroAttribute
          24 (   0%) Lean.Compiler.inlineAttrs
          15 (   0%) Lean.structureExt
          13 (   0%) Lean.projectionFnInfoExt
          13 (   0%) Lean.PrettyPrinter.Delaborator.appUnexpanderAttribute
          13 (   0%) Lean.auxRecExt
          11 (   0%) Lean.moduleDocExt
           7 (   0%) Lean.classExtension
           7 (   0%) Lean.noConfusionExt
           7 (   0%) Lean.Server.Completion.completionBlackListExt
           5 (   0%) Lean.Elab.Term.elabAsElim
$ oleanparser --env-stats build/lib/Mathlib/Tactic/Ring/Basic.olean
objects reachable from
     200.570 (  62%) Lean.IR.declMapExt
     117.781 (  36%) old compiler constants
      49.448 (  15%) Lean.Compiler.LCNF.baseExt
      41.720 (  13%) Lean.Compiler.LCNF.monoExt
       7.280 (   2%) Lean.Elab.Structural.eqnInfoExt
       3.238 (   1%) Lean.declRangeExt
       2.793 (   0%) Lean.IR.UnreachableBranches.functionSummariesExt
       1.036 (   0%) Lean.namespacesExt
         940 (   0%) Lean.Compiler.specExtension
         842 (   0%) Lean.Meta.simpExtension
         790 (   0%) Lean.Compiler.LCNF.UnreachableBranches.functionSummariesExt
         693 (   0%) Lean.Compiler.LCNF.Specialize.specCacheExt
         494 (   0%) Lean.Meta.Match.Extension.extension
         329 (   0%) Lean.Compiler.inlineAttrs
         304 (   0%) Lean.docStringExt
         196 (   0%) Lean.Meta.instanceExtension
         189 (   0%) Lean.reducibilityAttrs
          92 (   0%) Lean.structureExt
          77 (   0%) Lean.projectionFnInfoExt
          69 (   0%) Lean.protectedExt
          67 (   0%) Std.Tactic.Lint.stdLinterExt
          51 (   0%) Lean.Parser.parserExtension
          48 (   0%) Lean.Meta.globalInstanceExtension
          45 (   0%) Lean.auxRecExt
          37 (   0%) Lean.Server.Completion.completionBlackListExt
          29 (   0%) Lean.noConfusionExt
          23 (   0%) Lean.regularInitAttr
          13 (   0%) Lean.Elab.macroAttribute
          13 (   0%) Lean.Elab.Tactic.tacticElabAttribute
          11 (   0%) Lean.moduleDocExt

(note that these categories are not necessarily disjoint with each other and the rest of the environment because of sharing)
For TensorProduct there is some unfortunate code duplication from specialization in the old compiler, but the new compiler looks much better.
Curiously, there are some smaller files where the analysis takes much longer despite the reachable object closures being smaller, so I don't have numbers for all of mathlib4 yet.

Reid Barton (Apr 01 2023 at 21:08):

Do you try somehow (maybe by a collection step before writing out the data) to ensure some degree of locality for the pointers?

Sebastian Ullrich (Apr 01 2023 at 21:15):

Yes of course, there are no gaps between objects except for 8-byte alignment. And everything is maximally shared.

Reid Barton (Apr 01 2023 at 21:26):

I mean, if you have part of the data that is structured like a linked list, then the consecutive elements in the list should appear consecutively in memory, as far as is possible

Sebastian Ullrich (Apr 01 2023 at 21:38):

Ah, the post-order traversal with sharing naturally leads to some locality, but otherwise the system has no idea about what kind of data it is writing to disk

Sebastian Ullrich (Apr 01 2023 at 21:40):

In a custom transport format it might even make sense to implement n-ary applications even if it destroys some sharing


Last updated: Dec 20 2023 at 11:08 UTC