Stream: general

Topic: olean cache

Yury G. Kudryashov (May 10 2020 at 17:17):

I have two questions about olean cache:

1. Is there any reason to use tar.gz and not, e.g., tar.xz?
2. Is there a command to remove old files from ~/.mathlib? I know about rm ~/.mathlib/*.tar* but I'd like to preserve, e.g., caches corresponding to live branch heads.

Yury G. Kudryashov (May 15 2020 at 10:13):

Ping here. Can we switch to tar.xz?

Patrick Massot (May 15 2020 at 10:14):

I think this is a question for @Rob Lewis

Patrick Massot (May 15 2020 at 10:14):

Well, I would need to adjust leanproject too, but not much (maybe not at all)

Rob Lewis (May 15 2020 at 10:16):

I think 2 is for @Patrick Massot , this seems like a leanproject command. For 1, I don't know if there was a reason. We could change it but it will break urls for leanproject, so there would need to be a change there. And we'd need to keep checking gz for backward compatibility.

Patrick Massot (May 15 2020 at 10:20):

I checked, the unpacking function is generic enough but of course the url contains .tar.gz

Patrick Massot (May 15 2020 at 10:20):

The lazy transition solution would be to first try tar.xz and fall back to tar.gz

Patrick Massot (May 15 2020 at 10:21):

I don't see any issue it could cause

Rob Lewis (May 15 2020 at 10:21):

@Gabriel Ebner was there a reason to use gz? I don't remember which of us set that up. If I did it, gz was probably just inertia.

Rob Lewis (May 15 2020 at 10:22):

The only transition issue is people with outdated leanproject will start getting errors.

Patrick Massot (May 15 2020 at 10:22):

About deleting old archives, I can implement it if you tell me precisely what you want. Or I can merge a PR, this is probably just as fast to do it rather than telling me

Patrick Massot (May 15 2020 at 10:22):

The only transition issue is people with outdated leanproject will start getting errors.

Patrick Massot (May 15 2020 at 10:23):

Are .xz so much smaller than .gz?

Repacking

Yury G. Kudryashov (May 15 2020 at 10:26):

I wonder why some of today's files are tar.bz2

local cache?

yes

Rob Lewis (May 15 2020 at 10:27):

It saves ~7mb, not insignificant.

Yury G. Kudryashov (May 15 2020 at 10:31):

xz: 19M, bz2: 23M, gz: 26M

Yury G. Kudryashov (May 15 2020 at 10:32):

xz: 2.2s

Yury G. Kudryashov (May 15 2020 at 10:33):

gz: 0.8s

Yury G. Kudryashov (May 15 2020 at 10:33):

bz2: 3.8s

Patrick Massot (May 15 2020 at 10:34):

Ok, this is a huge difference. Probably negligible compared to download time, but maybe not compared to local copy

Yury G. Kudryashov (May 15 2020 at 10:35):

So probably local cache should be stored in tar.gz

Yury G. Kudryashov (May 15 2020 at 10:37):

(yes, I probably should change the ISP)

Rob Lewis (May 15 2020 at 10:40):

That's your normal download speed for a 27mb file? We pay "a lot" for the fancy Azure CDN for fast download speeds, so I just want to make sure it's not a limitation there.

Yury G. Kudryashov (May 15 2020 at 10:40):

Unfortunately, "forced to use home Internet all the time" and "hard to switch the ISP" came together.

Yury G. Kudryashov (May 15 2020 at 10:41):

The bottleneck is definitely on my side.

Patrick Massot (May 15 2020 at 10:41):

Even Zulip must be slow at this rate

Yury G. Kudryashov (May 15 2020 at 10:42):

Even Amazon video / Netflix work at this rate (never tried HD).

Rob Lewis (May 15 2020 at 10:45):

The optimal compression method is gonna depend on both bandwidth and processing time. My guess is I'd see a small gain from xz but nothing that significant. Slow laptop on a university connection will prefer gz.

Rob Lewis (May 15 2020 at 10:45):

But there are clear gains from xz on the Azure side, it basically cuts 25% off the bill.

Sebastien Gouezel (May 15 2020 at 10:45):

Yury G. Kudryashov said:

For me, it's more like 90 seconds, and this is the normal speed due to my internet connection...

Not that saving $3/mo matters so much. Sebastien Gouezel (May 15 2020 at 10:46): And decompressing time will be completely negligible, so I vote strongly for .xz. Rob Lewis (May 15 2020 at 10:47): I had no idea it was still possible to get 300kbps lines. Sebastien Gouezel (May 15 2020 at 10:47): All the houses in the neighborhood have optical fiber, except mine and my direct neighbor's because of bad luck. Johan Commelin (May 15 2020 at 10:47): I lived in Papua New Guinea for half a year... glad I didn't know about leanproject back then (-; We had a satellite connection to Asia, and could receive email / browse a 2 kbps, or something like that. Patrick Massot (May 15 2020 at 10:49): Now I picture Yury living on a small island in the middle of a huge Canadian lake, surrounded by 100km of forest with no other human being (but lots of animals). Patrick Massot (May 15 2020 at 10:50): although Toronto didn't look exactly like that last time I checked Rob Lewis (May 15 2020 at 10:50): And Sebastien the same, except it's a French lake and the animals all have optical fiber. Patrick Massot (May 15 2020 at 10:50): We don't have the right kind of lakes in France Yury G. Kudryashov (May 15 2020 at 10:51): I have a 6Mbps Internet via a phone line. Sebastien Gouezel (May 15 2020 at 10:51): Same here: when I am really in trouble, I use the phone :) Patrick Massot (May 15 2020 at 10:53): Johan Commelin (May 15 2020 at 10:55): Ooh well... it's only 2 months ago that I had 0 kbps for 3 days because they cut an internet cable at a construction site in town. Reid Barton (May 15 2020 at 10:55): I've been on a small island in the middle of a lake in Canada surrounded by forests that looks remarkably like that one. It took me a while to convince myself that that was actually not a photo of the island I've been to. Gabriel Ebner (May 15 2020 at 11:06): Rob Lewis said: Gabriel Ebner was there a reason to use gz? I don't remember which of us set that up. If I did it, gz was probably just inertia. Yes, it was inertia. I don't see any reason why we couldn't switch to xz. We could also deploy both xz and gz for a while while leanproject support is being rolled out. Gabriel Ebner (May 15 2020 at 11:07): Another idea: we could drop the .lean files and just distribute the .olean files. Yury G. Kudryashov (May 15 2020 at 11:12): $ ls -l *.tar.xz
-rw-r--r-- 1 urkud users 17979212 May 15 07:04 without-leans.tar.xz
-rw-r--r-- 1 urkud users 19744024 May 15 07:10 with-leans.tar.xz


Sebastian Ullrich (May 15 2020 at 11:47):

You might also be interested in zstd.

Arch Linux added support for zstd as a package compression method in October 2019 with the release of the pacman 5.2 package manager,[25] and in January 2020 switched from xz to zstd for the packages in the official repository. Arch uses zstd -c -T0 --ultra -20 -, the size of all compressed packages combined increased by 0.8% (compared to xz), the decompression speed is 1300% faster, decompression memory increased by 50 MiB when using multiple threads, compression memory increases but scales with the number of threads used.

Johan Commelin (May 15 2020 at 12:10):

Sounds good!

Last updated: May 11 2021 at 22:14 UTC