Zulip Chat Archive

Stream: general

Topic: olean cache


Yury G. Kudryashov (May 10 2020 at 17:17):

I have two questions about olean cache:

  1. Is there any reason to use tar.gz and not, e.g., tar.xz?
  2. Is there a command to remove old files from ~/.mathlib? I know about rm ~/.mathlib/*.tar* but I'd like to preserve, e.g., caches corresponding to live branch heads.

Yury G. Kudryashov (May 15 2020 at 10:13):

Ping here. Can we switch to tar.xz?

Patrick Massot (May 15 2020 at 10:14):

I think this is a question for @Rob Lewis

Yury G. Kudryashov (May 15 2020 at 10:14):

This would save bandwidth and upload/download times.

Patrick Massot (May 15 2020 at 10:14):

Well, I would need to adjust leanproject too, but not much (maybe not at all)

Rob Lewis (May 15 2020 at 10:16):

I think 2 is for @Patrick Massot , this seems like a leanproject command. For 1, I don't know if there was a reason. We could change it but it will break urls for leanproject, so there would need to be a change there. And we'd need to keep checking gz for backward compatibility.

Patrick Massot (May 15 2020 at 10:20):

I checked, the unpacking function is generic enough but of course the url contains .tar.gz

Patrick Massot (May 15 2020 at 10:20):

The lazy transition solution would be to first try tar.xz and fall back to tar.gz

Patrick Massot (May 15 2020 at 10:21):

I don't see any issue it could cause

Rob Lewis (May 15 2020 at 10:21):

@Gabriel Ebner was there a reason to use gz? I don't remember which of us set that up. If I did it, gz was probably just inertia.

Rob Lewis (May 15 2020 at 10:22):

The only transition issue is people with outdated leanproject will start getting errors.

Patrick Massot (May 15 2020 at 10:22):

About deleting old archives, I can implement it if you tell me precisely what you want. Or I can merge a PR, this is probably just as fast to do it rather than telling me

Patrick Massot (May 15 2020 at 10:22):

The only transition issue is people with outdated leanproject will start getting errors.

True, this is already pretty bad

Patrick Massot (May 15 2020 at 10:23):

Are .xz so much smaller than .gz?

Yury G. Kudryashov (May 15 2020 at 10:24):

Repacking

Patrick Massot (May 15 2020 at 10:25):

Also need to add download time and unpacking time

Yury G. Kudryashov (May 15 2020 at 10:26):

I wonder why some of today's files are tar.bz2

Yury G. Kudryashov (May 15 2020 at 10:26):

local cache?

Patrick Massot (May 15 2020 at 10:27):

yes

Rob Lewis (May 15 2020 at 10:27):

It saves ~7mb, not insignificant.

Yury G. Kudryashov (May 15 2020 at 10:31):

xz: 19M, bz2: 23M, gz: 26M

Patrick Massot (May 15 2020 at 10:32):

What about unpacking times?

Yury G. Kudryashov (May 15 2020 at 10:32):

xz: 2.2s

Yury G. Kudryashov (May 15 2020 at 10:33):

gz: 0.8s

Yury G. Kudryashov (May 15 2020 at 10:33):

bz2: 3.8s

Patrick Massot (May 15 2020 at 10:34):

Ok, this is a huge difference. Probably negligible compared to download time, but maybe not compared to local copy

Yury G. Kudryashov (May 15 2020 at 10:35):

So probably local cache should be stored in tar.gz

Yury G. Kudryashov (May 15 2020 at 10:37):

For me download time is 40-80s depending on I-don't-know-what

Yury G. Kudryashov (May 15 2020 at 10:37):

(yes, I probably should change the ISP)

Rob Lewis (May 15 2020 at 10:40):

That's your normal download speed for a 27mb file? We pay "a lot" for the fancy Azure CDN for fast download speeds, so I just want to make sure it's not a limitation there.

Yury G. Kudryashov (May 15 2020 at 10:40):

Unfortunately, "forced to use home Internet all the time" and "hard to switch the ISP" came together.

Yury G. Kudryashov (May 15 2020 at 10:41):

The bottleneck is definitely on my side.

Patrick Massot (May 15 2020 at 10:41):

Even Zulip must be slow at this rate

Yury G. Kudryashov (May 15 2020 at 10:42):

Even Amazon video / Netflix work at this rate (never tried HD).

Rob Lewis (May 15 2020 at 10:45):

The optimal compression method is gonna depend on both bandwidth and processing time. My guess is I'd see a small gain from xz but nothing that significant. Slow laptop on a university connection will prefer gz.

Rob Lewis (May 15 2020 at 10:45):

But there are clear gains from xz on the Azure side, it basically cuts 25% off the bill.

Sebastien Gouezel (May 15 2020 at 10:45):

Yury G. Kudryashov said:

For me download time is 40-80s depending on I-don't-know-what

For me, it's more like 90 seconds, and this is the normal speed due to my internet connection...

Rob Lewis (May 15 2020 at 10:46):

Not that saving $3/mo matters so much.

Sebastien Gouezel (May 15 2020 at 10:46):

And decompressing time will be completely negligible, so I vote strongly for .xz.

Rob Lewis (May 15 2020 at 10:47):

I had no idea it was still possible to get 300kbps lines.

Sebastien Gouezel (May 15 2020 at 10:47):

All the houses in the neighborhood have optical fiber, except mine and my direct neighbor's because of bad luck.

Johan Commelin (May 15 2020 at 10:47):

I lived in Papua New Guinea for half a year... glad I didn't know about leanproject back then (-;
We had a satellite connection to Asia, and could receive email / browse a 2 kbps, or something like that.

Patrick Massot (May 15 2020 at 10:49):

Now I picture Yury living on a small island in the middle of a huge Canadian lake, surrounded by 100km of forest with no other human being (but lots of animals).

Patrick Massot (May 15 2020 at 10:50):

although Toronto didn't look exactly like that last time I checked

Rob Lewis (May 15 2020 at 10:50):

And Sebastien the same, except it's a French lake and the animals all have optical fiber.

Patrick Massot (May 15 2020 at 10:50):

We don't have the right kind of lakes in France

Yury G. Kudryashov (May 15 2020 at 10:51):

I have a 6Mbps Internet via a phone line.

Sebastien Gouezel (May 15 2020 at 10:51):

Same here: when I am really in trouble, I use the phone :)

Patrick Massot (May 15 2020 at 10:53):

Something like https://www.privateislandsonline.com/uploads/resize/_1548_image_cbb0871e3b.jpg-2148-1600.jpg

Johan Commelin (May 15 2020 at 10:55):

Ooh well... it's only 2 months ago that I had 0 kbps for 3 days because they cut an internet cable at a construction site in town.

Reid Barton (May 15 2020 at 10:55):

I've been on a small island in the middle of a lake in Canada surrounded by forests that looks remarkably like that one. It took me a while to convince myself that that was actually not a photo of the island I've been to.

Gabriel Ebner (May 15 2020 at 11:06):

Rob Lewis said:

Gabriel Ebner was there a reason to use gz? I don't remember which of us set that up. If I did it, gz was probably just inertia.

Yes, it was inertia. I don't see any reason why we couldn't switch to xz. We could also deploy both xz and gz for a while while leanproject support is being rolled out.

Gabriel Ebner (May 15 2020 at 11:07):

Another idea: we could drop the .lean files and just distribute the .olean files.

Yury G. Kudryashov (May 15 2020 at 11:12):

$ ls -l *.tar.xz
-rw-r--r-- 1 urkud users 17979212 May 15 07:04 without-leans.tar.xz
-rw-r--r-- 1 urkud users 19744024 May 15 07:10 with-leans.tar.xz

Sebastian Ullrich (May 15 2020 at 11:47):

You might also be interested in zstd.

Arch Linux added support for zstd as a package compression method in October 2019 with the release of the pacman 5.2 package manager,[25] and in January 2020 switched from xz to zstd for the packages in the official repository. Arch uses zstd -c -T0 --ultra -20 -, the size of all compressed packages combined increased by 0.8% (compared to xz), the decompression speed is 1300% faster, decompression memory increased by 50 MiB when using multiple threads, compression memory increases but scales with the number of threads used.

Johan Commelin (May 15 2020 at 12:10):

Sounds good!


Last updated: Dec 20 2023 at 11:08 UTC