Zulip Chat Archive

Stream: general

Topic: Reducing environment size


Joachim Breitner (Oct 06 2023 at 09:51):

It (maybe unreasonably) bothered me a bit that the loogle deployment size contains 2.8 GB of Mathlib olean when it really doesn't need all the information therein. So I experimented with pruning from the environment the data that loogle doesn’t need.

  • Removing the value of all theorems it went down to 1.8GB (I was actually expecting a bigger portion of the olean to be the proof terms. Hmm.)
  • Also removing the value of all defns brought it down to 1.3GB (This is not correct, all reducible definitions need to be kept around. But loogle is a bit buggy with regard to reducible definitions anyways, it should probably look through them consistently.)
    That seems to be the low-hanging fruit here, overall probably not worth the complexity – but maybe someone finds these numbers interesting.

Alex J. Best (Oct 06 2023 at 09:55):

https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/olean.20dump.20tool/near/376045522 is quite relevant prior analysis of this. I think some centralised support for producing these reduced oleans would be useful (even if experimental)

Mario Carneiro (Oct 06 2023 at 10:00):

I would be supportive of adding various kind of "unpack everything except theorem bodies" mechanisms to leantar. The tricky part is avoiding failures related to missing data

MohanadAhmed (Oct 06 2023 at 16:17):

Joachim Breitner said:

  • Removing the value of all theorems it went down to 1.8GB (I was actually expecting a bigger portion of the olean to be the proof terms. Hmm.)

That is a significant reduction.
Can you share a code sample showing how to remove theorem bodies?

Joachim Breitner (Oct 06 2023 at 18:37):

Very hackily: https://github.com/nomeata/loogle/commit/b64813474121e427d2d518bd4d9a0726759d170d#diff-fabb39123a00724e8270ff4d896f44a96a967a115c9d979379bf0e98f2fd21a7R119


Last updated: Dec 20 2023 at 11:08 UTC