Zulip Chat Archive

Stream: general

Topic: what’s in an olean file?


Joachim Breitner (Feb 10 2022 at 17:21):

What’s in an olean file? In particular, does it contain proof objects for Props, or merely everything that’s needed by downstream module (i.e. an “interface file”)?
(I’m asking because I like leanproject get-cache when juggling PRs, but downloading 50MB everytime can be annoying or even prohibitively when working while traveling, so I’m curious if there is some redundancy here.)

Eric Wieser (Feb 10 2022 at 17:23):

There's certainly redundancy, because the cache you download typically contains the same logic/basic.olean file every time

Joachim Breitner (Feb 10 2022 at 17:24):

That’s yet another axis of redundancy, indeed. (And one that’s very temping for nerdy plans. rsync. content-addressable stores. etc :-))

Mario Carneiro (Feb 10 2022 at 17:28):

Joachim Breitner said:

What’s in an olean file? In particular, does it contain proof objects for Props, or merely everything that’s needed by downstream module (i.e. an “interface file”)?

Sadly, the answer to both of these is "yes"

Mario Carneiro (Feb 10 2022 at 17:29):

since tactics and #print can see the content of an upstream definition, changes to the proof are visible downstream, so the "interface" can't exclude them

Mario Carneiro (Feb 10 2022 at 17:30):

#reduce and dec_trivial can also show different behavior depending on how a proposition is proved

Henrik Böving (Feb 10 2022 at 17:30):

The Lean 4 ones basically come down to a dump of an entire Lean Environment aka everything you have access to while doing meta programming, I'd suspect its similar in 3?

Mario Carneiro (Feb 10 2022 at 17:31):

yes

Joachim Breitner (Feb 10 2022 at 17:36):

Ah, so it even #print looks through lemmas in Prop? Hmm, that makes such a separation into “interface” and “full proof objects” harder. Too bad, I guess I got overly excited switching to a theorem prover with proof irrelevance :-)

Gabriel Ebner (Feb 10 2022 at 17:39):

The olean file also includes line numbers and docstrings. There's a lot more in there than just the interface.

Joachim Breitner (Feb 10 2022 at 17:39):

That I’d understand as the interface, as I need it to jump to a definition.

Sebastian Ullrich (Feb 10 2022 at 17:39):

There's a lot more that should not be in there :) . Like bytecode/IR.

Joachim Breitner (Feb 10 2022 at 17:40):

Maybe interface is a unsuitable word. “Anything of interest when importing the module”.

Sebastian Ullrich (Feb 10 2022 at 17:40):

Line numbers are metadata that should be loaded by the server, but not lead to recompilation. Lean 4 is going in the right direction by moving some of this information to a separate .ilean file.

Joachim Breitner (Feb 10 2022 at 17:45):

GHC has a notion of “I need to look at this file, because some dependency’s .hi file changed, but actually I notice that none of the things affecting this module has actually changed, so I don't need to further process this file”. So it’s more fine-granular than just interface-changed-or-did-not-change, and you don’t need to produce separate files for each possible consumer (with their own “projection of interest” of the information), but I don't now how fine it actually gets.

Sebastian Ullrich (Feb 10 2022 at 17:54):

Ah, that's interesting. And incompatible with a Nix-based build system (unless you do materialize the projection into its own file), ha ha...

Reid Barton (Feb 10 2022 at 18:25):

IIRC the level of granularity depends on the "subsystem" in question. For example I think ordinary definitions are hashed individually, but there's only one hash for all the type family instances--that's because GHC checks that type family instances are consistent and adding any instance could break this property. (I guess maybe it can be done on a per-family basis.)
In Lean, I guess the fact that tactics have free reign of the environment means that there are not many useful "subsystems" to hash separately, unless you want to speculate that there will be client modules that use only specific known tactics (e.g., hash the simp set in case a client module only uses simp).

Joachim Breitner (Feb 10 2022 at 18:26):

Incompatible is maybe a tad too strong a word. It's an optimization that wouldn't apply in nix-based system.

Joachim Breitner (Feb 10 2022 at 18:28):

Ah, maybe we'd even have to worry about tactics inspecting the docstring or the line numbers :-)

Reid Barton (Feb 10 2022 at 18:32):

Right and there are tactics (or tactic-like things) that really do inspect docstrings (to_additive).
But in general, you could try to guess some partial "view" of the information in an olean file, that excludes stuff like position information and non-simp lemmas. Then when compiling a client module, track whether or not the process only involved information available through that "view", and if so record this fact--then you don't need to recompile the client if the upstream module changes but the view doesn't change.
It sounds like a tricky thing to get right.

Joachim Breitner (Feb 10 2022 at 18:51):

Yes, it would be better if information could be left out completely (looking at you, irrelevant proof object) :-)

Joachim Breitner (Feb 10 2022 at 18:52):

And bad recompilation avoidance is far worse than less recompilation avoidance.

Henrik Böving (Feb 10 2022 at 18:53):

But the irrelevant proof object wants to be looked at by stuff such as documentation generators :p

Joachim Breitner (Feb 10 2022 at 18:54):

Really? The docs on https://leanprover-community.github.io/mathlib_docs/group_theory/is_free_group.html only show equations for defs, not for lemmas.

Henrik Böving (Feb 10 2022 at 18:56):

Yeah that's a WIP thing for Lean 4, https://github.com/insightmind/LeanInk, we're planning to eventually integrate it with doc-gen4

Henrik Böving (Feb 10 2022 at 18:57):

I don't actually know the extent to which it uses olean over the raw source files though so maybe we can get rid of it at least in the olean.

Sebastian Ullrich (Feb 10 2022 at 19:06):

It doesn't use oleans directly

Sebastian Ullrich (Feb 10 2022 at 19:07):

Joachim Breitner said:

Incompatible is maybe a tad too strong a word. It's an optimization that wouldn't apply in nix-based system.

Incompatible in the sense that it would make me rather use a different build system that does support it :)

Sebastian Ullrich (Feb 11 2022 at 11:53):

@Joachim Breitner FWIW, I had this SE question in mind even before you started this thread, so I finally wrote it down: https://proofassistants.stackexchange.com/questions/335/what-is-the-state-of-recompilation-avoidance-in-proof-assistants. Let's see if it actually works as an SE question...


Last updated: Dec 20 2023 at 11:08 UTC