Zulip Chat Archive

Stream: lean4

Topic: Dynamically importing olean files


Jure Taslak (Slovenia) (Apr 05 2023 at 14:54):

Hi, I have the following situation, I have a bunch of olean files that I pre-computed. Each of the oleans contains a structure (call it A) which has a bunch of fields, some of the fields contain data which describes the structure and some are boolean and numerical (Nat, Int), which describe the properties of the data and some are proofs that those fields indeed describe the correct things.

Now I also have oleans, which have structures that are the "skeleton" versions of the above structure (call it B) in that they only contain the boolean and numerical fields, but not the data or the proofs.

I want to do the following, in a proof I want to search for an A which satisfies a given boolean predicate p and add it to my hypothesis, or give an error if the search cannot find such an A. A straightforward approach would be to import all the oleans with A's and put them all in a list and search the list. The problem is, there are many A's and all-together the oleans are too big to import all of them. So my alternative would be to import all the B's since they are much smaller as they don't contain all the data, run the search on a list of B's and, if the search finds a suitable B, dynamically import the olean with the A whose skeleton this B is and then add the A to the list of hypothesis.

My question is, can this be done and how would one go about doing this?

I found the function importModules, but I'm not quite sure how to use it. My attempt was something like this

open Lean Lean.Elab.Tactic Expr Meta in
elab "import_graph_for_invariant" : tactic =>
  withMainContext do
    let sysRoot  findSysroot
    initSearchPath my_path
    let env  importModules [{ module := `Data.A1 : Import }] {} 0

But the env that the function returns is not an extension of the environment given by IO.getEnv as it only contains the import for Data.A1. Also I'm not quite sure how to use env in this situation. And I got a bit stuck here.

Scott Morrison (Apr 06 2023 at 03:12):

@Jure Taslak (Slovenia), did you see Mario's code for pickling and unpickling recently? I'm not sure exactly what you want, but it may be helpful.

Jure Taslak (Slovenia) (Apr 06 2023 at 07:45):

@Scott Morrison That is very helpful, thank you. I can't however see how to use this in a theorem to unpickle my data structure there, since I can't have unsafe theorems.

Scott Morrison (Apr 06 2023 at 07:47):

I don't think I understand what you're doing, but the unsafe modifier here is in no way increasing your difficulties relative to other ways to get things off the disk. You might enjoy Std.Term.Unsafe.

Scott Morrison (Apr 06 2023 at 07:48):

/--
Load an object from disk. Better to use `withUnpickle`, which frees memory.
-/
def unpickle (α : Type) [Nonempty α] (path : FilePath) : IO (α × CompactedRegion) := unsafe do
  let (x, region)  readModuleData path
  pure (unsafeCast x, region)

/-- Load an object from disk and run some continuation on it, freeing memory afterwards. -/
def withUnpickle [Monad m] [MonadLiftT IO m] {α β : Type} [Nonempty α]
    (path : FilePath) (f : α  m β) : m β := do
  let (x, region)  unpickle α path
  let r  f x
  unsafe region.free
  pure r

Jure Taslak (Slovenia) (Apr 06 2023 at 07:55):

I probably explained it badly, I want to read an olean from disk and use a term from this olean to fill a goal in a proof.
I suppose my confusion is in how unsafe works, I'll read up on it.

Sebastian Ullrich (Apr 06 2023 at 07:57):

You cannot dynamically add an import to the Environment, except by copying over all declarations manually

Sebastian Ullrich (Apr 06 2023 at 08:00):

Jure Taslak (Slovenia) said:

The problem is, there are many A's and all-together the oleans are too big to import all of them

Are you sure this is a problem in practice? The OS should take care of mapping and unmapping .olean pages into memroy as availably memory allows

Sebastian Ullrich (Apr 06 2023 at 08:02):

In any case, this sounds like an interesting use case, I would love to hear more about what you are doing here

Jure Taslak (Slovenia) (Apr 06 2023 at 08:09):

I actually haven't tried to import all of the oleans (I will try it later, have a meeting now), but there are about 20k of them, with a total size of ~35GB. They store finite graphs and some of their invariant. We want to use this to try to search for counterexamples for a given hypothesis about graphs.
So we want to, given a hypothesis forall g : Graph, p g, search for a counterexample via the invariants, which are mostly numeric and boolean (at least the ones we implemented so far). But then to actually produce a proof we need the actual graph not just the invariants (i.e. also the edges).
Perhaps we can try to reduce the total size of the oleans by optimizing our representation, but it will be large in any case, that's why I though dynamically importing only the ones we need might be a solution.

Sebastian Ullrich (Apr 06 2023 at 08:17):

Cool application, that's probably a new world record for .olean sizes! And would be an interesting stress test for the .olean mmaping indeed. In theory it should be perfectly lazy, slowing down considerably only when your actual working set, e.g. a single graph involved in a proof, doesn't fit into memory.

Sebastian Ullrich (Apr 06 2023 at 08:18):

In the import phase, Lean only needs to access the Declaration objects of all imports, not their contents


Last updated: Dec 20 2023 at 11:08 UTC