Zulip Chat Archive

Stream: lean4 dev

Topic: More compact DiscrTree

Joachim Breitner (Sep 24 2023 at 15:27):

I did some analysis of the DiscrTree that’s stored for library_search. The .extra file is 237M large. According to my calculations, we have:

    Number of entries: 227051

    Status Quo:
    Nodes:             2828803
    Maximum depth:     2596
    Words:             28495285
    MBs:               217

    New data structure:
    Nodes:             2823363
    Maximum depth:     27
    Words:             5450286
    MBs:               41

The depth reduction should fix the stack problems (which, incidentially, I had to work-around in my analysis code using @[inline]).

Presumably, this size reduction will also turn into a speed up, if not only because of less RAM transfer needed.

My proposal for the trie would be

inductive Trie α ß where
 | .empty
 | .values (Array ß) (Trie α ß)
 | .path (Array  α) (Trie α ß)
 | .branch  (Array  (α × Trie α ß))

This is my analysis code, I hope I didn’t make too many mistakes in my understanding of Lean’s heap layout.
Please do be mean and tell me all the things I did wrong here:

import Mathlib.Tactic.LibrarySearch
import Mathlib.Tactic.RunCmd

set_option autoImplicit false

@[inline] -- else we run out of stack space :-)
def Array.max {α} (a : Array α) (f : α → Nat) : Nat :=
  a.foldl (init := 0) (fun m x => m.max (f x))

@[inline]
def Array.sum {α} (a : Array α) (f : α → Nat) : Nat :=
  a.foldl (init := 0) (fun s x => s + (f x))

namespace Lean.Meta

variable (α : Type _)
variable (b : Bool)

partial
def DiscrTree.Trie.nodes : DiscrTree.Trie α b → Nat := fun (.node _vs children) =>
  1 + children.sum (·.2.nodes)

partial
def DiscrTree.Trie.maxDepth : DiscrTree.Trie α b → Nat := fun (.node _vs children) =>
  children.max (1 + ·.2.maxDepth)

partial
def DiscrTree.Trie.words : DiscrTree.Trie α b → Nat := fun (.node vs children) =>
  3 +
  (if vs.size = 0 then 0 else (3 + vs.size)) +
  (if children.size = 0 then 0 else (3 + 4 * children.size)) +
  children.sum (·.2.words)

partial
def DiscrTree.Trie.singletonChain (t : DiscrTree.Trie α b) : (Nat × DiscrTree.Trie α b) :=
  match t with
  | .node vs cs =>
    if vs.isEmpty && cs.size = 1
    then
      let (n, t') := (cs.get! 0).2.singletonChain
      (n + 1, t')
    else
      (0, t)

partial
def DiscrTree.Trie.newNodes (t : DiscrTree.Trie α b) : Nat :=
  let (n, .node vs cs) := t.singletonChain
  -- one constructor for compressed series
  (if n = 0 then 0 else 1) +
  -- one constructor for possible values
  (if vs.size = 0 then 0 else 1) +
  -- branching node
  (if cs.size = 0 then 0 else 1) +
  cs.sum (·.2.nodes)

partial
def DiscrTree.Trie.newWords (t : DiscrTree.Trie α b) : Nat :=
  let (n, .node vs cs) := t.singletonChain
  -- one constructor for compressed series
  (if n = 0 then 0 else 2 + 3 + n) +
  -- one constructor for possible values
  (if vs.size = 0 then 0 else (2 + 3 + vs.size)) +
  -- branching node
  (if cs.size = 0 then 0 else (2 + 3 + 4 * cs.size)) +
  cs.sum (·.2.newWords)

partial
def DiscrTree.Trie.newDepth (t : DiscrTree.Trie α b) : Nat :=
  let (n, .node vs cs) := t.singletonChain
  -- one constructor for compressed series
  (if n = 0 then 0 else 1) +
  -- one constructor for possible values
  (if vs.size = 0 then 0 else 1) +
  -- branching node
  (if cs.size = 0 then 0 else 1) +
  cs.max (·.2.newDepth)

def DiscrTree.nodes (t : DiscrTree α b) : Nat :=
  t.root |>.toArray |>.sum (fun p => p.2.nodes)

def DiscrTree.maxDepth (t : DiscrTree α b) : Nat :=
  t.root |>.toArray |>.max (fun p => p.2.maxDepth)

def DiscrTree.words (t : DiscrTree α b) : Nat :=
  t.root |>.toArray |>.sum (fun p => p.2.words)

def DiscrTree.newNodes (t : DiscrTree α b) : Nat :=
  t.root |>.toArray |>.sum (fun p => p.2.newNodes)

def DiscrTree.newDepth (t : DiscrTree α b) : Nat :=
  t.root |>.toArray |>.max (fun p => p.2.newDepth)

def DiscrTree.newWords (t : DiscrTree α b) : Nat :=
  t.root |>.toArray |>.sum (fun p => p.2.newWords)

end Lean.Meta

def test := Lean.Elab.Command.liftTermElabM $ do
  let (_, t) ← Mathlib.Tactic.LibrarySearch.cachedData.val.get
  let w := t.words
  let w' := t.newWords
  IO.println f!"
    Number of entries: {t.size}

    Status Quo:
    Nodes:             {t.nodes}
    Maximum depth:     {t.maxDepth}
    Words:             {w}
    MBs:               {w*8/(1024*1024)}

    New data structure:
    Nodes:             {t.newNodes}
    Maximum depth:     {t.newDepth}
    Words:             {w'}
    MBs:               {w'*8/(1024*1024)}

    "

#eval test

Joachim Breitner (Sep 24 2023 at 18:11):

WIP at lean4#2577. I’m gonna be curious about the bench result (once I weed out all the bugs).

Henrik Böving (Sep 24 2023 at 19:34):

I guess it would be cool if we had built-in support for measuring the size in words of a thing? That should be doable easily from within C modulo custom external classes right?

Joachim Breitner (Sep 24 2023 at 19:36):

Yes, I was wondering the same. Probably the compactification code can spit that out easily. But then, at least in this example, I would probably have needed some more domain-specific values (depths etc.) anyways, so writing a quick traversal was fine.

Joachim Breitner (Sep 24 2023 at 20:09):

100% tests passed, 0 tests failed out of 1727

That was a pleasant sunday night hacking session. Let's see if CI agrees… and what the speedcenter will say.

Joachim Breitner (Sep 24 2023 at 20:10):

Oh, and I clicked the wrong stream; this should of course be #lean4 dev .

Joachim Breitner (Sep 24 2023 at 20:55):

CI is happy. Can someone !bench lean4#2577

Joachim Breitner (Sep 24 2023 at 21:26):

(Deleted)

Sebastian Ullrich (Sep 25 2023 at 07:31):

These savings look great! I just want to caution that modifications to such a core data structure might take some time to get reviewed. In fact, it is not clear whether this should be the same data structure as the use cases are a bit different - instance and simp discr trees are built at the beginning of the file, extended dynamically afterwards, and queried thousands and maybe even millions of times. Meanwhile the library_search cache is built once in CI and its query frequency is probably measured in seconds or minutes.

Joachim Breitner (Sep 25 2023 at 08:46):

Absolutely! I’d hope that this change is an improvement for all use cases, but that doesn’t prevent further specialization later on.

Sebastian Ullrich (Sep 25 2023 at 08:47):

But I should also mention, I'm still not sure whether it makes sense to have keys of this length in practice! I suspect no-one knows an answer to that question right now.

Joachim Breitner (Sep 25 2023 at 12:02):

The library search cache went down in size from 237 to 58MB.
I'm curious if mathlib build times will be affected; what needs to happen for this to show up in the speedcenter?

Storing only shorter keys would be orthogonal to a more performant data structure, right?

Scott Morrison (Sep 25 2023 at 13:02):

Wow, okay, 237mb to 58mb is great even if just for the speed-up in lake exe cache get.

Jovan Gerbscheid (Sep 25 2023 at 13:14):

I am working on my own version of DiscrTree, based on the one in Lean. My understanding is that since the keys in the Tree also store an arity, For any given sequence of keys one can determine whether it is a completed sequence, or needs to be continued, and values are only stored at the end of a completed sequence. This means that in the .values constructor, the additional Trie would be redundant?

Joachim Breitner (Sep 25 2023 at 13:21):

You are saying that values will always be at leafs only? That could be a mild simplification indeed; also the .empty constructor could go.

Jovan Gerbscheid (Sep 25 2023 at 13:22):

Yes, indeed

Joachim Breitner (Sep 25 2023 at 13:25):

How does your version differ from the one in lean?

Jovan Gerbscheid (Sep 25 2023 at 13:32):

I added the key constructors .bvar, .forall and .lam, and replacing .arrow. This allows the DiscrTree to look under binders, and for example it can match the left side of the library result that ∑ n in Finset.range N, n = N * (N - 1) / 2,

Joachim Breitner (Sep 25 2023 at 13:33):

Nice. Sounds orthogonal to the tweaks to Trie that I worked on?

Joachim Breitner (Sep 25 2023 at 13:34):

And do you know the “Triemaps that match” paper by SPJ and @Sebastian Graf? Might be relevant.

Joachim Breitner (Sep 25 2023 at 13:35):

(A quick traversal of the library_search DiscrTree confirms your observation that values are always at the leafs.

Jovan Gerbscheid (Sep 25 2023 at 13:36):

Additionally I want to add the possibility for duplicate metavariables/stars, so that a result like a + a = 2*a does not match with an expression 1 + 2. And so that homogeneous multiplication results do not show up when searching with a heterogenous multiplication.

Jovan Gerbscheid (Sep 25 2023 at 13:37):

I also keep track of a score for how good a match is. The score increases by 1 for each succesful match of two non-star keys.

Jovan Gerbscheid (Sep 25 2023 at 13:37):

Joachim Breitner said:

Nice. Sounds orthogonal to the tweaks to Trie that I worked on?

Yes, the changes can be combined

Jovan Gerbscheid (Sep 25 2023 at 13:38):

Joachim Breitner said:

And do you know the “Triemaps that match” paper by SPJ and Sebastian Graf? Might be relevant.

I'll have a look

Jovan Gerbscheid (Sep 25 2023 at 14:15):

Another observation I made was that when inserting a value into a node of a DiscrTree, where the values are stored in an Array, it loops through the whole Array to check if the new value is equal to any present value. If the Array is long this can be quite inefficient. I think it could save some time if this check is avoided, or if a set structure is used instead for the lookup. But if the cache is already built then it makes no difference, so maybe this doesn't matter.

Joachim Breitner (Sep 25 2023 at 14:34):

Do we even expect a large number of values at the same position in the tree?

Jovan Gerbscheid (Sep 25 2023 at 14:49):

The most general patterns of [*] and {Eq, *, *, *] should be filtered out before insertion in the tree, but I decided to remove this filter in my case, so then I do get a large number of values there. I guess for other patterns it wouldn't be that large.

Joachim Breitner (Sep 25 2023 at 14:55):

I guess patience isn't always my greatest strength :-)… can I somehow get a speedcenter comparison for mathlib for this refactoring (<https://github.com/leanprover-community/mathlib4/compare/nightly-testing...lean-pr-testing-2577>)?
If the DiscrTree is used by type class search, and type class search is where lots of time is spent, I wonder if it will be visible.

Joscha Mennicken (Sep 25 2023 at 14:59):

Both of those commits can't be reached from a tracked branch (only master is tracked for mathlib4) and must be added to the queue manually before they can be compared

Joachim Breitner (Sep 25 2023 at 15:00):

Do I need to open a PR so that one can type !bench, or can commits be added otherwise easily?

Joscha Mennicken (Sep 25 2023 at 15:01):

With admin access to the velcom instance, you can add them via the web UI

Joachim Breitner (Sep 25 2023 at 15:01):

(Guess a477df946a0648231b66e5a7f0f04839546ebf25 is the base commit and 0ab85118a8f9528bab204b96b914851803d27bd8 is the feature commit.)

Jireh Loreaux (Sep 25 2023 at 15:46):

@Matthew Ballard I think you know how to get speedcenter to run against alternate versions of core like Joachim is wanting. Can you point him to the relevant process or describe it for him?

Matthew Ballard (Sep 25 2023 at 16:07):

Of course.

Assume you have a toolchain released from your fork
Open mathlib4 PR with your modified toolchain
Wait for CI to finish
Comment !bench and the bot will take it from there

Matthew Ballard (Sep 25 2023 at 16:08):

(Read fast so assuming this is to benchmark the effect of a toolchain change on mathlib)

Joachim Breitner (Sep 25 2023 at 16:16):

Ah, that's it? For some reason I thought that !bench is a privileged command

Matthew Ballard (Sep 25 2023 at 16:17):

Thankfully it is still available to the masses

Joachim Breitner (Sep 25 2023 at 16:18):

And if the base commit that I want to compare against is on branch nightly-testing, do I have to !bench that explicitly, or is nightly-testing benched anyways?

Matthew Ballard (Sep 25 2023 at 16:19):

Branches are not benched automatically.

Matthew Ballard (Sep 25 2023 at 16:20):

So you if you want two commits outside master you will want to do each

Joachim Breitner (Sep 25 2023 at 16:25):

Great, thanks, got the into the queue

Mauricio Collares (Sep 25 2023 at 16:37):

Joachim Breitner said:

Ah, that's it? For some reason I thought that !bench is a privileged command

It is in the lean4 repo, but not in mathlib4.

Joachim Breitner (Sep 25 2023 at 17:42):

3.5% compile time improvement:
http://speed.lean-fro.org/mathlib4/compare/e7b27246-a3e6-496a-b552-ff4b45c7236e/to/ee9e4caf-7a54-4dc1-a6d9-19101bc7d2d9?hash1=a477df946a0648231b66e5a7f0f04839546ebf25
Not sure how stable these numbers are, if that's a significant difference, and generally how to get a sense of these changes.

Matthew Ballard (Sep 25 2023 at 17:47):

That is definitely outside general noise though you can look through the significant runs on the benchmark page to see isolated jumps and drops in compile time.

Jovan Gerbscheid (Sep 25 2023 at 18:19):

@Joachim Breitner, in your new design, you split the original .node constructor into separate .values and .branch constructor, whereas the original design had the values and branches in the same constructor, (and always exactly one of these was an empty Array). This choice for splitting seems independent from the performance improvement that comes with the use of paths. One can have paths without multiple contructors like this:

inductive Trie α β where
  | .node (path : Array α) (values : Array β) (children : Array (α × Trie α β))

Is any reason for your design over this one?

Joachim Breitner (Sep 25 2023 at 18:20):

Just my intuition that most nodes have no values, so it seems more compact that way, and less to check for on non-value nodes.

Jovan Gerbscheid (Sep 25 2023 at 18:28):

In the lookup algorithm, we always know whether the Trie we are considering is a leaf or not, so we never actually look in the values of a non-leaf, and we never look in the children of a leaf. In your design, when you reach a leaf, you need check that the constructor is .values, and otherwise throw a panic! message. And when you are not at a leaf, you need to panic! if the constructor is .values. But I don't know if having panic! exceptions is considered a bad thing.

Joachim Breitner (Sep 25 2023 at 18:32):

Not sure I follow; no panics in my code?
You walk down the path, and if there is no .value, then vs = #[] and you return that.

I wrote the trie to support values in inner nodes - is that causing confusion?

Joachim Breitner (Sep 25 2023 at 18:35):

If you have the invariant that in .node, always either values or children is empty, then isn't that a clear indication that two separate constructors would be more natural, wouldn't it?

Jovan Gerbscheid (Sep 25 2023 at 18:43):

I suppose that my suggestion also supports values in inner nodes, but since these never appear, we get the invariant that either values or children is empty.

Jovan Gerbscheid (Sep 25 2023 at 18:44):

returning that vs = #[] also works, but this should be unreachable code.

Joachim Breitner (Sep 25 2023 at 18:48):

With the values in leafs invariant (and the all-Tries-nonempty), I'd still keep .values separate from inner nodes (.branch and .path), but drop .empty and remove the child from .values.

Jovan Gerbscheid (Sep 25 2023 at 18:50):

Joachim Breitner said:

If you have the invariant that in .node, always either values or children is empty, then isn't that a clear indication that two separate constructors would be more natural, wouldn't it?

I'm not entirely certain, because I feel like if there are multiple constructors, then matching on them must give us some information, but in this case we already know which constructor should come up, so it feels like a waste of time to match on the constructor of the Trie. For a the path the same thing holds: in stead of having to match on whether there is a path or not, we simply always loop through the path, and if there should be no path then it is the empty Array, so looping doesn't cost any extra time.

Jovan Gerbscheid (Sep 25 2023 at 18:52):

But maybe storing an empty array for each node is a worse thing than doing an extra match.

Joachim Breitner (Sep 25 2023 at 18:54):

I see what you mean. Guess one would have to measure it!

Joachim Breitner (Sep 25 2023 at 18:56):

With the introduction of paths, one would have to check anyways on inner nodes, and these are the majority of nodes traversed, so I'm not bothered about the constructors. (Putting paths and branches in one constructor means you have to check many ks if it's empty, which is likely more expensive than checking the constructor tag.

Jovan Gerbscheid (Sep 25 2023 at 19:20):

I suggest that for paths we also put them in the same constructor. In the case where there is no path this would mean doing a loop through an empty Array. Do you mean that traversing an empty array could be more expensive?

Jovan Gerbscheid (Sep 25 2023 at 19:30):

But from a readability standpoint, the multiple constructors approach seems easier to understand.

Joachim Breitner (Sep 25 2023 at 20:24):

Yes, I expect its more efficient to do a branch on the constructor tag (directly present) instead of having larger nodes (one extra word) and traversing empty arrays (one pointer away, possibly a non-inlined function calls). The difference is probably very small, though.

Last updated: May 02 2025 at 03:31 UTC