Zulip Chat Archive

Stream: general

Topic: Time complexity of simp


Patrick Johnson (Jan 23 2023 at 18:09):

What is the time complexity of simp search? If there are nn theorems marked as @[simp] and the "size" of the statement being simplified is mm, I guess the time complexity is O(mlogn)O(m\log n), is that correct?

Martin Dvořák (Jan 23 2023 at 19:15):

Are you assuming that the proof has exactly one simplification step?

Patrick Johnson (Jan 23 2023 at 19:21):

Yes

Shreyas Srinivas (Jan 23 2023 at 19:47):

Patrick Johnson said:

What is the time complexity of simp search? If there are $n$ theorems marked as @[simp] and the "size" of the statement being simplified is $m$, I guess the time complexity is $O(m\log n)$, is that correct?

Where does the log n comes from?

Shreyas Srinivas (Jan 23 2023 at 19:47):

Does simp use some binary search? If so, with what order?

Patrick Johnson (Jan 23 2023 at 20:16):

Shreyas Srinivas said:

Where does the log n comes from?

Each expression can be serialized to a list of identifiers (global identifiers, local lambda arguments, metavariables). The simp set is a map from expressions (represented as lists; patterns) to proofs (or other simp info), which can be represented as a tree. Searching in such tree has O(logn)O(\log n) time complexity.

Patrick Johnson (Jan 23 2023 at 20:18):

This is how it could be implemented. I have no idea how the actual implementation is done.

Shreyas Srinivas (Jan 23 2023 at 20:31):

Patrick Johnson said:

Shreyas Srinivas said:

Where does the log n comes from?

Each expression can be serialized to a list of identifiers (global identifiers, local lambda arguments, metavariables). The simp set is a map from expressions (represented as lists; patterns) to proofs (or other simp info), which can be represented as a tree. Searching in such tree has $O(\log n)$ time complexity.

There are two flaws I can see with this argument. You are assuming that you only have to search on root to leaf path (or constantly many). That only makes sense if you have some criterion for choosing between multiple child nodes at each node. The second flaw is this: Unless your goal is very specific to the point that the simp set tree can be pruned in this manner, you don't get O(log n)

Patrick Johnson (Jan 24 2023 at 06:27):

But we do have a criterion for choosing between multiple child nodes at each node. Maybe you don't understand how the data structure I described looks like? In theory, we may follow multiple paths simultaneously, but the number of paths would be proportional to the number of metavariables in the patterns, which can be considered a constant. If you are interested in how logn\log n simplification search would work in practice, I can provide you a link to the implementation (in a different theorem prover).

Can I get the answer from someone who knows how Lean's simp work? Does simp try to match every possible theorem from the simp set with every possible sub-expression from the statement being simplified (which would be O(mn)O(mn)), or does it use some more clever approach?

Shreyas Srinivas (Jan 24 2023 at 07:00):

Patrick Johnson said:

But we do have a criterion for choosing between multiple child nodes at each node. Maybe you don't understand how the data structure I described looks like? In theory, we may follow multiple paths simultaneously, but the number of paths would be proportional to the number of metavariables in the patterns, which can be considered a constant. If you are interested in how $\log n$ simplification search would work in practice, I can provide you a link to the implementation (in a different theorem prover).

Can I get the answer from someone who knows how Lean's simp work? Does simp try to match every possible theorem from the simp set with every possible sub-expression from the statement being simplified (which would be $O(mn)$), or does it use some more clever approach?

Are you talking about a trie like structure?

Patrick Johnson (Jan 24 2023 at 07:03):

yes

Yaël Dillies (Jan 24 2023 at 07:34):

iff and eq lemmas are indexed by simp depending on the head of their LHS. iff lemmas are preprocessed to become equalities (this changed semirecently, so I might be wrong for this bit).

Yaël Dillies (Jan 24 2023 at 07:38):

I don't think it's indexed any further, so the time complexity should be something like O(mn)O(m \sqrt n), assuming there are about as many different heads as there are lemmas per head. If we were doing a more careful analysis, we would probably find that the number of lemmas per head follows some kind of Zipf law, with a few being very commonplace, eg docs#coe_fn which shows up as the head of all hom coercion lemmas.

Junyan Xu (Jan 24 2023 at 08:33):

some speculations here: https://leanprover.zulipchat.com/#narrow/stream/113538-CI/topic/build.20time.20bot/near/304405602

Jannis Limperg (Jan 24 2023 at 14:08):

In Lean 4, simp uses a discrimination tree, which I think is close to the data structure you're describing. In Lean 3, it doesn't (it uses 'keyed matching' instead), which is why Lean 3 simp doesn't cope well with large lemma databases.

However, even with discrimination trees, we must do additional work because the matching should be module defeq. In Lean (3 and 4), simp uses reducible transparency and by convention there are no reducible recursive definitions, which makes the matching tractable: basically, you normalise both the pattern expression (before you insert it into the discrimination tree) and the query expression (when you do the lookup). But this means that there's no straightforward complexity bound on the search.


Last updated: Dec 20 2023 at 11:08 UTC