Zulip Chat Archive

Stream: general

Topic: What is the fastest known data rep for "searching math"?

Daniel Donnelly (Jun 29 2022 at 20:53):

Regardless of what Lean4 uses, you probably have some idea about how you would code a data structure representing expressions in math faithfully and comprehensively, in such a way that say searching for applicable theorems to some givens has optimal runtime. There are of course more things you'd also want to search about when doing math, so what I gave was just an example. My idea is put everything into one large graph db (but one in memory and coded in C++ or D). Some nodes may have arrows that point to child nodes in some expression while other arrows represent implications and so on... Anyway, I don't think that would be very fast to search, because I made a q.uiver.app CD diagram to Neo4j graph db saver /loader and it was slow as hell, with nearly nothing in the db. So I am on a quest to know what the fastest searchable representation of math is. That we are aware of.

Henrik Böving (Jun 29 2022 at 20:56):

In Lean 4 (I know you said regardless but still, its a general approach :P) we have discrimination trees to build maps from expr -> whatever, to be precise docs4#Lean.Meta.DiscrTree, which is at least quite good but i would assume not the best solution there is.

Daniel Donnelly (Jun 29 2022 at 20:59):

@Henrik Böving First off, there's not much info on google about discr trees. Secondly, you didn't explain why it was a good design for Lean4.

Henrik Böving (Jun 29 2022 at 21:00):

If you want to talk about why the proof search is the way it is and so on Leo is the expert on that i only have very superficial knowledge.

Jason Rute (Jun 29 2022 at 21:20):

@Daniel Donnelly It all depends on what you are searching for. If you are searching for an exact string match then use a hash table. If you searching up to alpha equivalence, or are searching for expressions that unify with a query expression containing metavariables then I bet what Lean 4 does is really efficient. If you want a more general type of equivalence used in the matching, like logical equivalence or some vague idea of similar formulas, then you are getting into active research areas probably.

Alexander Bentkamp (Jun 30 2022 at 12:30):

If you want to know more about discrimination trees and similar term indexing data structures, have a look at Chapter 26 of the Handbook of Automated Reasoning (Vol. 2) "Term Indexing" by Sekar, Ramakrishnam, and Voronkov.

Last updated: May 02 2025 at 03:31 UTC