Zulip Chat Archive
Stream: general
Topic: feedback on LeanSearch
Kim Morrison (Apr 14 2025 at 00:46):
I've been having trouble with leansearch.net recently --- and I'm getting suspicious that it never returns results from the standard library!
e.g.
should surely show List.perm_iff_count, but everything it mentions is from Mathlib, nothing from the standard library!
@Jiang Jiedong?
Jiang Jiedong (Apr 14 2025 at 16:59):
@Kim Morrison Thank you for your feedback! The standard library isn't currently included in the database.
I've been discussing this issue with the current maintainer @Tony Beta Lambda of LeanSearch, as there are varying user needs:
- Some want Mathlib-only searches (more presicely, mathematical declarations only, excluding things like tactic parsers)
- Others want broader searches including metaprogramming and other libraries
Our current consideration is to:
- Expand the database coverage (but a simple expansion may affect the searching quality)
- Add configurable toggles for showing different declaration groups in searching results (these declaration groups could be: metaprogramming-related, Batteries library, instances, etc.)
We'd greatly appreciate any suggestions on:
- What toggles would be most useful
- Any alternative solutions beyond toggles
Thanks for bringing this up!
Jiang Jiedong (Apr 14 2025 at 17:04):
@Evgenia Karunus also discussed some related ideas with me.
Eric Wieser (Apr 14 2025 at 17:04):
If you want to exclude things like tactic parsers / metaprogramming, then probably dropping (or rather, filtering) things in the Lean namespace is a reasonable approximation. Dropping based on which folder things are in probably isn't a good heuristic.
Kim Morrison (Apr 14 2025 at 22:54):
Yes, you certainly want to index and return everything from Init.Data and Std.Data!
Siddhartha Gadgil (Apr 15 2025 at 01:11):
As a concrete example, for "If n is less than or equal to m, then $n+1$ is less than or equal to $m+1$" the result Nat.succ_le_succ does not show up and is the result I usually want. This is in the Lean core.
Kim Morrison (Apr 15 2025 at 01:12):
(Well, you want by omega, but sure. :-)
Alok Singh (Apr 15 2025 at 01:38):
Well, _I_ as a user want Std and Batteries to be searched.
Kim Morrison (Apr 15 2025 at 06:48):
LeanSearch is down.
Adam Topaz (Apr 15 2025 at 16:17):
I think it's also safe to say that any proposition no matter where it comes from should be indexed.
Yaël Dillies (Apr 15 2025 at 16:18):
Really? What about the lemmas internal to eg linarith?
Tony Beta Lambda (Apr 17 2025 at 04:07):
It is apparent that everyone has different needs, so instead of deciding it for everyone, how about we discuss and find out what set of filter options should be available to the user? Right now I can figure: packages to include, namespace, kind (tactic/prop/def).
Kim Morrison (Apr 17 2025 at 04:10):
I don't think there's that much variation asked for above.
Everyone wants everything that isn't internal to a tactic, or internal to Lean. (e.g. filter just by having .Tactic. in the namespace, or Lean.)
No one wants "auxiliary declarations", e.g. things starting with _.
I think just a single "metaprogramming" filter, and including the repositories upstream of Mathlib, would get most of the way.
Evgenia Karunus (May 24 2025 at 17:38):
I'm PRing search over metaprogramming functions to LeanSearch, so thinking about what declarations should be in "maths db" vs what declarations should be in "metaprogramming db".
I thought about sorting declarations in Lean and Mathlib into these 2 groups based on the presence of Lean namespace like @Eric Wieser suggested, but e.g. Mathlib.Util.isCompiled (link) is a nice function that wouldn't make it into the "metaprogramming db" by the Lean-namespace heuristic.
The only other heuristic that comes to mind is the by-folder heuristic, meaning we'd get:
Maths db
<path_to_mathlib> Mathlib [EXCLUDE:Mathlib/Lean, Mathlib/Util, MathlibTest, Mathlib/Tactic]
<path_to_lean> Init, Std
Metaprogramming db
<path_to_mathlib> Cache, Mathlib/Lean, Mathlib/Tactic, Mathlib/Util, Mathlib/Control, Shake/Main, scripts
<path_to_lean> Init, Lean, Std, lake [EXCLUDE: lake/tests]
Note: from each of these modules, we want only
defs for the programming db.
Does that seem like a sensible heuristic/are there some other heuristics that come to mind?
Evgenia Karunus (May 24 2025 at 17:40):
One might just index everything and rely on semantic search to figure it out by the search query, but that will worsen relevance/signal-to-noise. On the other hand, if determining what's useful for metaprogramming VS what's useful for formalization is too unreliable, then indexing everything and leaving this separation to embeddings might still be a good solution.
Yaël Dillies (May 24 2025 at 17:54):
Evgenia Karunus said:
are there some other heuristics that come to mind?
Nothing in the Mathlib namespace is mathematics, so maybe the heuristic should be "Either Lean or Mathlib namespace" (and maybe Batteries too)
Evgenia Karunus (May 24 2025 at 18:39):
If we were only indexing declarations with Mathlib in their namespace for the "metaprogramming db", then we'd miss e.g. these:
<path_to_mathlib> /Cache
Cache.IO.ModuleHashMap.filterExists (hashMap : ModuleHashMap) (keep : Bool) : IO ModuleHashMap
<path_to_mathlib> /Mathlib/Control/Monad/Basic
StateT.eval.{u, v} {α σ : Type u} {m : Type u → Type v} [Functor m] (cmd : StateT σ m α) (s : σ) : m α
Lean-Mathlib-namespace heuristic sounds useful to avoid including excessive definitions into "maths db" as an additional check on top of the by-folder heuristic.
Damiano Testa (Aug 29 2025 at 11:58):
While using LeanSearch, I noticed something funny. I used the query
A simple group that is commutative is cyclic
and the very first hit was exactly what I wanted: docs#IsSimpleGroup.isCyclic.
So far, everything is great!
However, what I think is the LLM generated description of this result says
Finite Simple Abelian Groups are Cyclic
Every finite simple abelian group is cyclic.
And my question: why Finite?
Yakov Pechersky (Sep 08 2025 at 00:02):
I noticed today that the results are no longer using notation. It has explicit Iff
Aaron Hill (Dec 14 2025 at 04:06):
I'm getting 'Rate limit exceeded: 1 per 1 second' errors very frequently whenever I try to use LeanSearch
Nick_adfor (Dec 14 2025 at 04:38):
Aaron Hill said:
I'm getting 'Rate limit exceeded: 1 per 1 second' errors very frequently whenever I try to use LeanSearch
It's just a bug. Everytime I meet this, I will close LeanSearch and open https://www.moogle.ai/
Kim Morrison (Dec 16 2025 at 00:14):
Aaron Hill said:
I'm getting 'Rate limit exceeded: 1 per 1 second' errors very frequently whenever I try to use LeanSearch
@Tony Beta Lambda, @Jiang Jiedong, LeanSearch is down still.
Matt Diamond (Dec 16 2025 at 00:28):
it works for me... however it would be hilarious if only one person could use leansearch.net at a time or else it hits rate limits
Matt Diamond (Dec 16 2025 at 00:28):
they should implement a queue mechanism to avoid rate limiting if that's the issue
Matt Diamond (Dec 16 2025 at 00:34):
also I find it very amusing that @Nick_adfor, who is working on LeanSearch, has responded to rate limit complaints with "it's just a bug, when it happens I use some other software"
I would hope that Nick means "we're working on it, but in the meantime you can use Moogle" but it's just a funny way to respond
Nick_adfor (Dec 16 2025 at 02:48):
Matt Diamond said:
also I find it very amusing that Nick_adfor, who is working on LeanSearch, has responded to rate limit complaints with "it's just a bug, when it happens I use some other software"
I would hope that Nick means "we're working on it, but in the meantime you can use Moogle" but it's just a funny way to respond
No. The truth is that I work for the part FATE-H. You can see the whole workflow here. https://frenzymath.com/blog/fate/
Nick_adfor (Dec 16 2025 at 02:51):
LeanSearch is a neighborhood's work. It is often convenient to ask for your neighbor's help. But this time seems not so we can change to the community
Jiang Jiedong (Dec 16 2025 at 03:18):
Thank you for the feedback! I’ll talk with @Tony Beta Lambda , who is our maintainer of LeanSearch, about the current issue. Currently, the rate limit is set due to previous outside large search requests beyond the capacity of our server. We suspect that people are trying to benchmarking LeanSearch. Sometimes there are other issues about LeanSearch, but the error message is still the RPM limit. We’ll investigate what is happening now.
Jiang Jiedong (Dec 16 2025 at 03:20):
The webpage of LeanSearch is responding normally now under a small scale test.
Jiang Jiedong (Dec 16 2025 at 03:37):
@Nick_adfor Thank you for your contribution on formalizing the proofs of problems in FATE-H. I am pretty sorry that LeanSearch is not stable enough for your work…
Nick_adfor (Dec 16 2025 at 05:22):
I've heard that LeanSearch is goint to build a better Docs to replace Mathlib's origin Docs. It is a long-term, hard work, but it deserves!
Tony Beta Lambda (Dec 18 2025 at 08:33):
LeanSearch server had a bug where the rate limiter was not correctly identifying client IP, which has now been fixed. Hopefully it resolves the "rate limit exceeded" error.
Mirek Olšák (Dec 18 2025 at 11:02):
I was curious if it is just me. I usually use these AI searches for definitions, not theorems. When I need a theorem, I have a decent idea of what I am searching for -- which global constants should appear there, so Loogle is typically good enough. On the other hand, I need string-based query when I have no idea how the concept I have in mind is represented in Lean, which happens rather with definitions than with theorems
So it is a bit surprising to me how both Moogle & Leansearch is marketed as "find your theorems", when I mostly go there to find my definitions.
Nick_adfor (Dec 18 2025 at 11:33):
Well, there seems another way: to find the associated theorem to help find the definition:)
Last updated: Dec 20 2025 at 21:32 UTC