Zulip Chat Archive
Stream: Is there code for X?
Topic: Searching doc-strings
Damiano Testa (Nov 28 2023 at 08:54):
Is there a command that would find all declarations whose doc-strings contain a given sub-string? Or all files where a given string appears, maybe in the documentation?
The motivation comes from teaching: when looking for a theorem, say the fundamental theorem of calculus, is there something like
#grep "Fundamental Theorem"
that would return something meaningful?
For the moment, just a substring matching would be great!
If the string can be a regex, that would be even better! Or maybe simply ignoring upper/lower-cases would already be useful.
Damiano Testa (Nov 28 2023 at 08:55):
Something to replace the find in VSCode that does not seem to search inside the build/.lake
directory in a Lean project.
Shreyas Srinivas (Nov 28 2023 at 08:58):
Moogle and loogle are the closest I can think of
Shreyas Srinivas (Nov 28 2023 at 08:59):
At least for loogle there are plans for some form of local searching
Damiano Testa (Nov 28 2023 at 08:59):
Yes, I told my students about them, and they are good, but you need to leave your editor to use them.
Shreyas Srinivas (Nov 28 2023 at 08:59):
There is a vscode loogle extension
Shreyas Srinivas (Nov 28 2023 at 08:59):
I wrote it and released it three weeks ago
Damiano Testa (Nov 28 2023 at 09:00):
Does loogle look at doc-strings, though?
Shreyas Srinivas (Nov 28 2023 at 09:00):
I have had mixed results
Shreyas Srinivas (Nov 28 2023 at 09:01):
Here is the link to the extension: https://marketplace.visualstudio.com/items?itemName=ShreyasSrinivas.loogle-lean
Damiano Testa (Nov 28 2023 at 09:01):
Once I'm at a computer I'll see if the extension does doc-strings search, thanks.
Shreyas Srinivas (Nov 28 2023 at 09:02):
The extension is a loogle front end. Sometimes you search for something simple and loogle offers some suggestions.
Shreyas Srinivas (Nov 28 2023 at 09:03):
The extension does the same thing. But I think (cc: @Joachim Breitner ) loogle can be made to work with docstrings if it doesn't already do so
Damiano Testa (Nov 28 2023 at 09:04):
@loogle "fundamental"
loogle (Nov 28 2023 at 09:04):
:search: Ordinal.IsFundamentalSequence, Ordinal.IsFundamentalSequence.blsub_eq, and 286 more
Damiano Testa (Nov 28 2023 at 09:04):
@loogle "fundamental", "calculus"
loogle (Nov 28 2023 at 09:04):
:shrug: nothing found
Shreyas Srinivas (Nov 28 2023 at 09:07):
I checked now. Loogle seems to expect something similar to lean terms. But search by docstrings should definitely be feasible.
Shreyas Srinivas (Nov 28 2023 at 09:07):
Moogle does this kind of search better
Shreyas Srinivas (Nov 28 2023 at 09:08):
Damiano Testa (Nov 28 2023 at 09:08):
Yes, this is exactly what I told my students. I was wondering if a Lean command for searching doc-strings was available or not.
Damiano Testa (Nov 28 2023 at 09:12):
Also, loogle and moogle do not work for finding results in your own project, I think.
Shreyas Srinivas (Nov 28 2023 at 09:12):
Currently nope. I am not aware of a moogle extension yet (we'd need their permission to use their APIs). The loogle extension is the closest to an in-vscode search other than #find.
Damiano Testa (Nov 28 2023 at 09:13):
Ok, I'll piece something together myself, thanks!
Scott Morrison (Nov 28 2023 at 09:17):
The doc-strings are all available via metaprogramming, e.g. src#Lean.findDocString?
Scott Morrison (Nov 28 2023 at 09:19):
And we have KMP: src#String.Matcher
Scott Morrison (Nov 28 2023 at 09:19):
Go wild. :-)
Joachim Breitner (Nov 28 2023 at 09:19):
The documentation for loogle is at https://loogle.lean-lang.org/; TL;DR: It searches for lemma name substrings (if you use "…"
), mentioned constants or subexpressions.
It does not look at the docstrings. Maybe that will come in due course, but at the moment I’m shying away from the complexity (the necessary natural language processing and indexing, showing the docstrings in the results prettily with formatting and search keyword highlighting etc.). I’m inclined to postpone that to when and if loogle gets integrated into the documentation site properly.
You can run loogle locally, but it requires a few manual steps.
Damiano Testa (Nov 28 2023 at 09:21):
Ok, thanks! I suspected that loogle was not the tool for that.
I'll try out some simple-minded text search on doc-strings and will report if it seems useful!
Damiano Testa (Nov 28 2023 at 09:22):
Currently, my personal approach is to use grep
, which of course is awesome, but not all students use the command line.
Eric Rodriguez (Nov 28 2023 at 09:29):
I just use the ctrl+shift+f vscode search for docstring searching, with some regex magic here and there and making sure the cog is off (although that's not as good out of the box as it was in lean3 :/)
Damiano Testa (Nov 28 2023 at 09:31):
Eric, does that work with the mathlib files in a project depending on mathlib? I thought that I tried and it did not work.
Shreyas Srinivas (Nov 28 2023 at 09:57):
@Damiano Testa : In case you build a search feature for docstrings, I'd love to integrate into the extension :)
Damiano Testa (Nov 28 2023 at 09:59):
Ok, I will be busy for the rest of the day, but this is what I would like to have, except with lots of parameters!
import Mathlib.MeasureTheory.Integral.FundThmCalculus
open String
open Lean Meta Elab Command Tactic
#eval show CoreM _ from do
let doc ← findDocString? (← getEnv) ``Continuous.deriv_integral
let res := Option.isSome <| (Matcher.ofString "Fundamental theorem").find? doc.get!
if res then
dbg_trace doc.get!
return ()
Floris van Doorn (Nov 28 2023 at 10:00):
Damiano Testa said:
Eric, does that work with the mathlib files in a project depending on mathlib? I thought that I tried and it did not work.
It doesn't work by default, because the lake-packages
is in your .gitignore
and VSCode doesn't search through those by default. But that is a setting that you can change.
Damiano Testa (Nov 28 2023 at 10:00):
Basically, adding a loop through the decls in the environment, reporting matches and probably doing a more refined filtering.
Damiano Testa (Nov 28 2023 at 10:01):
Floris, removing it from gitignore though would show all those files in the git diff, right?
Eric Wieser (Nov 28 2023 at 10:01):
There's a button in the search pane you can click to search even in ignored files
Damiano Testa (Nov 28 2023 at 10:01):
Ah, I see: I can get VSCode to ignore the gitignore restrictions!
Got it!
Kevin Buzzard (Nov 28 2023 at 12:06):
You switch off the blue cog, you tell VS Code to search only in .lean files, and you have exactly what you want as far as I can see
Mario Carneiro (Nov 28 2023 at 23:14):
Joachim Breitner said:
The documentation for loogle is at https://loogle.lean-lang.org/; TL;DR: It searches for lemma name substrings (if you use
"…"
), mentioned constants or subexpressions.It does not look at the docstrings. Maybe that will come in due course, but at the moment I’m shying away from the complexity (the necessary natural language processing and indexing, showing the docstrings in the results prettily with formatting and search keyword highlighting etc.). I’m inclined to postpone that to when and if loogle gets integrated into the documentation site properly.
You can run loogle locally, but it requires a few manual steps.
IMO this concern is misplaced. We should not let perfect be the enemy of good here - you are up against grep as an alternative, so as long as you can deliver something at least as good (that is, exact substring matching) I don't think it is useful to be blocking this indefinitely waiting for some natural language processing thing.
Mario Carneiro (Nov 28 2023 at 23:15):
#find "foo"
should just search for foo
appearing in lemma names and docstrings, and defer the formatting and search keyword highlighting until later
Joachim Breitner (Nov 29 2023 at 08:17):
Well, at least you need a way to say
#find "and"
and no get all the lemmas that have “and” in the docstring. So some form of qualifier to indicate whether this should be search in names, or docstrings, or both.
And it would be really bad UX to if you loogle something and it gives you results that, at first glance, have nothing to do with the query. I guess I could add a “Docstring mentions foo
” comment. Or show the docstring in its markdown form.
Damiano Testa (Nov 29 2023 at 08:29):
First of all, for my purpose, the VSCode search solved my issue.
However, you could have a #find "and" in docs
syntax, with the in
optional and maybe taking a *
option to search both terms and docs.
Mario Carneiro (Nov 29 2023 at 08:32):
You could rank the ones that only have a hit in the docstring below label hits. As long as you say somewhere in the documentation that "foo"
searches in the label and comment, I don't think it will be too confusing if the word foo
doesn't appear in the label alone
Damiano Testa (Nov 29 2023 at 08:33):
Or even #find <term> with "string" in docs
, to filter both term-like conditions and grep-like ones.
Mario Carneiro (Nov 29 2023 at 08:35):
If there are too many docs hits you could also just completely suppress it with a message like "very common word and
was hidden from search results". But adding more bells and whistles to the search syntax is probably not good UX, most people won't learn it or will have to look it up every time. At least, assume that people won't be using it most of the time
Mario Carneiro (Nov 29 2023 at 08:37):
IMO unquoted identifiers should be treated as label searches, the current fully qualified only syntax is way too cumbersome
Joachim Breitner (Nov 29 2023 at 10:22):
Damiano Testa said:
Or even
#find <term> with "string" in docs
, to filter both term-like conditions and grep-like ones.
Do you mean like #find <term>, "string"
with the current syntax (you can comma-separate any number of “filters” or “queries” to form the conjunction), or something else?
(I guess that proves Mario’s points that users will want to use it without knowing about the query syntax… :-))
Joachim Breitner (Nov 29 2023 at 10:25):
Mario Carneiro said:
IMO unquoted identifiers should be treated as label searches, the current fully qualified only syntax is way too cumbersome
What do you mean by label? The last component of a name?
Not that if you search for Name
loogle will give you a list of suggestions with possible qualified names (like Lean.Name
) to refine the search, and this even works in terms. (Would be great if this worked in Lean as well :-D)
But proactively doing a “you probably meant Lean.Name” might be an alternative that I have been pondering.
Mario Carneiro (Nov 29 2023 at 10:54):
label meaning theorem names. That is, #find foo
should search for theorems containing foo
in the name, #find "foo"
should search for theorems containing foo
in the docstring
Damiano Testa (Nov 29 2023 at 12:02):
Joachim, what I had in mind was #find (current syntax for filtering terms) with (add your string match here) in docs
. So, for example, you may want to do
#find Real.exp, LE.le 0 _, |- LE.le _ _ with "intermediate" in docs
and find docs#Real.add_one_le_exp_of_nonneg. However, I agree with having a simpler syntax and it took me quite a few attempts to form the example above!
Let me also emphasize: this is certainly not an immediate concern for me! I am already very happy with loogle, although I do still find it hard to use. For this reason, proposing further syntax enhancements may even be counterproductive!
I like Mario's simpler version, where the doc-string matches are simply mentioned after the term-matches.
Damiano Testa (Nov 29 2023 at 12:03):
So, in essence, I would view the with ... in docs
as syntax for saying "and now, sift the term-matches with this grep-like filter on the doc-strings".
Joachim Breitner (Nov 29 2023 at 12:21):
Hmm, but why not
#find Real.exp, LE.le 0 _, |- LE.le _ _ , "intermediate" in docs
how is your with
different from a conjunction of queries?
Joachim Breitner (Nov 29 2023 at 12:22):
I am already very happy with loogle, although I do still find it hard to use.
Happy to hear about the first half, but what makes it hard to use?
Shreyas Srinivas (Nov 29 2023 at 12:24):
I think including the doc strings in the api could alleviate some of this. Of course it doesn't quite let you search by docs, but at least for the vscode client (and possibly also nvim and emacs), filtering results by docstrings should be possible
Shreyas Srinivas (Nov 29 2023 at 12:28):
Currently it is already possible to filter loogle search hits by the label, module name, and type signature
Shreyas Srinivas (Nov 29 2023 at 15:41):
(deleted)
Last updated: Dec 20 2023 at 11:08 UTC