Zulip Chat Archive

Stream: Is there code for X?

Topic: Searching doc-strings


Damiano Testa (Nov 28 2023 at 08:54):

Is there a command that would find all declarations whose doc-strings contain a given sub-string? Or all files where a given string appears, maybe in the documentation?

The motivation comes from teaching: when looking for a theorem, say the fundamental theorem of calculus, is there something like

#grep "Fundamental Theorem"

that would return something meaningful?

For the moment, just a substring matching would be great!

If the string can be a regex, that would be even better! Or maybe simply ignoring upper/lower-cases would already be useful.

Damiano Testa (Nov 28 2023 at 08:55):

Something to replace the find in VSCode that does not seem to search inside the build/.lake directory in a Lean project.

Shreyas Srinivas (Nov 28 2023 at 08:58):

Moogle and loogle are the closest I can think of

Shreyas Srinivas (Nov 28 2023 at 08:59):

At least for loogle there are plans for some form of local searching

Damiano Testa (Nov 28 2023 at 08:59):

Yes, I told my students about them, and they are good, but you need to leave your editor to use them.

Shreyas Srinivas (Nov 28 2023 at 08:59):

There is a vscode loogle extension

Shreyas Srinivas (Nov 28 2023 at 08:59):

I wrote it and released it three weeks ago

Damiano Testa (Nov 28 2023 at 09:00):

Does loogle look at doc-strings, though?

Shreyas Srinivas (Nov 28 2023 at 09:00):

I have had mixed results

Shreyas Srinivas (Nov 28 2023 at 09:01):

Here is the link to the extension: https://marketplace.visualstudio.com/items?itemName=ShreyasSrinivas.loogle-lean

Damiano Testa (Nov 28 2023 at 09:01):

Once I'm at a computer I'll see if the extension does doc-strings search, thanks.

Shreyas Srinivas (Nov 28 2023 at 09:02):

The extension is a loogle front end. Sometimes you search for something simple and loogle offers some suggestions.

Shreyas Srinivas (Nov 28 2023 at 09:03):

The extension does the same thing. But I think (cc: @Joachim Breitner ) loogle can be made to work with docstrings if it doesn't already do so

Damiano Testa (Nov 28 2023 at 09:04):

@loogle "fundamental"

loogle (Nov 28 2023 at 09:04):

:search: Ordinal.IsFundamentalSequence, Ordinal.IsFundamentalSequence.blsub_eq, and 286 more

Damiano Testa (Nov 28 2023 at 09:04):

@loogle "fundamental", "calculus"

loogle (Nov 28 2023 at 09:04):

:shrug: nothing found

Shreyas Srinivas (Nov 28 2023 at 09:07):

I checked now. Loogle seems to expect something similar to lean terms. But search by docstrings should definitely be feasible.

Shreyas Srinivas (Nov 28 2023 at 09:07):

Moogle does this kind of search better

Shreyas Srinivas (Nov 28 2023 at 09:08):

See your example

Damiano Testa (Nov 28 2023 at 09:08):

Yes, this is exactly what I told my students. I was wondering if a Lean command for searching doc-strings was available or not.

Damiano Testa (Nov 28 2023 at 09:12):

Also, loogle and moogle do not work for finding results in your own project, I think.

Shreyas Srinivas (Nov 28 2023 at 09:12):

Currently nope. I am not aware of a moogle extension yet (we'd need their permission to use their APIs). The loogle extension is the closest to an in-vscode search other than #find.

Damiano Testa (Nov 28 2023 at 09:13):

Ok, I'll piece something together myself, thanks!

Scott Morrison (Nov 28 2023 at 09:17):

The doc-strings are all available via metaprogramming, e.g. src#Lean.findDocString?

Scott Morrison (Nov 28 2023 at 09:19):

And we have KMP: src#String.Matcher

Scott Morrison (Nov 28 2023 at 09:19):

Go wild. :-)

Joachim Breitner (Nov 28 2023 at 09:19):

The documentation for loogle is at https://loogle.lean-lang.org/; TL;DR: It searches for lemma name substrings (if you use "…"), mentioned constants or subexpressions.

It does not look at the docstrings. Maybe that will come in due course, but at the moment I’m shying away from the complexity (the necessary natural language processing and indexing, showing the docstrings in the results prettily with formatting and search keyword highlighting etc.). I’m inclined to postpone that to when and if loogle gets integrated into the documentation site properly.

You can run loogle locally, but it requires a few manual steps.

Damiano Testa (Nov 28 2023 at 09:21):

Ok, thanks! I suspected that loogle was not the tool for that.

I'll try out some simple-minded text search on doc-strings and will report if it seems useful!

Damiano Testa (Nov 28 2023 at 09:22):

Currently, my personal approach is to use grep, which of course is awesome, but not all students use the command line.

Eric Rodriguez (Nov 28 2023 at 09:29):

I just use the ctrl+shift+f vscode search for docstring searching, with some regex magic here and there and making sure the cog is off (although that's not as good out of the box as it was in lean3 :/)

Damiano Testa (Nov 28 2023 at 09:31):

Eric, does that work with the mathlib files in a project depending on mathlib? I thought that I tried and it did not work.

Shreyas Srinivas (Nov 28 2023 at 09:57):

@Damiano Testa : In case you build a search feature for docstrings, I'd love to integrate into the extension :)

Damiano Testa (Nov 28 2023 at 09:59):

Ok, I will be busy for the rest of the day, but this is what I would like to have, except with lots of parameters!

import Mathlib.MeasureTheory.Integral.FundThmCalculus

open String

open Lean Meta Elab Command Tactic
#eval show CoreM _ from do
  let doc  findDocString? ( getEnv) ``Continuous.deriv_integral
  let res := Option.isSome <| (Matcher.ofString "Fundamental theorem").find? doc.get!
  if res then
    dbg_trace doc.get!
  return ()

Floris van Doorn (Nov 28 2023 at 10:00):

Damiano Testa said:

Eric, does that work with the mathlib files in a project depending on mathlib? I thought that I tried and it did not work.

It doesn't work by default, because the lake-packages is in your .gitignoreand VSCode doesn't search through those by default. But that is a setting that you can change.

Damiano Testa (Nov 28 2023 at 10:00):

Basically, adding a loop through the decls in the environment, reporting matches and probably doing a more refined filtering.

Damiano Testa (Nov 28 2023 at 10:01):

Floris, removing it from gitignore though would show all those files in the git diff, right?

Eric Wieser (Nov 28 2023 at 10:01):

There's a button in the search pane you can click to search even in ignored files

Damiano Testa (Nov 28 2023 at 10:01):

Ah, I see: I can get VSCode to ignore the gitignore restrictions!
Got it!

Kevin Buzzard (Nov 28 2023 at 12:06):

You switch off the blue cog, you tell VS Code to search only in .lean files, and you have exactly what you want as far as I can see

Mario Carneiro (Nov 28 2023 at 23:14):

Joachim Breitner said:

The documentation for loogle is at https://loogle.lean-lang.org/; TL;DR: It searches for lemma name substrings (if you use "…"), mentioned constants or subexpressions.

It does not look at the docstrings. Maybe that will come in due course, but at the moment I’m shying away from the complexity (the necessary natural language processing and indexing, showing the docstrings in the results prettily with formatting and search keyword highlighting etc.). I’m inclined to postpone that to when and if loogle gets integrated into the documentation site properly.

You can run loogle locally, but it requires a few manual steps.

IMO this concern is misplaced. We should not let perfect be the enemy of good here - you are up against grep as an alternative, so as long as you can deliver something at least as good (that is, exact substring matching) I don't think it is useful to be blocking this indefinitely waiting for some natural language processing thing.

Mario Carneiro (Nov 28 2023 at 23:15):

#find "foo" should just search for foo appearing in lemma names and docstrings, and defer the formatting and search keyword highlighting until later

Joachim Breitner (Nov 29 2023 at 08:17):

Well, at least you need a way to say

#find "and"

and no get all the lemmas that have “and” in the docstring. So some form of qualifier to indicate whether this should be search in names, or docstrings, or both.

And it would be really bad UX to if you loogle something and it gives you results that, at first glance, have nothing to do with the query. I guess I could add a “Docstring mentions foo” comment. Or show the docstring in its markdown form.

Damiano Testa (Nov 29 2023 at 08:29):

First of all, for my purpose, the VSCode search solved my issue.

However, you could have a #find "and" in docs syntax, with the in optional and maybe taking a * option to search both terms and docs.

Mario Carneiro (Nov 29 2023 at 08:32):

You could rank the ones that only have a hit in the docstring below label hits. As long as you say somewhere in the documentation that "foo" searches in the label and comment, I don't think it will be too confusing if the word foo doesn't appear in the label alone

Damiano Testa (Nov 29 2023 at 08:33):

Or even #find <term> with "string" in docs, to filter both term-like conditions and grep-like ones.

Mario Carneiro (Nov 29 2023 at 08:35):

If there are too many docs hits you could also just completely suppress it with a message like "very common word and was hidden from search results". But adding more bells and whistles to the search syntax is probably not good UX, most people won't learn it or will have to look it up every time. At least, assume that people won't be using it most of the time

Mario Carneiro (Nov 29 2023 at 08:37):

IMO unquoted identifiers should be treated as label searches, the current fully qualified only syntax is way too cumbersome

Joachim Breitner (Nov 29 2023 at 10:22):

Damiano Testa said:

Or even #find <term> with "string" in docs, to filter both term-like conditions and grep-like ones.

Do you mean like #find <term>, "string" with the current syntax (you can comma-separate any number of “filters” or “queries” to form the conjunction), or something else?

(I guess that proves Mario’s points that users will want to use it without knowing about the query syntax… :-))

Joachim Breitner (Nov 29 2023 at 10:25):

Mario Carneiro said:

IMO unquoted identifiers should be treated as label searches, the current fully qualified only syntax is way too cumbersome

What do you mean by label? The last component of a name?

Not that if you search for Name loogle will give you a list of suggestions with possible qualified names (like Lean.Name) to refine the search, and this even works in terms. (Would be great if this worked in Lean as well :-D)
But proactively doing a “you probably meant Lean.Name” might be an alternative that I have been pondering.

Mario Carneiro (Nov 29 2023 at 10:54):

label meaning theorem names. That is, #find foo should search for theorems containing foo in the name, #find "foo" should search for theorems containing foo in the docstring

Damiano Testa (Nov 29 2023 at 12:02):

Joachim, what I had in mind was #find (current syntax for filtering terms) with (add your string match here) in docs. So, for example, you may want to do

#find Real.exp, LE.le 0 _, |- LE.le _ _ with "intermediate" in docs

and find docs#Real.add_one_le_exp_of_nonneg. However, I agree with having a simpler syntax and it took me quite a few attempts to form the example above!

Let me also emphasize: this is certainly not an immediate concern for me! I am already very happy with loogle, although I do still find it hard to use. For this reason, proposing further syntax enhancements may even be counterproductive!

I like Mario's simpler version, where the doc-string matches are simply mentioned after the term-matches.

Damiano Testa (Nov 29 2023 at 12:03):

So, in essence, I would view the with ... in docs as syntax for saying "and now, sift the term-matches with this grep-like filter on the doc-strings".

Joachim Breitner (Nov 29 2023 at 12:21):

Hmm, but why not

#find Real.exp, LE.le 0 _, |- LE.le _ _ , "intermediate" in docs

how is your with different from a conjunction of queries?

Joachim Breitner (Nov 29 2023 at 12:22):

I am already very happy with loogle, although I do still find it hard to use.

Happy to hear about the first half, but what makes it hard to use?

Shreyas Srinivas (Nov 29 2023 at 12:24):

I think including the doc strings in the api could alleviate some of this. Of course it doesn't quite let you search by docs, but at least for the vscode client (and possibly also nvim and emacs), filtering results by docstrings should be possible

Shreyas Srinivas (Nov 29 2023 at 12:28):

Currently it is already possible to filter loogle search hits by the label, module name, and type signature

Shreyas Srinivas (Nov 29 2023 at 15:41):

(deleted)


Last updated: Dec 20 2023 at 11:08 UTC