Zulip Chat Archive

Stream: mathlib documentation

Topic: searching through docs


Polina Boneva (May 15 2021 at 11:55):

Hello, I'm starting this topic in a documentation-only oriented stream because having a written overview (the docs) of everything old and new in mathlib seems important enough on its own, independent from the specifics of lean's version or mathlib's changes in the future. I hope this was the right way to go here, I'm sorry if a new stream is not the best approach - let's change it if there is a better place :)

Just a sidenote - to introduce myself, I am a student at the Vrije Universiteit Amsterdam currently and for the future residing in Sofia, Bulgaria. For my bachelor's finishing lane I took a thesis project that describes the needs of the documentation of a library for formal proofs, mainly for the purposes of destructing, re-organizing in a concise and simple way, and traversing the whole load of data the full library holds for all kinds of users: from novice to experts. As some of you know, there has been some progress in the search capabilities of mathlib - from a Google search to a search through the names of all declarations, and now a search through the names of the declarations, the name of the modules they are in, and the descriptions for each declaration, if there is any of course. Filtering by attributes and kind has also been added, in a demo for now only.
The separate data used for the search holds a list of all declarations, each of which carries the following properties: name, module, description, attributes, kind.
Please check out the demo here: https://leanprover-community.github.io/mathlib_docs_demo/ and share your observations and concerns, especially if there are any issues. Some issues have been found and are being discussed in the PR: https://github.com/leanprover-community/doc-gen/pull/131
Naturally, it might be better to discuss the specifics of each issue (mainly connected to the differences in operating systems and their apps it seems) in this stream in a topic specific for this milestone of the search method and indexed data, or in a private message with me - where we can delve deeper into the problem to find a solution :) .

This topic here is more general about your needs, as users and maintainers of the mathlib library, regarding the search capabilities and features for the future. Please don't hesitate to call forth what you imagine would be useful or absolutely necessary.

Johan Commelin (May 15 2021 at 12:28):

@Polina Boneva Thanks a lot for working on this! The docs are already very useful, and it's great that you are making them even better!

Floris van Doorn (May 15 2021 at 17:37):

There are some features related to searching I would really like. Btw, the only way I search through the docs is via a link like
https://leanprover-community.github.io/mathlib_docs/find/foo
(which I made a standard search engine in my browser with a convenient ml shortcut).

I regularly search for names, but I don't know in what namespace this name is (when copying a name from anywhere on Zulip/Github/uncompiled file in VSCode). Therefore, I am most interested in my search term in various namespaces. So if I search for foo, I'm most interested in any declaration whose name ends with .foo. Ideally:

  • If there is one such declaration: redirect automatically to that declaration.
  • If there are more than one such declaration, the page shows a list Declarations in namespaces matching the name exactly: above the current list of suggestions.
  • If there are no matches, give the current list of suggestions.

Polina Boneva (May 16 2021 at 11:59):

I'm glad you turned my attention to this example exactly because I find it weird how the webpages for counterexamples is created in the list of mathlib directories and modules, since it's neither foo.has_zero, nor foo.has_one can be found in the export.json and foo.inhabited is only found as part of a description of instances for cathegory_theory.ulift_functor_full. Searching for foo in the current search (that is by names only) does find them all though, which means they must be in the data structure (loc_map) generated from the export.json in the first place. I'm sure I'm missing something, but I can't see it. Are you able to pinpoint where and how are those extra names added to the data structure (loc_map) used for generating the names list that works in the search that is currently live?

Regarding the search capabilities on the basis of partial-matching inside namespaces, this is a great suggestion and should be implemented indeed. Maybe a partial match in the descriptions of each declaration, and an exact match in the namespaces? Or the opposite? Maybe it will be even better if it's a choice given to the user to select before searching.
I don't think automatic redirecting is a good idea in a place where most of the results from an action don't redirect and only a very small amount of them would. Isn't it usually best if the user experience is easy to anticipate and understand for first-comers too?

What kind of suggestions do you mean exactly? The current list offers matches in namespaces exactly. Do you mean this or a list of results of a similar query, if none for the current query are found?

Floris van Doorn (May 16 2021 at 18:52):

Oh, my example searching for foo was confusing: the declaration foo didn't exist when I wrote my comment, so I didn't mean foo specifically, but just an arbitrary search term.

When I search for a name, say my_name, and there is a declaration list.my_name or even measure_theory.measure.regular.my_name, I always mean one of those possibilities: I have just copied the name without its namespace from somewhere, and that is the declaration I'm interested in. Redirecting would definitely be my preferred behavior (if there is exactly one match inside some namespace), but I can live with it being the top result.

Here is the current "problem".
If I currently search for docs#map_update, the exact match in of a declaration in a namespace is docs#list.nodup.map_update, but it is only the 10th showing result. Even worse is docs#to_list_nodup, which matches docs#option.to_list_nodup, but that is not one of the 20 displayed suggestions.

Kevin Buzzard (May 16 2021 at 19:03):

+1 from me -- I sometimes run into exactly this problem (when reading Lean but not having access to a Lean session).

Polina Boneva (May 16 2021 at 19:43):

Aha, I understand you a bit better now. Since you mention the 20 results that are currently displayed from the search results, I guess you haven't tried using the search that is currently under PR: https://leanprover-community.github.io/mathlib_docs_demo/
Try it and let me know if it's better for your purposes.
For now I see that the ending part of the declaration names holds more value than any other, thank you. I will see what can be done to prioritize it because it is indeed the most specific part of the namespace. Overall for now in the demo above the namespace of the declaration holds a higher priority than any findings in the description or name of the module the declaration is in, but there can definitely be done more for a more specific prioritization criteria.

Floris van Doorn (May 17 2021 at 02:01):

I was indeed not using your search function.
I tried it: your search function gets the "correct" first result with to_list_nodup. With to_list_nodup the "correct" result is at number 8. So it's indeed better, but I think it would be good to build in this heuristic that the namespaces of a name are less valuable when searching.

When using your search function, I had a problem that my search query actually started with a zero-width space (U+200B), and it gave me 0 results back. Presumably that character got there when I copy-pasted some text in there (I'm not quite sure where I copied from). Can you strip out all space-like characters before searching? (Lean might allow some zero-width-space characters in declaration names, but I'd wager we're not going to use them in mathlib.)

Gabriel Ebner (May 17 2021 at 07:12):

Floris van Doorn said:

I'm not quite sure where I copied from

From the documentation, probably. We used to insert zero-width spaces after every dot (to allow linebreaks). I believe Alex Peattie changed this to use a CSS-based solution instead for most of the documentation, but the 404 page was still using them.

Polina Boneva (May 17 2021 at 07:26):

That's a good point! I pushed a fix that trims the query before searching :) it's generally a good idea to treat strings this way


Last updated: Dec 20 2023 at 11:08 UTC