Zulip Chat Archive

Stream: general

Topic: Mathlib semantic search


Zhangir Azerbayev (Feb 23 2023 at 23:56):

Myself and @Edward Ayers put together a demo of semantic search for mathlib.

http://mathlib-search.edayers.com/

For example, when I search "What can I use to prove a subset of the reals is compact?" in the mathlib docs, I don't get anything useful because the default search mostly looks for substring overlap. However, if I use the semantic search, the first result is the correct theorem: metric.is_compact_of_is_closed_bounded. I'd also encourage you to try looking for theorems that have a well-known name (e.g triangle inequality) but are called something obscure in mathlib.

This website is very much a quick and dirty demo, and we're keen to hear more about when it works and when it breaks. You can thumbs up and down search results and also suggest what the search should've returned if you don't see the correct result. This feedback will help us improve the system. We hope these kinds of tools will lower the barrier to entry for newcomers to formal math and help manage the growth of mathlib.

This is mostly implemented using the OpenAI API. If you're curious about the technical details, you can find the source here.

Frédéric Dupuis (Feb 24 2023 at 01:03):

Very nice! Does it check if the theorem names that it outputs are actually in mathlib, or is it just the raw output from the language model?

Zhangir Azerbayev (Feb 24 2023 at 01:11):

We've precomputed dense vector representations of every mathlib term. The language model predicts a term given the natural language query, embeds the predicted term, and does kNN search against the mathlib vectors. So every search result is guaranteed to be a real theorem.

Arthur Paulino (Feb 24 2023 at 01:22):

I am now imagining a tactic in Lean 4 that opens up a text box widget in the infoview, accepts a natural language description and then does the semantic search

Yakov Pechersky (Feb 24 2023 at 01:54):

Very nice! But a link didn't work:
image.png image.png
https://leanprover-community.github.io/mathlib_docs/topology/algebra/infinite_sum.html#summable.tendsto_at_top_zero

Junyan Xu (Feb 24 2023 at 02:46):

Seems to be moved here
but git blame says it's moved 3 years ago ...

Junyan Xu (Feb 24 2023 at 02:59):

I tested this classical example:

Eric Rodriguez said:

for an extreme example of this: i'll be impressed if someone ever finds the cosine rule without knowing what it's called already

unfortunately, nothing useful pops up. Probably because the autoformalization is pretty far from the statement

theorem cosine_law (a b c : ) (ha : a > 0) (hb : b > 0) (hc : c > 0) :
  cos (angle a b c) = (a^2 + b^2 - c^2) / (2 * a * b)

This formalization could be right if angle a b c is defined to be the angle between the side of length a and the side of length b in a triangle with side lengths a, b, c, but such angle doesn't exist in mathlib ...

Junyan Xu (Feb 24 2023 at 03:20):

Mathlib docs search takes docstrings into account so if you put "law of cosine" in the search box then docs#euclidean_geometry.dist_sq_eq_dist_sq_add_dist_sq_sub_two_mul_dist_mul_dist_mul_cos_angle will appear as the first result. This new semantic search seems to completely ignore docstrings.(maybe not)

Yaël Dillies (Feb 24 2023 at 08:15):

Junyan Xu said:

Seems to be moved here
but git blame says it's moved 3 years ago ...

No, it moved last week. #18414

Junyan Xu (Feb 24 2023 at 09:09):

I forgot git can now track file renames ...

Eric Wieser (Feb 24 2023 at 09:15):

Git doesn't track renames in any meaningful way, it just guesses after the fact

Eric Wieser (Feb 24 2023 at 09:25):

I searched for "sine rule" and the autoformalization was

theorem sin_cos_eq_sin_cos_of_cos_eq_cos {α : Type*} [inhabited α] [discrete_linear_ordered_field α]
  [trigonometry α] {a b c : α} (h : cos a = cos b) :
  sin a * cos c = sin b * cos c  sin a = sin b

Should we introduce a [trigonometry α] typeclass to generalize over complex.sin and real.sin? (edit: moved to another thread to not distract from the main topic here)

Junyan Xu (Feb 24 2023 at 09:31):

Git doesn't track renames in any meaningful way, it just guesses after the fact

Thanks for the correction. Here is the relevant official FAQ

Zhangir Azerbayev (Feb 24 2023 at 17:51):

Junyan Xu said:

Git doesn't track renames in any meaningful way, it just guesses after the fact

Thanks for the correction. Here is the relevant official FAQ

The database the current demo uses was generated from a commit that's a few weeks old (06d0ad). I can manually regenerate the database pretty easily, but automatically syncing with mathlib commits is on the to-do list.

Notification Bot (Feb 24 2023 at 17:52):

A message was moved from this topic to #general > Mathlib semantic search suggests: a trigonometry typeclass by Eric Wieser.


Last updated: Dec 20 2023 at 11:08 UTC