Zulip Chat Archive

Stream: metaprogramming / tactics

Topic: automatic lemmas retrieval

Yury Yarovikov (Nov 18 2022 at 12:12):

Hey there! I've been doing some research on automated theorem proving in Lean, during which I encountered a problem: I want my machine learning model to store all the lemmas in the library and to have an ability to insert their names into generated code.
The problem is that, in Lean, lemmas are members of namespaces which, in turn, can also be members of other nested namespaces. With that in mind, I have 2 questions.

What is the easiest way to universally identify a lemma to insert into code? Is storing lemma names with all nested namespaces (e.g. linear_algebra.matrix.addition_commutative instead of simply addition_commutative which can be ambiguous) a good idea?
Is there a simple way to extract this full name from Lean code?

Jason Rute (Nov 18 2022 at 12:19):

Generally yes, fully qualified names are good identifiers. I can’t remember if it is is 100% unambiguous but it is mostly so. One way to get all the lemma names is to use the script in mathlib to make all.lean, and then inside Lean write a small meta-program to loop over the environment.

Eric Wieser (Nov 18 2022 at 12:19):

In the presence of open namespaces, fully-qualified names are only unambiguous if you prefix them with _root_

Jason Rute (Nov 18 2022 at 12:22):

Also, welcome. If you aren’t aware of it, you might want to check out #Machine Learning for Theorem Proving (although this is the right thread for this question).

Yury Yarovikov (Nov 18 2022 at 12:26):

Thanks a lot!
Jason Rute said:

Also, welcome. If you aren’t aware of it, you might want to check out #Machine Learning for Theorem Proving (although this is the right thread for this question).

Jason Rute (Nov 18 2022 at 12:32):

Here is specific code to do what you are interested in. https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/full.20names/near/129836105 (edited link) Metaprogramming is not particularly easy mostly because there is a lack of documentation. That is getting better for Lean 4, but for Lean 3, the best is to search here on Zulip, and ask questions.

Jason Rute (Nov 18 2022 at 12:37):

Here is the mk_all.sh script.

Jason Rute (Nov 18 2022 at 12:39):

Also, it isn't clear if you are going to look up this stuff ahead of time and process in a machine learning model (like in the gptf tactic and Meta's model which is available from the infoview) or you are going to look it up on-the-fly like in library_search.

Alex J. Best (Nov 18 2022 at 12:53):

For 1 I'd say that storing the full name makes the most sense, unless you want your model to also predict a list of namespaces to open.

Alex J. Best (Nov 18 2022 at 12:54):

For 2 it depends what you mean, are you looking at some lean source code as text? Or are you working from within lean? Can you be a bit more specific?

Notification Bot (Nov 18 2022 at 13:04):

2 messages were moved here from #general > ✔ metaprogramming / tactics by Eric Wieser.

Yermek Kapushev (Nov 18 2022 at 13:42):

Hi everyone! I am working with @Yury Yarovikov on automated theorem proving. Thanks a lot for your replies!
I'd like to clarify our question: given a tactic with some arguments how can we find definitions of the arguments?

Jason Rute said:

Here is specific code to do what you are interested in. https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/full.20names/near/129836105 (edited link) Metaprogramming is not particularly easy mostly because there is a lack of documentation. That is getting better for Lean 4, but for Lean 3, the best is to search here on Zulip, and ask questions.

If I understand this code correctly, it finds all declarations in a given file and it looks like we need something different. Correct me please if I'm wrong.

Yermek Kapushev (Nov 18 2022 at 13:42):

Jason Rute said:

Also, it isn't clear if you are going to look up this stuff ahead of time and process in a machine learning model (like in the gptf tactic and Meta's model which is available from the infoview) or you are going to look it up on-the-fly like in library_search.

We are going to look up this stuff just once for a set of proofs and use it later.

Yermek Kapushev (Nov 18 2022 at 13:42):

Alex J. Best said:

For 2 it depends what you mean, are you looking at some lean source code as text? Or are you working from within lean? Can you be a bit more specific?

Currently we are working in Python communicating with lean-gym. So, yes, we are looking at lean source code as text

Jason Rute (Nov 18 2022 at 14:03):

@Yermek Kapushev You can remove the line which filters to the local file. I'd also just search on Zulip for env.fold, curr_env.fold, etc. for similar examples of looping over all names. There is also https://github.com/jesse-michael-han/lean-step-public which is where half of the data used to train both the GPTf and Meta models comes from. This works by similarly looping over all the declarations. (The other data comes from https://github.com/jasonrute/lean_proof_recording. This works differently by hackely modifying the Lean tactic internals to record data and then parsing the Lean files to extract the tactics.)

Last updated: May 02 2025 at 03:31 UTC