Zulip Chat Archive
Stream: maths
Topic: auto-generating names for declarations
Johan Commelin (Jun 27 2020 at 13:05):
Rob Lewis said:
What we really need is a linter that automatically flags if a name is wrong, and suggests to replace it with a proper one...
Interesting!
Johan Commelin (Jun 27 2020 at 13:07):
It would be really nice if we had this for Lean.
Johan Commelin (Jun 27 2020 at 13:07):
We could maybe even run it on Azure...
Johan Commelin (Jun 27 2020 at 13:07):
And have it auto-fix our library for us
Johan Commelin (Jun 27 2020 at 13:07):
But I have no idea about whether this stuff lives up to its promise.
Rob Lewis (Jun 27 2020 at 13:16):
Johan Commelin said:
But I have no idea about whether this stuff lives up to its promise.
I can only speculate. The challenge adapting this to Lean 3 would be accessing what they call "syntax trees," the input-level pre-expressions. Otherwise I think it's just a matter of hooking things together. Maybe a version would work without the syntax trees. They write
The best overall performance (BLEU = 47.2) is obtained using the multi-input model with lemma statement and chopped kernel tree as inputs
I'm not sure if that's intended to exclude the syntax tree in some way.
Johan Commelin (Jun 27 2020 at 13:25):
This is clearly above what I could work on. But I would very much cheer you on.
Rob Lewis (Jun 27 2020 at 13:26):
Not me! Look somewhere else...
Gabriel Ebner (Jun 27 2020 at 13:27):
I agree with Rob, it should be easy to adapt to mathlib. But I really can't tell how bad the results are. It guesses the "right" name 10% of the time. I believe it's enough to feed it only the elaborated expressions, the ChopKnlTree+attn+copy
input is the second best perfoming in their paper.
Johan Commelin (Jun 27 2020 at 13:29):
10% is a bit sad ... I guess we wouldn't get much out of it in practice.
Patrick Massot (Jun 27 2020 at 13:30):
@Karl Palmskog may want to comment here.
Rob Lewis (Jun 27 2020 at 13:33):
Maybe a more useful application would be as a name accuracy estimator. Don't ask it to name lemmas for you, but try to identify lemmas whose names are obviously wrong. This would fit the linting framework better.
Johan Commelin (Jun 27 2020 at 13:34):
Thing like mul_lt_mul''''
Johan Commelin (Jun 27 2020 at 13:35):
git grep "''''"
src/algebra/linear_ordered_comm_group_with_zero.lean:lemma mul_lt_mul'''' (hab : a < b) (hcd : c < d) : a * c < b * d :=
Karl Palmskog (Jun 27 2020 at 15:40):
hi everyone.
I'm not sure if that's intended to exclude the syntax tree in some way.
Our best model for name suggestion did indeed not use the Gallina syntax trees (ASTs, "ChopSynTree") as input. Likely, there was a lot of overlap between the raw tokens ("Stmt") from Coq's lexer and the syntax trees from Coq's parser ("ChopSynTree") in terms of information used for naming
Karl Palmskog (Jun 27 2020 at 15:45):
It guesses the "right" name 10% of the time.
I'm guessing this is referring to the top-1 score. For perspective, MathComp lemma names are not your average lemma names to get exactly right, they can have 8-9 components and mix underscore and camelcase. To be "top-1-correct" every character must be present in exactly the right order.
A better question to ask may be: how many of the top-1 suggested names are "good" or "reasonable"? In our manual evaluation on a project outside of our corpus, we found that around 24% of top-1 names were rated as good/useful. We believe this is good enough for top-5 suggestions from our tool to be used in pull requests, editors, and the like.
Karl Palmskog (Jun 27 2020 at 16:00):
Johan Commelin said:
And have it auto-fix our library for us
The output of our tool can only be as good as the training/validation data. If you want to apply this, you will have to select data with desirable naming conventions for the training, validation and test sets. We chose to train/validate/test on very specific MathComp projects in the paper since they were widely regarded as having high-quality names. If there are no agreed-on conventions for certain training/validation data, having a high top-1 percentage on a certain test set might not have any implications for when the tool is applied in practice.
Rob Lewis (Jun 27 2020 at 16:01):
This question are half-hypothetical, I won't have time in the near future to try to adapt this myself. But for someone who wanted to, how much work would it take beyond data preparation to adapt this to Lean? It would be very very easy to produce a list of mathlib lemma names with their types and bodies as trees, in whatever format is convenient. From section 5 of your paper, it looks like you're working directly on Coq datatypes for a while, right? So it's fairly far from plug-and-play?
Karl Palmskog (Jun 27 2020 at 16:02):
Rob Lewis said:
Maybe a more useful application would be as a name accuracy estimator. Don't ask it to name lemmas for you, but try to identify lemmas whose names are obviously wrong. This would fit the linting framework better.
This is indeed one possible application of this kind of tool. You can basically get a number on how well certain lemma names conform to the lemma names in the training/validation set.
Karl Palmskog (Jun 27 2020 at 16:08):
Rob Lewis said:
But for someone who wanted to, how much work would it take beyond data preparation? It would be very very easy to produce a list of mathlib lemma names with their types and bodies as trees, in whatever format is convenient. From section 5 of your paper, it looks like you're working directly on Coq datatypes for a while, right? So it's fairly far from plug-and-play?
Our toolchain currently assumes input data in the form of S-expressions (since this is what we currently get from Coq). So there will be perhaps a workday or two to make our frontend work with Lean data. We also need a Lean-specific so called "subtokenizer" that knows how to break apart a Lean names into their components (e.g., based on underscores). It's likely you want to perform similar chopping we did on Lean kernel terms to make them into a manageable size. This might be 1-2 workdays more. Then training/validation takes a day of machine time or so. So I would pessimistically estimate a workweek or so (assuming data selection and tokens can be straightforwardly obtained)
Karl Palmskog (Jun 27 2020 at 16:13):
I guess this is a bit off-topic, but since we used the same data and similar techniques: we are also presenting some preliminary work soon in the Coq Workshop July 5 on learning/predicting spacing: https://cozy.ece.utexas.edu/~palmskog/coqws20draft.pdf https://coq-workshop.gitlab.io/2020/#talk-1-2-3
Rob Lewis (Jun 27 2020 at 16:13):
Thanks for the analysis, that's very helpful! The subtokenizing should be very easy. The rest sounds like a nice project for someone looking to do Lean+ML.
Karl Palmskog (Jun 27 2020 at 16:14):
finally, the name generation tool is publicly available now https://github.com/EngineeringSoftware/roosterize and so is the dataset: https://github.com/EngineeringSoftware/math-comp-corpus and the paper will be presented at IJCAR July 3 in a session chaired by Leo: https://easychair.org/smart-program/IJCAR2020/2020-07-03.html#talk:147084 --- my machine learning focused co-author, Pengyu, will likely be able to better answer ML-related questions there than I can here
Johan Commelin (Jun 27 2020 at 16:57):
@Karl Palmskog Thanks a lot for chiming in!
Last updated: Dec 20 2023 at 11:08 UTC