Zulip Chat Archive
Stream: lean4
Topic: Parse names of theorems
Moritz Firsching (Oct 09 2024 at 08:11):
For a linter that I'm trying to implement, I would like to parse the names that come after definition
or theorem
(or lemma
) For instances here:
theorem foo ...
definition bar ...
I want to extract foo
and bar
.
I think what I need is a function that with of the following type,
def getNames : Syntax → Array Syntax
so I probably want to match on | stx@(.node info kind args) =>
, but I don't quite know how to detect if we are in a definition/lemma/theorem from then (and also how to get access to the name after that).
Michael Rothgang (Oct 09 2024 at 08:12):
I think this question is not too difficult. I'm pretty sure @Damiano Testa knows; if not, I'll try to take a look in the next hours (have to go now, but should be able to get back to this before lunch).
Damiano Testa (Oct 09 2024 at 09:07):
It depends a little on how "robust" you want this to be. The "name" of a theorem has a very special SyntaxNodeKind that is a declId
(and is the name with possibly a universe level specification as in lift_univ.{u, v}
).
Damiano Testa (Oct 09 2024 at 09:13):
import Lean.Elab.Command
open Lean Elab Parser Command
elab "getName " cmd:command : command => do
elabCommand cmd
match cmd.raw.find? (·.isOfKind ``declId) with
| some nm => logInfo m!"The name is '{nm}'."
| none => logInfo m!"No given name."
/-- info: The name is 'X'. -/
#guard_msgs in
getName
def X := 0
/-- info: The name is 'X1'. -/
#guard_msgs in
getName
theorem X1 : True := .intro
/-- info: No given name. -/
#guard_msgs in
getName
instance : True := .intro
Damiano Testa (Oct 09 2024 at 09:14):
Directly matching on the declaration is a little bit tricky since there are some moving parts and your match would have to take into account at least a few variations.
Eric Wieser (Oct 09 2024 at 09:16):
Do you want to extrant names generated by to_additive
too?
Eric Wieser (Oct 09 2024 at 09:16):
If so you probably want to consult the environment, not just the syntax
Moritz Firsching (Oct 09 2024 at 09:23):
For now, I don't care about the to_additive
names, so no probably need to look into the environment yet...
Moritz Firsching (Oct 09 2024 at 09:24):
Thanks @Damiano Testa, declID
was just what I was looking for
Damiano Testa (Oct 09 2024 at 09:24):
Let me then also mention another way in which this is brittle: there is no information about the "actual" final name in the above. Namespaces,_root_
and whatever other consideration is ignored.
Damiano Testa (Oct 09 2024 at 09:56):
I could not find exactly what you want in core, but you can look around docs#Lean.Elab.Command.getDefName? (a private declaration in file#Lean.Elab.Command).
Moritz Firsching (Oct 09 2024 at 12:31):
Here's what I needed this for. Instead of just fixing the (presumably) unintentional double underscore here: docs#FirstOrder.Language.PartialEquiv.toEmbeddingOfEqTop__apply
I added a (extendable) linter: #17580, feedback/reviews welcome.
Still wip, because
-
I also plan to depecrecate
toEmbeddingOfEqTop__apply
and replace it withoEmbeddingOfEqTop_apply
in the pull request, but I wanted to wait and see if the CI triggers on this linter. -
apparently importing
Batteries.Data.String.Matcher
is perhaps not the best idea...
Moritz Firsching (Oct 09 2024 at 13:35):
ci failed sucessfully: https://github.com/leanprover-community/mathlib4/actions/runs/11254853203/job/31293151730?pr=17580:
Some required builds logged failures:
- Mathlib.ModelTheory.PartialEquiv
so this works as intended. pushed the deprecation, 1. is fixed. Not sure about "2."
Eric Wieser (Oct 09 2024 at 13:56):
I think this linter should be checking the environment not the syntax
Eric Wieser (Oct 09 2024 at 13:57):
How does the linter.dupNamespace
linter handle it?
Moritz Firsching (Oct 09 2024 at 14:05):
/-- `getIds stx` extracts the `declId` nodes from the `Syntax` `stx`.
If `stx` is an `alias` or an `export`, then it extracts an `ident`, instead of a `declId`. -/
partial
def getIds : Syntax → Array Syntax
| .node _ `Batteries.Tactic.Alias.alias args => args[2:3]
| .node _ ``Lean.Parser.Command.export args => (args[3:4] : Array Syntax).map (·[0])
| stx@(.node _ _ args) =>
((args.attach.map fun ⟨a, _⟩ ↦ getIds a).foldl (· ++ ·) #[stx]).filter (·.getKind == ``declId)
| _ => default
Moritz Firsching (Oct 09 2024 at 20:29):
so I suppose linter.dupNamespace
does not check the environment then?
Damiano Testa (Oct 09 2024 at 20:31):
The duplicated namespace linter was initially an environment linter and checked "all" declarations, but did not give live feedback.
Damiano Testa (Oct 09 2024 at 20:32):
I wrote the syntax linter to get live, although incomplete, feedback and the syntax linter completely replaced the environment linter, even with its short-comings.
Damiano Testa (Oct 09 2024 at 20:33):
If I remember correctly, I also have a test showing that if the to_additive
name of a declaration has a duplicated namespace, the linter ignores it.
Damiano Testa (Oct 09 2024 at 20:34):
I have similar uneasiness about other linters that may exist in both syntax and environment form, since the syntax linter gives immediate feedback, which is normally enough, but the environment linter would give a "definite" answer, albeit later.
Damiano Testa (Oct 09 2024 at 20:37):
Indeed, this is one of the tests:
-- the `dupNamespace` linter does not notice that `to_additive` created `Foo.add.add`.
#guard_msgs in
@[to_additive] theorem add.mul : True := .intro
-- However, the declaration `Foo.add.add` is present in the environment.
run_cmd Lean.Elab.Command.liftTermElabM do
let decl := (← Lean.getEnv).find? ``Foo.add.add
guard decl.isSome
Moritz Firsching (Oct 09 2024 at 20:43):
What does that mean for the proposed linter? Of course it won't catch double underscore (and when extended potentially other things we don't want) in the namespace, but of course namespaces are added less often then definitions/theorems/lemmas. I think it would be nice to have immediate feedback.
What about the import form Batteries, that gives the red warning label on the pr?
Might it make sense to factor out a function getIds : Syntax → Array Syntax
that can be used by both the dupNamespace and the new linter?
Damiano Testa (Oct 09 2024 at 20:50):
I think that a "serious" function that really extracts the name of the "visible" declaration is something that would be very useful.
Damiano Testa (Oct 09 2024 at 20:51):
I also think that its signature should be Syntax -> (appropriate monad) (Option Syntax)
, returning none
only for syntax that is not a declaration and returning some
for everything else, also nameless instances, retrieving their name and attaching some meaningful position to the (fake) syntax node that the function creates.
Damiano Testa (Oct 09 2024 at 20:51):
I remember that I used recently the "hover" information to find the correct name, also for nameless instances.
Damiano Testa (Oct 09 2024 at 20:52):
Let me see if I can find the code.
Damiano Testa (Oct 09 2024 at 20:53):
Ideally, there should not be the need to match on the actual syntax, since I hope that all the information that is needed is already extracted somewhere, since hovers know where they are in scope and know the full name.
Eric Wieser (Oct 09 2024 at 20:57):
Probably we should build some reusable boilerplate for "name" linters, and then implement both the dupNamespace and __ linter with that boilerplate
Eric Wieser (Oct 09 2024 at 20:57):
We can have the hacky syntax-based linter for now; but probably the right thing to do is to extract information from the infotree about which names in the provided syntax correspond to declarations
Eric Wieser (Oct 09 2024 at 20:58):
Let's just not have the same hack in more than one place
Damiano Testa (Oct 09 2024 at 20:58):
As you may guess, I really like syntax linters, but another advantage of the environment linters is that they also check declarations that have been defined before the linter is imported.
Damiano Testa (Oct 09 2024 at 21:00):
Eric Wieser said:
We can have the hacky syntax-based linter for now; but probably the right thing to do is to extract information from the infotree about which names in the provided syntax correspond to declarations
My experience with working with InfoTrees, is that they contain all the information that you usually want, but I also cannot find efficient ways of extracting this information.
Damiano Testa (Oct 09 2024 at 21:00):
I realize that this is a limitation of my skills, but I do find that this potentially challenging.
Damiano Testa (Oct 09 2024 at 22:04):
Hopefully, this is more robust: you feed into the command the Syntax
of the declaration and it returns the name of the declaration, whether it is nameless or not.
import Mathlib.Tactic.ToAdditive
open Lean Elab Command
/--
If `stx` is a declaration, then `getNames stx` returns the array of identifiers
for the name of the declarations whose syntax range is contained in `stx`.
If `stx` is not a declaration, then `getNames stx` returns `#[]`.
-/
def getNames {m} [Monad m] [MonadEnv m] [MonadFileMap m] (stx : Syntax) : m (Array Syntax) := do
let drs := declRangeExt.getState (← getEnv)
let stxPos := stx.getRange?.getD default
let fm ← getFileMap
let stxPos1 := stxPos.start
let stxPos2 := stxPos.stop
let mut nms := #[]
for (nm, rgs) in drs do
if stxPos1 ≤ fm.ofPosition rgs.range.pos && fm.ofPosition rgs.range.pos ≤ stxPos2 then
let ofPos1 := fm.ofPosition rgs.selectionRange.pos
let ofPos2 := fm.ofPosition rgs.selectionRange.endPos
nms := nms.push (mkIdentFrom (.ofRange ⟨ofPos1, ofPos2⟩) nm)
return nms
elab "#show_names " stx:command : command => do
elabCommand stx
for nmStx in ← getNames stx do
logInfoAt nmStx m!"The name is '{nmStx}'"
/--
info: The name is 'add_add'
---
info: The name is 'mul_mul'
-/
#guard_msgs in
#show_names
@[to_additive]
theorem mul_mul : True := .intro
/-- info: The name is 'instAddNat_1' -/
#guard_msgs in
#show_names
instance : Add Nat := inferInstance
/-- info: The name is 'instAddNat_2' -/
#guard_msgs in
#show_names
instance : Add Nat := inferInstance
/-- info: The name is 'X' -/
#guard_msgs in
#show_names
instance X : Add Nat := inferInstance
/--
info: The name is 'X1.mk'
---
info: The name is 'X1'
-/
#guard_msgs in
#show_names
structure X1 where
#guard_msgs in
#show_names
section
Damiano Testa (Oct 09 2024 at 22:21):
I edited the above code: it occurred to me that the functions above could be adapted to catch to_additive
names as well.
François G. Dorais (Oct 10 2024 at 05:33):
Moritz Firsching said:
- apparently importing
Batteries.Data.String.Matcher
is perhaps not the best idea...
This response is way aside at this point in the thread, apologies. I don't think this is the optimal route for your use case but note that Batteries.Data.String.Matcher
is built upon a much more general Batteries.Data.Array.Match
that can be used on much more general data types than strings. (Still in linear time!) The interface is admittedly not optimal but comments and ideas are most welcome! There are plenty of ways to make this algorithm more usable.
Damiano Testa (Oct 10 2024 at 05:51):
Ah, I had missed that import!
Mathlib.Tactic.Linter.Style
is imported in Mathlib.Init
and Mathlib.Init
is quite controlled in its imports. So, I would recommend against adding imports to it, even in Batteries. Instead of matching, you can use str.splitOn "__"
in your case:
#eval -- [true, false]
let strs := ["ab__cd__ef", "ab_cd_ef"]
strs.map fun s => (2 ≤ (s.splitOn "__").length : Bool)
(I am not really sure about efficiency, so feel free to suggest more performant, import-less solutions!)
Moritz Firsching (Oct 10 2024 at 14:22):
ok, cool, now I'm using the code you suggested and moved it to a file Linter/Common.lean, where both the dupNamespace linter and hte new linter use the new funcitons getNames
. Seems like it works in principle, but I get lots of new (false positive?) duplicate namespace warnings:
https://github.com/leanprover-community/mathlib4/actions/runs/11274654137/job/31354283516?pr=17580
The large import should now not be a problem anymore.
Damiano Testa (Oct 10 2024 at 15:02):
Thanks!
I'm benching the PR, since I'm curious about performance of the new script. I don't have time to review it now, though.
Moritz Firsching (Oct 10 2024 at 15:03):
it still does not work, because for some reason I'm getting these false postivies
Michael Rothgang (Oct 10 2024 at 15:39):
I left some comments, mostly about style and documentation.
Damiano Testa (Oct 10 2024 at 19:24):
I also left some comments, hopefully with a fix for the build!
Moritz Firsching (Oct 10 2024 at 19:25):
Thanks to both of you! Build seems fixed now. I will address the remaining points raised
Damiano Testa (Oct 10 2024 at 19:27):
I am hoping that the linter will now catch a "hidden" name that has a duplicated namespace...
Last updated: May 02 2025 at 03:31 UTC