Zulip Chat Archive

Stream: lean4

Topic: Parse names of theorems

Moritz Firsching (Oct 09 2024 at 08:11):

For a linter that I'm trying to implement, I would like to parse the names that come after definition or theorem (or lemma) For instances here:

theorem foo ...

definition bar ...

I want to extract foo and bar.

I think what I need is a function that with of the following type,

def getNames : Syntax → Array Syntax

so I probably want to match on | stx@(.node info kind args) =>, but I don't quite know how to detect if we are in a definition/lemma/theorem from then (and also how to get access to the name after that).

Michael Rothgang (Oct 09 2024 at 08:12):

I think this question is not too difficult. I'm pretty sure @Damiano Testa knows; if not, I'll try to take a look in the next hours (have to go now, but should be able to get back to this before lunch).

Damiano Testa (Oct 09 2024 at 09:07):

It depends a little on how "robust" you want this to be. The "name" of a theorem has a very special SyntaxNodeKind that is a declId (and is the name with possibly a universe level specification as in lift_univ.{u, v}).

Damiano Testa (Oct 09 2024 at 09:13):

import Lean.Elab.Command

open Lean Elab Parser Command

elab "getName " cmd:command : command => do
  elabCommand cmd
  match cmd.raw.find? (·.isOfKind ``declId) with
    | some nm => logInfo m!"The name is '{nm}'."
    | none => logInfo m!"No given name."

/-- info: The name is 'X'. -/
#guard_msgs in
getName
def X := 0

/-- info: The name is 'X1'. -/
#guard_msgs in
getName
theorem X1 : True := .intro

/-- info: No given name. -/
#guard_msgs in
getName
instance : True := .intro

Damiano Testa (Oct 09 2024 at 09:14):

Directly matching on the declaration is a little bit tricky since there are some moving parts and your match would have to take into account at least a few variations.

Eric Wieser (Oct 09 2024 at 09:16):

Do you want to extrant names generated by to_additive too?

Eric Wieser (Oct 09 2024 at 09:16):

If so you probably want to consult the environment, not just the syntax

Moritz Firsching (Oct 09 2024 at 09:23):

For now, I don't care about the to_additive names, so no probably need to look into the environment yet...

Moritz Firsching (Oct 09 2024 at 09:24):

Thanks @Damiano Testa, declID was just what I was looking for

Damiano Testa (Oct 09 2024 at 09:24):

Let me then also mention another way in which this is brittle: there is no information about the "actual" final name in the above. Namespaces,_root_ and whatever other consideration is ignored.

Damiano Testa (Oct 09 2024 at 09:56):

I could not find exactly what you want in core, but you can look around docs#Lean.Elab.Command.getDefName? (a private declaration in file#Lean.Elab.Command).

Moritz Firsching (Oct 09 2024 at 12:31):

Here's what I needed this for. Instead of just fixing the (presumably) unintentional double underscore here: docs#FirstOrder.Language.PartialEquiv.toEmbeddingOfEqTop__apply
I added a (extendable) linter: #17580, feedback/reviews welcome.
Still wip, because

I also plan to depecrecate toEmbeddingOfEqTop__apply and replace it with oEmbeddingOfEqTop_apply in the pull request, but I wanted to wait and see if the CI triggers on this linter.
apparently importing Batteries.Data.String.Matcher is perhaps not the best idea...

Moritz Firsching (Oct 09 2024 at 13:35):

ci failed sucessfully: https://github.com/leanprover-community/mathlib4/actions/runs/11254853203/job/31293151730?pr=17580:

Some required builds logged failures:
- Mathlib.ModelTheory.PartialEquiv

so this works as intended. pushed the deprecation, 1. is fixed. Not sure about "2."

Eric Wieser (Oct 09 2024 at 13:56):

I think this linter should be checking the environment not the syntax

Eric Wieser (Oct 09 2024 at 13:57):

How does the linter.dupNamespace linter handle it?

Moritz Firsching (Oct 09 2024 at 14:05):

/-- `getIds stx` extracts the `declId` nodes from the `Syntax` `stx`.
If `stx` is an `alias` or an `export`, then it extracts an `ident`, instead of a `declId`. -/
partial
def getIds : Syntax → Array Syntax
  | .node _ `Batteries.Tactic.Alias.alias args => args[2:3]
  | .node _ ``Lean.Parser.Command.export args => (args[3:4] : Array Syntax).map (·[0])
  | stx@(.node _ _ args) =>
    ((args.attach.map fun ⟨a, _⟩ ↦ getIds a).foldl (· ++ ·) #[stx]).filter (·.getKind == ``declId)
  | _ => default

https://github.com/leanprover-community/mathlib4/blob/9b43951ca72df001cffd7fbab1023a00282824ef/Mathlib/Tactic/Linter/Lint.lean#L80-L88

Moritz Firsching (Oct 09 2024 at 20:29):

so I suppose linter.dupNamespace does not check the environment then?

Damiano Testa (Oct 09 2024 at 20:31):

The duplicated namespace linter was initially an environment linter and checked "all" declarations, but did not give live feedback.

Damiano Testa (Oct 09 2024 at 20:32):

I wrote the syntax linter to get live, although incomplete, feedback and the syntax linter completely replaced the environment linter, even with its short-comings.

Damiano Testa (Oct 09 2024 at 20:33):

If I remember correctly, I also have a test showing that if the to_additive name of a declaration has a duplicated namespace, the linter ignores it.

Damiano Testa (Oct 09 2024 at 20:34):

I have similar uneasiness about other linters that may exist in both syntax and environment form, since the syntax linter gives immediate feedback, which is normally enough, but the environment linter would give a "definite" answer, albeit later.

Damiano Testa (Oct 09 2024 at 20:37):

Indeed, this is one of the tests:

-- the `dupNamespace` linter does not notice that `to_additive` created `Foo.add.add`.
#guard_msgs in
@[to_additive] theorem add.mul : True := .intro

--  However, the declaration `Foo.add.add` is present in the environment.
run_cmd Lean.Elab.Command.liftTermElabM do
  let decl := (← Lean.getEnv).find? ``Foo.add.add
  guard decl.isSome

Moritz Firsching (Oct 09 2024 at 20:43):

What does that mean for the proposed linter? Of course it won't catch double underscore (and when extended potentially other things we don't want) in the namespace, but of course namespaces are added less often then definitions/theorems/lemmas. I think it would be nice to have immediate feedback.

What about the import form Batteries, that gives the red warning label on the pr?
Might it make sense to factor out a function getIds : Syntax → Array Syntax that can be used by both the dupNamespace and the new linter?

Damiano Testa (Oct 09 2024 at 20:50):

I think that a "serious" function that really extracts the name of the "visible" declaration is something that would be very useful.

Damiano Testa (Oct 09 2024 at 20:51):

I also think that its signature should be Syntax -> (appropriate monad) (Option Syntax), returning none only for syntax that is not a declaration and returning some for everything else, also nameless instances, retrieving their name and attaching some meaningful position to the (fake) syntax node that the function creates.

Damiano Testa (Oct 09 2024 at 20:51):

I remember that I used recently the "hover" information to find the correct name, also for nameless instances.

Damiano Testa (Oct 09 2024 at 20:52):

Let me see if I can find the code.

Damiano Testa (Oct 09 2024 at 20:53):

Ideally, there should not be the need to match on the actual syntax, since I hope that all the information that is needed is already extracted somewhere, since hovers know where they are in scope and know the full name.

Eric Wieser (Oct 09 2024 at 20:57):

Probably we should build some reusable boilerplate for "name" linters, and then implement both the dupNamespace and __ linter with that boilerplate

Eric Wieser (Oct 09 2024 at 20:57):

We can have the hacky syntax-based linter for now; but probably the right thing to do is to extract information from the infotree about which names in the provided syntax correspond to declarations

Eric Wieser (Oct 09 2024 at 20:58):

Let's just not have the same hack in more than one place

Damiano Testa (Oct 09 2024 at 20:58):

As you may guess, I really like syntax linters, but another advantage of the environment linters is that they also check declarations that have been defined before the linter is imported.

Damiano Testa (Oct 09 2024 at 21:00):

Eric Wieser said:

We can have the hacky syntax-based linter for now; but probably the right thing to do is to extract information from the infotree about which names in the provided syntax correspond to declarations

My experience with working with InfoTrees, is that they contain all the information that you usually want, but I also cannot find efficient ways of extracting this information.

Damiano Testa (Oct 09 2024 at 21:00):

I realize that this is a limitation of my skills, but I do find that this potentially challenging.

Damiano Testa (Oct 09 2024 at 22:04):

Hopefully, this is more robust: you feed into the command the Syntax of the declaration and it returns the name of the declaration, whether it is nameless or not.

import Mathlib.Tactic.ToAdditive

open Lean Elab Command

/--
If `stx` is a declaration, then `getNames stx` returns the array of identifiers
for the name of the declarations whose syntax range is contained in `stx`.

If `stx` is not a declaration, then `getNames stx` returns `#[]`.
-/
def getNames {m} [Monad m] [MonadEnv m] [MonadFileMap m] (stx : Syntax) : m (Array Syntax) := do
  let drs := declRangeExt.getState (← getEnv)
  let stxPos := stx.getRange?.getD default
  let fm ← getFileMap
  let stxPos1 := stxPos.start
  let stxPos2 := stxPos.stop
  let mut nms := #[]
  for (nm, rgs) in drs do
    if stxPos1 ≤ fm.ofPosition rgs.range.pos && fm.ofPosition rgs.range.pos ≤ stxPos2 then
      let ofPos1 := fm.ofPosition rgs.selectionRange.pos
      let ofPos2 := fm.ofPosition rgs.selectionRange.endPos
      nms := nms.push (mkIdentFrom (.ofRange ⟨ofPos1, ofPos2⟩) nm)
  return nms

elab "#show_names " stx:command : command => do
  elabCommand stx
  for nmStx in ← getNames stx do
    logInfoAt nmStx m!"The name is '{nmStx}'"

/--
info: The name is 'add_add'
---
info: The name is 'mul_mul'
-/
#guard_msgs in
#show_names
@[to_additive]
theorem mul_mul : True := .intro

/-- info: The name is 'instAddNat_1' -/
#guard_msgs in
#show_names
instance : Add Nat := inferInstance

/-- info: The name is 'instAddNat_2' -/
#guard_msgs in
#show_names
instance : Add Nat := inferInstance

/-- info: The name is 'X' -/
#guard_msgs in
#show_names
instance X : Add Nat := inferInstance

/--
info: The name is 'X1.mk'
---
info: The name is 'X1'
-/
#guard_msgs in
#show_names
structure X1 where

#guard_msgs in
#show_names
section

Damiano Testa (Oct 09 2024 at 22:21):

I edited the above code: it occurred to me that the functions above could be adapted to catch to_additive names as well.

François G. Dorais (Oct 10 2024 at 05:33):

Moritz Firsching said:

apparently importing Batteries.Data.String.Matcher is perhaps not the best idea...

This response is way aside at this point in the thread, apologies. I don't think this is the optimal route for your use case but note that Batteries.Data.String.Matcher is built upon a much more general Batteries.Data.Array.Match that can be used on much more general data types than strings. (Still in linear time!) The interface is admittedly not optimal but comments and ideas are most welcome! There are plenty of ways to make this algorithm more usable.

Damiano Testa (Oct 10 2024 at 05:51):

Ah, I had missed that import!

Mathlib.Tactic.Linter.Style is imported in Mathlib.Init and Mathlib.Init is quite controlled in its imports. So, I would recommend against adding imports to it, even in Batteries. Instead of matching, you can use str.splitOn "__" in your case:

#eval  -- [true, false]
  let strs := ["ab__cd__ef", "ab_cd_ef"]
  strs.map fun s => (2 ≤ (s.splitOn "__").length : Bool)

(I am not really sure about efficiency, so feel free to suggest more performant, import-less solutions!)

Moritz Firsching (Oct 10 2024 at 14:22):

ok, cool, now I'm using the code you suggested and moved it to a file Linter/Common.lean, where both the dupNamespace linter and hte new linter use the new funcitons getNames. Seems like it works in principle, but I get lots of new (false positive?) duplicate namespace warnings:
https://github.com/leanprover-community/mathlib4/actions/runs/11274654137/job/31354283516?pr=17580

The large import should now not be a problem anymore.

mathlib#17580

Damiano Testa (Oct 10 2024 at 15:02):

Thanks!

I'm benching the PR, since I'm curious about performance of the new script. I don't have time to review it now, though.

Moritz Firsching (Oct 10 2024 at 15:03):

it still does not work, because for some reason I'm getting these false postivies

Michael Rothgang (Oct 10 2024 at 15:39):

I left some comments, mostly about style and documentation.

Damiano Testa (Oct 10 2024 at 19:24):

I also left some comments, hopefully with a fix for the build!

Moritz Firsching (Oct 10 2024 at 19:25):

Thanks to both of you! Build seems fixed now. I will address the remaining points raised

Damiano Testa (Oct 10 2024 at 19:27):

I am hoping that the linter will now catch a "hidden" name that has a duplicated namespace...

Last updated: May 02 2025 at 03:31 UTC