Zulip Chat Archive

Stream: Is there code for X?

Topic: undocumented things


Joachim Breitner (Apr 05 2024 at 07:11):

How hard would it be to generate a list of things (definitions, syntax, i.e. stuff that you can hover) that don't have a docstring, sorted by how often they appear in, say, all of mathlib?

Yaël Dillies (Apr 05 2024 at 07:12):

That list (of defs) is https://github.com/leanprover-community/mathlib4/blob/master/scripts/nolints.json, so you could iterate through it and count uses

Kevin Buzzard (Apr 05 2024 at 08:09):

It's all but the last two lines of that list.

How has that lost got so big? I though we linted against that sort of thing.

Yaël Dillies (Apr 05 2024 at 08:10):

Some definitions lost their docstring during the port, eg docs#CommRing

Mario Carneiro (Apr 05 2024 at 08:11):

those were probably the pre-mathport definitions

Shreyas Srinivas (Apr 05 2024 at 09:46):

I think loogle already fetches docstrings. Is there a loogle query for "fetch everything"?

Michael Rothgang (Apr 05 2024 at 10:20):

Another (small) factor is a number of local notations which could have had @[inherit_doc] added. I believe Yael removed ~115 of these at some point; there's an in-progress PR about the remaining ~50 instances.

Eric Wieser (Apr 05 2024 at 10:22):

We should be careful with these; often the docstring needs editing at the same time

Eric Wieser (Apr 05 2024 at 10:23):

eg

/-- Does the foo -/
def foo := sorry

notation "F" => foo

should become

/-- Does the foo, with notation `F` -/
def foo := sorry

@[inherit_doc]
notation "F" => foo

or similar

Mario Carneiro (Apr 05 2024 at 10:56):

is this something that could be automatically handled by @[inherit_doc]?

Mario Carneiro (Apr 05 2024 at 10:57):

IIRC one issue with the current hover is that it doesn't actually show the constant's name

Joachim Breitner (Apr 05 2024 at 11:19):

Ah, I think I phrased my question badly. I guess I can figure out how to see if something has a docstring or not: But how do I got about finding out how often something appears in the mathlib core? (As a proxy for “most useful to document next”.)

Johan Commelin (Apr 06 2024 at 02:26):

Adam posted some code to answer a similar question by Kevin. Let me try to find it

Johan Commelin (Apr 06 2024 at 02:26):

https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/finding.20all.20uses.20of.20a.20theorem.20in.20mathlib

Joachim Breitner (Apr 06 2024 at 11:43):

Hmm, but that’s different: It goes through the definition’s values, i.e. the proof objects, to find constants used. This will find implicit things, and will not find, for example, tactics.

I am more interested in what people actually wrote in the source. In that thread Damiano says

if you get a hover information, then it is in the ilean

so I think that’s where I have to look, or near the implementation of the call hierarchy, once I am near a computer again.

Damiano Testa (Apr 06 2024 at 18:34):

Joachim, how would you like the information to be presented?

Since I am in this "linter phase", I can offer a linter that checks whether a declaration has/does not have a doc-string. Would that work for you?

Adam Topaz (Apr 06 2024 at 18:36):

@Joachim Breitner sounds like you need the info trees.

Damiano Testa (Apr 06 2024 at 18:37):

I am not sure that I understand what you are asking. Is it "if I were to document the most referenced declaration that does not have a docstring, which declaration would I chose?"

Damiano Testa (Apr 06 2024 at 18:53):

And, following from your reference to tactics, those hovers do not come from the ilean but from the info trees, I think, as Adam said.

Damiano Testa (Apr 06 2024 at 18:54):

(Incidentally, syntax linters do give you access to info trees...)

Joachim Breitner (Apr 06 2024 at 20:24):

Ok, then info trees it is :-)

Joachim, how would you like the information to be presented?

# undocumented.csv
name;occurrences
Lean.Parser.Command.def;14400
Lean.Parser.Tactic.assumption;12312
id;11231
Nat.succ;10233

(assuming these wouldn’t have a docstring, and numbers are made up of course. JSON fine too.)
Bonus points for filtering by package, e.g. only show definitions from core.
But of course I can do the let work if I can cargo cult “getting info trees for all modules in mathlib and traversing them” somewhere :-)

Damiano Testa (Apr 07 2024 at 06:26):

With this code:

theorem noDocs (h : True  True) : True := by
  cases h
  refine ?_
  exact True.intro
  exact .intro

/-- I have a doc-string -/
theorem withDocs (h : True  True) : True := by
  cases h
  refine ?_
  exact True.intro
  exact .intro

is this a reasonable output for the "tactic" information?

[noDocs] -- no docs
[(Lean.Parser.Term.byTactic, (753, 811)),
 (by, (753, 755)),
 (Lean.Parser.Tactic.cases, (758, 765)),
 (Lean.Parser.Tactic.tacticSeq, (758, 811)),
 (Lean.Parser.Tactic.refine, (768, 777)),
 (Lean.Parser.Tactic.exact, (780, 796)),
 (Lean.Parser.Tactic.exact, (799, 811))]

[withDocs]
[(Lean.Parser.Term.byTactic, (887, 945)),
 (by, (887, 889)),
 (Lean.Parser.Tactic.cases, (892, 899)),
 (Lean.Parser.Tactic.tacticSeq, (892, 945)),
 (Lean.Parser.Tactic.refine, (902, 911)),
 (Lean.Parser.Tactic.exact, (914, 930)),
 (Lean.Parser.Tactic.exact, (933, 945))]

I have not yet combined this with the Expr data, though.

Joachim Breitner (Apr 07 2024 at 15:27):

Good! Can you also make it so that True.intro shows up?

Damiano Testa (Apr 07 2024 at 15:34):

I can get True.intro and intro easily, if you want intro (corresponding to .intro) to become, then I will have to do some elaboration.

Damiano Testa (Apr 07 2024 at 15:35):

If the proof uses simp, do you want to dig out the proof term or just the syntax information? So, just simp or also the lemmas that simp used?

Damiano Testa (Apr 07 2024 at 15:35):

(Of course, of you use simp [x], then x would appear.)

Damiano Testa (Apr 07 2024 at 15:36):

Also, after posting that, I realized that the data for doc-string may not be embedded in the syntax, because add_decl_doc (or something similar), may add doc-strings after the fact.

Joachim Breitner (Apr 07 2024 at 15:38):

Damiano Testa said:

(Of course, of you use simp [x], then x would appear.)

That! Just all the explicitly mentioned tactics and lemmas, ideally resolved to their full Name.

Damiano Testa (Apr 07 2024 at 15:39):

Ok, I can probably do that, though I am not sure whether I will be able to do it today.

Joachim Breitner (Apr 07 2024 at 15:41):

No expectations :-)

Can you share the code that produced the output above?

Damiano Testa (Apr 07 2024 at 16:39):

I placed the code in a draft PR: #11991.

Damiano Testa (Apr 07 2024 at 16:41):

It is a linter, by default it is "on", but nothing imports it (except Mathlib.lean and Mathlib/Tactic.lean. If you want to see the effect, simpy add import Mathlib.Tactic.SyntaxDataLinter to your file.

Joachim Breitner (Apr 07 2024 at 16:55):

Thanks, that's already very helpful!

Damiano Testa (Apr 07 2024 at 16:58):

I am testing it out on Mathlib and there are quite a few exclusions that are probably helpful to make the output clearer. Am I right that you do not care about these?

`null
`by
`«;»
]»
``Lean.Parser.Tactic.tacticSeq
``Lean.Parser.Tactic.paren
``Lean.Parser.Tactic.tacticSeq1Indented
``Lean.Parser.Term.byTactic

(and possibly more, I am just skimming through some random files).

Damiano Testa (Apr 07 2024 at 16:59):

(In any case, I am adding some logic to exclude a list of SyntaxNodeKinds and you can then adapt as you want.)

Adam Topaz (Apr 07 2024 at 17:03):

@Damiano Testa @Joachim Breitner you should look at Scott’s lean training data repo. Specifically there’s a function there called TacticInfo.isSubstantive I think, and it probably does what you’re looking for

Damiano Testa (Apr 07 2024 at 17:09):

def isSubstantive (t : TacticInfo) : Bool :=
  match t.name? with
  | none => false
  | some `null => false
  | some ``cdot => false
  | some ``cdotTk => false
  | some ``Lean.Parser.Term.byTactic => false
  | some ``Lean.Parser.Tactic.tacticSeq => false
  | some ``Lean.Parser.Tactic.tacticSeq1Indented => false
  | some ``Lean.Parser.Tactic.«tactic_<;>_» => false
  | some ``Lean.Parser.Tactic.paren => false
  | _ => true

Adam Topaz (Apr 07 2024 at 17:23):

I think even that is missing whatever the name is associated to have and let

Adam Topaz (Apr 07 2024 at 17:23):

Maybe others

Damiano Testa (Apr 07 2024 at 17:26):

Yes, I think that there are a few uncontroversial syntax nodes that you almost always want to exclude, but others that are more goal-specific. I'll DM Joachim now, to avoid spamming the general thread. If anyone wants in, simply let me know! :smile:

Adam Topaz (Apr 07 2024 at 17:28):

I think it would be useful for various things if we could have a complete list of “nonsubstantive” things

Damiano Testa (Apr 07 2024 at 17:29):

Ok, I can give an update, once I find more. There are some nodes, like

Lean.Parser.Tactic.withAnnotateState
`Lean.Parser.Tactic.withReducible

that rw brings in: do they count?

Damiano Testa (Apr 07 2024 at 17:30):

rw also brings in Lean.Parser.Tactic.rwRuleSeq (inside a Lean.Parser.Tactic.rwSeq node).

Adam Topaz (Apr 07 2024 at 17:34):

You can also check is the syntax is original and I think that would take care of such things?

Damiano Testa (Apr 07 2024 at 17:34):

Ah, I'll add that filter!

Damiano Testa (Apr 07 2024 at 17:53):

The original filter works very well! On top of Scott's isSubstantive, I think that further additions may include

`by
`«;»
]»
`«<;>»

(and counting).

Joachim Breitner (Apr 07 2024 at 19:10):

I don't think id exclude let and have. There is certainly something to be told to the hovering about them!

Better err on the side of inclusion, it's better to skip documenting something trivial than to miss documenting something that's worth documenting (certainly I won't mind an explanation of <;> when I first see it.)

Joachim Breitner (Apr 08 2024 at 08:29):

Here is my current attempt: It uses ileans, which is a very quick way to get references without having to re-elaborate the file, although it means we are missing out on tactics and other syntax elements. I assume that patching Lean.Server.identOf on a draft PR to include ofTacticInfo and ofCommandInfo, letting CI create a test CI and then running this script will fix that. A bit ad-hoc, but a pragmatic way to get the data.

I added this to mathlib’s script folder and am running it with

~/build/lean/mathlib4 $ lake exe occCount ./.lake/build/lib/**/*.ilean | tee undocumented.csv

which produces

~/build/lean/mathlib4 $ head -n 20 undocumented.csv
Iff.trans,1515
Iff.rfl,1435
Function.comp_apply,993
Ne.symm,946
eq_comm,861
Iff.symm,856
mt,809
if_neg,744
Subsingleton.elim,598
if_pos,592
exists_prop,526
dif_pos,477
dif_neg,414
Fin.succ,385
Classical.choose_spec,346
eq_self_iff_true,346
Or.resolve_left,343
ne_eq,326
and_imp,288
not_false_iff,287

which looks plausible. (Code also on branch joachim/occCount.)

Damiano Testa (Apr 08 2024 at 08:41):

There is hover information for tactics already in VSCode, so maybe the information about the location of tactics is also already stored somewhere...

Joachim Breitner (Apr 08 2024 at 08:42):

It is in the info tree (and the environment)! But it’s thrown away when the info trees are “condensed” into the .ilean files. It would be straight-forward, but just very slow, to elaborate each file to get the info tree. But I am inpatient, so if possible I’d like to just read the .ilean files instead :)

Damiano Testa (Apr 08 2024 at 08:44):

Ok, you can also work on documenting the lemmas, while the linter produces the list of tactics and then extract the data from the tactics as well! :upside_down:

Damiano Testa (Apr 08 2024 at 08:46):

In any case, I think that for tactics, the biggest gap in tooling is making doc-gen aware of the output of #help tactic.

Joachim Breitner (Apr 08 2024 at 13:27):

It wasn’t too bad to hack lean to include syntax nodes in this report, now I get

~/build/lean/mathlib4 $ head -n 20 undocumented2.csv
Init.NotationExtra,cdotTk,35292
Init.Core,Iff.trans,1494
Init.Core,Iff.rfl,1429
Init.Conv,Lean.Parser.Tactic.Conv.convSeq,1344
Init.Conv,Lean.Parser.Tactic.Conv.convSeq1Indented,1333
Init.Core,Function.comp_apply,955
Init.Core,Ne.symm,913
Init.Core,Iff.symm,848
Init.Core,eq_comm,835
Init.Core,mt,730
Init.Core,if_neg,665
Init.Core,Subsingleton.elim,584
Init.Core,if_pos,559
Init.PropLemmas,exists_prop,515
Init.Core,dif_pos,456
Init.Core,dif_neg,399
Init.Data.Fin.Basic,Fin.succ,381
Init.Classical,Classical.choose_spec,343
Init.Prelude,Or.resolve_left,326
Init.Core,eq_self_iff_true,319

Mario Carneiro (Apr 08 2024 at 13:30):

can you exclude theorems from the list?

Mario Carneiro (Apr 08 2024 at 13:30):

There are other things like convSeq1Indented that aren't really hoverable, not really sure how to express that though

Joachim Breitner (Apr 08 2024 at 13:36):

That’s fine, at least for me I only need a sorted TODO list, it’s ok if not all are actionable.

Mario Carneiro (Apr 08 2024 at 13:38):

I mean, start with Init and work your way up? Almost everything is undocumented, it's really easy to find things to work on

Mario Carneiro (Apr 08 2024 at 13:39):

Functions on List, String, Array are particularly important (i.e. the part of Std which is actually Init)

Joachim Breitner (Apr 08 2024 at 13:40):

Without theorems:

Init.NotationExtra,cdotTk,35292
Init.Conv,Lean.Parser.Tactic.Conv.convSeq,1344
Init.Conv,Lean.Parser.Tactic.Conv.convSeq1Indented,1333
Init.Data.Fin.Basic,Fin.succ,381
Init.Classical,Classical.choose,251
Init.MetaTypes,Lean.Meta.Simp.Config.contextual,226
Init.Control.Basic,Alternative.failure,216
Init.Prelude,Prod.mk,191
Init.WF,WellFounded,179
Init.Data.Nat.Gcd,Nat.gcd,177
Init.Data.Option.Basic,Option.isSome,128
Init.Prelude,Nat.casesOn,113
Init.Control.Basic,guard,113
Init.Data.Repr,Repr,103
Init.Data.List.BasicAux,List.get?,103
Init.Meta,Lean.TSyntax.getId,95
Init.Data.Array.Basic,Array.map,88
Init.Data.Ord,Ordering.lt,86
Init.WF,Acc,79
Init.Meta,Lean.mkIdent,78

Mario Carneiro (Apr 08 2024 at 13:41):

yep it's almost all true positives now

Joachim Breitner (Apr 08 2024 at 13:53):

Ah, and I should include Lean.* in the list to get a better picture of syntax missing a docstring:

~/build/lean/mathlib4 $ head -n 30 undocumented2.csv
Lean.Parser.Command,Lean.Parser.Command.declaration,150198
Lean.Parser.Term,Lean.Parser.Tactic.tacticSeq1Indented,135514
Init.NotationExtra,cdotTk,35292
Lean.Parser.Command,Lean.Parser.Command.variable,23193
Lean.Parser.Command,Lean.Parser.Command.end,16660
Lean.Parser.Command,Lean.Parser.Command.section,9454
Lean.Parser.Command,Lean.Parser.Command.open,8132
Lean.Parser.Command,Lean.Parser.Command.namespace,7387
Lean.Parser.Command,Lean.Parser.Command.moduleDoc,7120
Lean.Parser.Command,Lean.Parser.Command.in,4213
Lean.Parser.Command,Lean.Parser.Command.set_option,4047
Lean.Parser.Attr,Lean.Parser.Attr.simple,2709
Lean.Parser.Command,Lean.Parser.Command.attribute,1511
Lean.Parser.Command,Lean.Parser.Command.universe,1452
Init.Conv,Lean.Parser.Tactic.Conv.convSeq,1344
Init.Conv,Lean.Parser.Tactic.Conv.convSeq1Indented,1333
Lean.Parser.Command,Lean.Parser.Command.noncomputableSection,855
Lean.Parser.Attr,Lean.Parser.Attr.instance,655
Lean.Parser.Syntax,Lean.Parser.Command.notation,568
Lean.Meta.Basic,Lean.Meta.MetaM,517
Init.Data.Fin.Basic,Fin.succ,381
Lean.Parser.Syntax,Lean.Parser.Command.mixfix,265
Init.Classical,Classical.choose,251
Lean.Parser.Syntax,Lean.Parser.Command.syntax,246
Init.MetaTypes,Lean.Meta.Simp.Config.contextual,226
Lean.Linter.Deprecated,Lean.Linter.linter.deprecated,225
Init.Control.Basic,Alternative.failure,216
Lean.Elab.AutoBound,Lean.Elab.autoImplicit,213
Init.Prelude,Prod.mk,191
Lean.Parser.Command,Lean.Parser.Command.export,179

Joachim Breitner (Apr 08 2024 at 14:04):

Mario Carneiro said:

There are other things like convSeq1Indented that aren't really hoverable, not really sure how to express that though

I can get the hover text

A sequence of tactics in brackets, or a delimiter-free indented sequence of tactics. Delimiter-free indentation is determined by the first tactic of the sequence.

from docs#Lean.Parser.Tactic.tacticSeq whenever a tactic doesn’t have its own docstring, as in

macro "foo" : tactic => `(tactic|skip)
example : True := by
  foo

so it wouldn’t be unreasonable to get the same behavior from convSeq. Although a convSeq will commonly be part of a conv, and then that’s docstring will show on hover. I guess that’s what you mean with “not really hoverable”.

Hmm, and now I suddenly cannot reproduce seeing the tacticSeq docstring anymore, and am confused.

Anyways, not so important.

Mario Carneiro (Apr 08 2024 at 14:06):

Ah, I believe you can get convSeq under certain conditions, but convSeq1Indented will be harder

Joachim Breitner (Apr 08 2024 at 14:06):

It seems that commands never show the generic docstring on hover. I wonder if that is something worth improving? A user might expect to be able to hover over def or example or variable and get an explanation what this command does?

Mario Carneiro (Apr 08 2024 at 14:07):

Oh yeah all commands need docs

Mario Carneiro (Apr 08 2024 at 14:07):

Lean.Syntax.Command.foo

Joachim Breitner (Apr 08 2024 at 14:08):

But there inductive has a docstring, but it doesn't seem to be displayed when hovering, so besides more docstrings there is something to be fixed, isn’t it?

Joachim Breitner (Apr 08 2024 at 14:10):

Also, «def» does not show up in my list, which is strange.

Mario Carneiro (Apr 08 2024 at 14:10):

Joachim Breitner said:

But there inductive has a docstring, but it doesn't seem to be displayed when hovering, so besides more docstrings there is something to be fixed, isn’t it?

it shows up only with import Lean apparently?

Mario Carneiro (Apr 08 2024 at 14:11):

I was already aware of this limitation for go-to-def but normally hover docs still show up for builtin syntax

Mario Carneiro (Apr 08 2024 at 14:12):

Oh I see, the issue is that declaration is undocumented

Mario Carneiro (Apr 08 2024 at 14:12):

inductive is not actually a top level command

Mario Carneiro (Apr 08 2024 at 14:13):

of course if you put a doc comment on declaration it will be about as generic as the parentheses doc comment

Mario Carneiro (Apr 08 2024 at 14:14):

so indeed this is an area where just adding doc strings is probably insufficient to get a good result

Joachim Breitner (Apr 08 2024 at 14:17):

We do get the docstring on inductive after import Lean, so I assume we would get specific docstrings on def etc, even without putting an unhelpfully generic docstring on declaration.

I don’t know yet what needs to be done so that the existing docstring on inductive is shown even without import Lean. Reported as https://github.com/leanprover/lean4/issues/3842.

Joachim Breitner (Apr 08 2024 at 14:18):

Also it shows a short coming in my script. If the user would get the hover from inductive, and ideally eventually from def, then the script should complain about Lean.Parser.Command.def instead of Lean.Parser.Command.declaration.

Joachim Breitner (Apr 08 2024 at 14:20):

Fun fact: The script reports

Lean.Parser.Command,Lean.Parser.Command.theorem,1

due to a mention as an explicit name:

    stx.setKind ``Parser.Command.theorem

Mario Carneiro (Apr 08 2024 at 14:50):

yes, I think any parser can show up as a hover target if it is used directly in syntax definitions

Mario Carneiro (Apr 08 2024 at 14:51):

I think it would be good to document the pp* parsers in particular

Joachim Breitner (Apr 08 2024 at 14:59):

Marc walked me through the hover code, and it’s looking for docstrings both in the info tree (where my script can grab it) and the syntax tree (where my script can’t). That explains where the inductive hover came from, even though it's not mentioned in the info tree.

Mario Carneiro (Apr 08 2024 at 15:01):

oh yeah most syntax docs come from the syntax tree itself

Joachim Breitner (Apr 08 2024 at 15:11):

I see, and there we don’t really need every syntax kind to be documented, as long as some “containing” syntax node has a good docstring. So it’s not easy to list all the syntax nodes that should be documented (besides that my current script is infotree-based).

Mario Carneiro (Apr 08 2024 at 15:17):

for "should", that's going to require some additional smarts indeed, but getting the closure of all syntax nodes that may need to be documented should not be that hard

Mario Carneiro (Apr 08 2024 at 15:18):

Long term though I think we want to have docs on everything and additional attributes to suppress those docs in hover if they would be covering up something more useful (e.g. declaration hiding def's docstring)

Mario Carneiro (Apr 08 2024 at 15:19):

because it's not good to have negative incentives on documentation

Joachim Breitner (Apr 08 2024 at 15:21):

e.g. declaration hiding def's docstring

Shouldn’t that already be the case since the hover comes from the “innermost” syntax parser that has a docstring?


Last updated: May 02 2025 at 03:31 UTC