Zulip Chat Archive

Stream: general

Topic: docgen scoped notations

Christian Merten (Nov 12 2024 at 14:03):

This question was probably asked here before, but I didn't find it on Zulip: Could doc gen open the namespace of a declaration before rendering it? In particular, I am looking at lemmas like docs#CategoryTheory.Limits.pullbackDiagonalMapIdIso_hom_fst. This is unreadable and I believe this is because CategoryTheory.«term_≫_» is scoped and hence not shown by the pretty printer if the namespace is not open.

Markus Himmel (Nov 12 2024 at 14:07):

As far as I understand (cc @Henrik Böving), the main obstacle for this is that there isn't a good place at the moment for the configuration which namespaces to open to live.

Henrik Böving (Nov 12 2024 at 14:08):

Indeed, I guess if we would record what namespaces were open when a theorem was declared in some env extension we could do it?

Christian Merten (Nov 12 2024 at 14:09):

Is "open the namespace of the declaration" not a good approximation of which namespace to open?

Henrik Böving (Nov 12 2024 at 14:10):

But there should be no reason for a heuristic solution here, this is a solvable problem if the elaborator cooperates

Markus Himmel (Nov 12 2024 at 14:11):

Christian Merten said:

Is "open the namespace of the declaration" not a good approximation of which namespace to open?

If I understand your proposal correctly, this would lead to bad output in the Algebra/Category directory of mathlib.

Christian Merten (Nov 12 2024 at 14:14):

Ah yes, so it is a very rough approximation.

Christian Merten (Nov 12 2024 at 14:16):

What about, open all namespaces of terms appearing in the definition?

Damiano Testa (Nov 12 2024 at 14:17):

Christian Merten said:

What about, open all namespaces of terms appearing in the definition?

Would this mean to print just the last segment of every constant?

Christian Merten (Nov 12 2024 at 14:18):

Damiano Testa said:

Christian Merten said:

What about, open all namespaces of terms appearing in the definition?

Would this mean to print just the last segment of every constant?

You can do open scoped <namespace> to avoid this, no?

Damiano Testa (Nov 12 2024 at 14:19):

Oh, I thought that you were referring to the documentation pages.

Henrik Böving (Nov 12 2024 at 14:20):

https://github.com/leanprover/lean4/issues/6050 recorded as an RFC

Christian Merten (Nov 12 2024 at 14:21):

Damiano Testa said:

Oh, I thought that you were referring to the documentation pages.

I was, then I am probably misunderstanding something. Can you not do an equivalent of open scoped before rendering there?

Damiano Testa (Nov 12 2024 at 14:24):

I guess that I was running under the assumption that the documentation was not checking what was open/open scoped and the RFC is precisely to add such awareness. So, I guess that we were both discussing the same topic, but I was not assuming the RFC and you were! :smile:

Kyle Miller (Nov 13 2024 at 15:58):

@Damiano Testa Christian is suggesting that, without the RFC, the heuristic that docgen could do is "when processing a declaration, first set up the state that is the equivalent to doing open scoped namespaceOfTheDecl". There's no need for awareness of the namespaces opened at the time of a declaration's definition with Christian's proposal, but it's an approximation.

Christian Merten (Nov 13 2024 at 16:06):

Kyle Miller said:

Damiano Testa Christian is suggesting that, without the RFC, the heuristic that docgen could do is "when processing a declaration, first set up the state that is the equivalent to doing open scoped namespaceOfTheDecl". There's no need for awareness of the namespaces opened at the time of a declaration's definition with Christian's proposal, but it's an approximation.

(and since it was pointed out above that namespaceOfTheDecl is not sufficient, it could be open scoped namespaceOfTerm for every Term appearing in the statement of Decl)

Damiano Testa (Nov 13 2024 at 16:09):

Ah, ok, I understand now: sorry for creating confusion with my confusion! :upside_down:

Winston Yin (尹維晨) (Nov 20 2024 at 10:20):

Even that is not sufficient, as open scoped Manifold is widely used but the Manifold namespace has only two very specialised declarations.

Robert Maxton (Apr 04 2025 at 00:04):

... Huh, this was only a recent thing? Because e.g. docs#CategoryTheory.whiskeringLeftObjCompIso has been Functor C D and F.comp G for ages; in fact I don't think any of the category theory notations have been showing up in docs for a while. I'd just assumed that notation didn't work in docs in general, but apparently that's just a category theory thing?

Kyle Miller (Apr 04 2025 at 00:07):

The docs don't make use of scoped notation @Robert Maxton. The problem today was that even notations like for Iff, Eq, or HAdd.hAdd weren't being used.

Jovan Gerbscheid (Apr 04 2025 at 00:07):

Can we do something about this though? it's quite cumbersome to look at CategoryStruct.comp

Robert Maxton (Apr 04 2025 at 00:08):

Ahh, I see. That makes... well okay, it's a good explanation, but -- yeah it'd be quite nice if that were otherwise, I can't say that it 'makes sense' as a design choice :p

Notification Bot (Apr 04 2025 at 00:17):

4 messages were moved here from #mathlib4 > No notation in #docs by Kyle Miller.

Kyle Miller (Apr 04 2025 at 00:18):

I moved the discussion here @Robert Maxton.

Robert Maxton (Apr 04 2025 at 00:18):

so I see!

Robert Maxton (Apr 04 2025 at 00:24):

Oh, hey, there is an RFC, that's helpful.

I'm surprised that this isn't as simple as "record the def statement exactly as produced by the user during normal compilation and then transform it", though, since doc comments are part of the underlying language/not just normal whitespace comments anyway...?

Kyle Miller (Apr 04 2025 at 01:03):

Are you saying that you're surprised that docgen tries to pretty print declaration signatures using the actual computed types rather than trying to transform the source code directly?

Robert Maxton (Apr 04 2025 at 01:04):

Kyle Miller said:

Are you saying that you're surprised that docgen tries to pretty print declaration signatures using the actual computed types rather than trying to transform the source code directly?

Sort of? More that it tries to do something other than "work through the existing display channels, in particular the standard delaborator"

Robert Maxton (Apr 04 2025 at 01:06):

Generate the docgen data in the same pass as normal compilation

Kyle Miller (Apr 04 2025 at 01:07):

Ok, so you're surprised that docgen decides to create the documentation by loading all the compiled oleans and pretty printing the results?

Kyle Miller (Apr 04 2025 at 01:08):

One good reason not to do this is that it's a lot faster to create the documentation than to compile mathlib. For example, downstream projects can compile their own project and generate full documentation with mathlib documentation without compiling mathlib themselves.

Kyle Miller (Apr 04 2025 at 01:09):

(Are you surprised because you think nobody thought about doing documentation generating in the same pass as compilation? Or that the idea was discarded early on?)

Robert Maxton (Apr 04 2025 at 01:09):

Kyle Miller said:

Ok, so you're surprised that docgen decides to create the documentation by loading all the compiled oleans and pretty printing the results?

That sounds about right, yeah. Since Lean notation can get so elaborate and also so integral to its purpose, it seems like it should be a bit more of a 'first class citizen'

Robert Maxton (Apr 04 2025 at 01:10):

Kyle Miller said:

(Are you surprised because you think nobody thought about doing documentation generating in the same pass as compilation? Or that the idea was discarded early on?)

Either would be surprising, but I suppose mostly the second!

Kyle Miller (Apr 04 2025 at 01:14):

Given that you seem to know about how scoped notations work, I'm surprised that your opinion is "I can't say that it 'makes sense' as a design choice". The current way it works is the obviously-not-incorrect approach. It's what you see if you import Mathlib and #check the declaration. It would be surprising if that used notations you couldn't use yourself, right?

Kyle Miller (Apr 04 2025 at 01:15):

It's not uncommon also for new notations to be defined later. Consider all the definitions in the prelude (docs#Nat.le_trans is one). Should the documentation print that using only the available notations? That's what your suggestion would result in. LE.le n m → LE.le m k → LE.le n k

Robert Maxton (Apr 04 2025 at 01:27):

Kyle Miller said:

Given that you seem to know about how scoped notations work, I'm surprised that your opinion is "I can't say that it 'makes sense' as a design choice". The current way it works is the obviously-not-incorrect approach. It's what you see if you import Mathlib and #check the declaration. It would be surprising if that used notations you couldn't use yourself, right?

Sorry if that came off harsh. I don't think this is "obviously not incorrect", though, as it results in surprisingly illegible docs precisely when you need them most.

In general, my position is that digital environments should present as 'real', concrete environments by default and deviate from that only with a positive reason to be otherwise. In this case, when I work in Lean, I set up my elaborators and delaborators and spend time on my notation, and when I'm done I sit back and see that indeed I can type math fluently into my code and, just as importantly, that Lean uses it when reporting back to me; I feel that Lean "understands my notation". It is therefore surprising when I go use the docgen feature, which again is hooking into something I've been building "along the way" as I work, only to discover that it apparently sees an entirely** different environment than I do. To borrow a term, it breaks immersion.

** Yes, I know it's not actually an entirely different mechanism, the point is about user experience rather than the actual underlying code

Robert Maxton (Apr 04 2025 at 01:29):

Kyle Miller said:

It's not uncommon also for new notations to be defined later. Consider all the definitions in the prelude (docs#Nat.le_trans is one). Should the documentation print that using only the available notations? That's what your suggestion would result in. LE.le n m → LE.le m k → LE.le n k

While of course the maximally correct option is "having a place to specify the context in which documentation is generated", yes, I would actually prefer that. It would be annoying, but the prelude is only one, relatively short file, the set of places where Mathlib uses a concept far ahead of its notation are only a few short files out of thousands. Furthermore, it would adhere to the above principle of consistency: it would match what I see when I open the prelude in Lean, as presumably for that exact reason I wouldn't be able to use or read those notations while editing.

Jireh Loreaux (Apr 04 2025 at 02:38):

In general, I would be in favor of coming up with a way to display some scoped notations in doc-gen. However, I don't think it can just be: display all scoped notations, because they might conflict.

Yaël Dillies (Apr 04 2025 at 05:26):

I can confirm that it is a great impediment to my productivity that the docs don't show category theory notation!

Yaël Dillies (Apr 04 2025 at 05:33):

Jireh Loreaux said:

In general, I would be in favor of coming up with a way to display some scoped notations in doc-gen. However, I don't think it can just be: display all scoped notations, because they might conflict.

In Lean 3, doc-gen opened all locales and it was working great from my point of view. Personally I think that was a more correct design than what we have now. Even more correct would be to pretty-print file X with the scopes opened in X.lean. If that results in ambiguous notation:

It's usually pretty clear from context which notation is meant. Biggest counterexample to that is probably docs#CategoryTheory.Functor.LaxMonoidal.ε with docs#Mon_Class.mul, which I will remark is an annoying pair even outside the context of pretty-printing!
In the docs, you can click on notation to see what it stands for.

Yaël Dillies (Apr 04 2025 at 05:37):

Here's a tentative algorithm: When creating documentation for file X, scan for open ... and for open scoped ..., open the relevant scopes, pretty-print each declaration in X

Yaël Dillies (Apr 04 2025 at 05:40):

An alternative (I am not clear whether it's better or worse than the previous, but it should result in equal docs in most cases): When creating documentation for file X, scan the .lean file for namespace, section, open ..., and figure out from this information which declaration had which scope open, pretty-print each declaration according to which scopes were opened then

Yaël Dillies (Apr 04 2025 at 05:43):

I am not clear about whether sections and namespaces are recorded in oleans. My understanding is that oleans contain quite a huge lot of stuff which we almost never use (eg proof terms are only used in leanchecker), so including the data of section, namespace and open doesn't sound like a big ask.

Yaël Dillies (Apr 04 2025 at 05:46):

Kyle Miller said:

It's not uncommon also for new notations to be defined later. Consider all the definitions in the prelude (docs#Nat.le_trans is one). Should the documentation print that using only the available notations? That's what your suggestion would result in. LE.le n m → LE.le m k → LE.le n k

To clarify, with the algorithm I've just offered, documentation would display global notation that's defined later, scoped notation iff the scope already existed and was open then, and not display local notation. That seems morally correct to me

Yaël Dillies (Apr 04 2025 at 05:47):

My personal understanding was that we had to give up on displaying scoped notation in Lean 4 for transient technical reasons. Therefore I am surprised to hear that there was no plan to bring it back!

Kyle Miller (Apr 04 2025 at 06:00):

Yaël Dillies said:

Therefore I am surprised to hear that there was no plan to bring it back!

Where did you hear this? In this thread, there's an RFC about collecting the open information so that it's possible to open scoped notations for docgen.

Kyle Miller (Apr 04 2025 at 06:00):

lean4#6050

Yaël Dillies (Apr 04 2025 at 06:09):

Ah sorry, I missed that link. Then I am happy :smile:

Robert Maxton (Apr 04 2025 at 06:20):

Jireh Loreaux said:

In general, I would be in favor of coming up with a way to display some scoped notations in doc-gen. However, I don't think it can just be: display all scoped notations, because they might conflict.

If they would conflict in the docs, surely they must already conflict in the code, no?

Kyle Miller (Apr 04 2025 at 06:29):

No, because the point of scoped notation is that you can decide notations you want to enable. I've been confused about you saying notation should feel like a first-class citizen, when being able to enable and disable it like this is what that would mean — unless you mean something else by that?

Robert Maxton (Apr 04 2025 at 06:33):

Is there a way to disable them? AFAIK, you can decide what notations you want to enable, but only by opening various scopes, and the only way to 'close' them would be to close the entire section it's in. Maybe I've missed something?

Though, that said, I don't think it really matters to my argument -- either way, if you're tracking the context in which the (doc comment using the) notation is used, then either it will be ambiguous in both, or whatever means you can use to disambiguate in the codebase proper -- in this case, sectioning data (which already needs to be tracked for things like universes and variable declarations, I would have thought?) and opens -- and can be used disambiguate the docs.

Henrik Böving (Apr 04 2025 at 06:39):

The central issue is that doc-gen generates the documentation from the olean files by calling into various lean facilities. This is the correct way to go for various reasons pointed out above. The olean files in turn do not currently contain the information about which namespaces had been open during the declaration of a definition and thus it is impossible to recover this information in doc-gen right now. There is an RFC linked above which explains how we can get around these limitations by doing some work in core (namely tracking the open namespaces in the olean files through an environment extension). Which is to say that on a technical level this problem is solved, it is just not implemented.

Kyle Miller (Apr 04 2025 at 06:39):

There's a pareto frontier here @Robert Maxton. Docgen tends toward showing you the documentation for a library-as-it-is-imported. You have mentioned that you would rather have a pretty version of the declaration-headers-as-they-appear. These are both defensible.

Robert Maxton (Apr 04 2025 at 06:39):

... I suppose that RFC is precisely "no we are not currently tracking the namespaces opened", but then I get back to "huh, it's not just part of compilation?" Feels like doing it this way requires solving a nontrivial problem (the entire system of d/elaboration) twice

Robert Maxton (Apr 04 2025 at 06:39):

Kyle Miller said:

There's a pareto frontier here Robert Maxton. Docgen tends toward showing you the documentation for a library-as-it-is-imported. You have mentioned that you would rather have a pretty version of the declaration-headers-as-they-appear. These are both defensible.

Fair enough!

Henrik Böving (Apr 04 2025 at 06:40):

Robert Maxton said:

... I suppose that RFC is precisely "no we are not currently tracking the namespaces opened", but then I get back to "huh, it's not just part of compilation?" Feels like doing it this way requires solving a nontrivial problem (the entire system of d/elaboration) twice

The compiler tracks the namespaces, the information is just not persisted in the olean files, this is the case for a lot of information.

Last updated: May 02 2025 at 03:31 UTC