Zulip Chat Archive

Stream: PhysLean

Topic: Documentation discussion

Joseph Tooby-Smith (Aug 11 2025 at 07:10):

Hey! Great, thanks @Daniel Morrison and @Shlok Vaibhav for showing an interest in this. As I mentioned above, I think it is best if this isn't 'led' by me, but I would be happy to get the ball rolling. In particular I think it would be wise to make a thread on this channel dedicated to documentation, come up with a goal and a way to track that goal.

Shlok Vaibhav (Aug 11 2025 at 07:25):

Yes, its better to just get started, could you suggest what kind of goals i.e. what are the expectations from the documentation from where the goals can be derived? For example, for the tight-binding chain lean file, the included theorems are already elaborated by you : https://notes.physlean.com/Condensed-matter/The-tight-binding-chain/#TightBindingChain
What more can be added here??

Joseph Tooby-Smith (Aug 11 2025 at 07:37):

I think the tight binding chain does have fairly good documentation already, although note that the page you linked to is actually generated externally from the project in another repo. It is these documentations:

https://physlean.com/docs/PhysLean/CondensedMatter/TightBindingChain/Basic.html

which represent the 'documentation' within the file.

My expectations for the documentation would be the following:

It should be possible to read the module doc-string and understand what is in the file, what the main aim of the file is, and any references.
The documentation throughout the module (including defn and theorem doc-strings) should be sufficient to understand what is being done in a given part of the file, and preferably why.
These where possible should be understandable with little knowledge of Lean.

I would be very happy to hear other expectations here.

To give some examples of files which do not meet my expectations (naturally choosing ones I wrote as examples):

Shlok Vaibhav (Aug 11 2025 at 08:21):

I can appreciate point 3 given my situation :sweat_smile: . But need some help understand points 1 and 2.
So, to start with, in the file on Eigenfunction of Harmonic Oscillator, the following sequence of theorems which culminate into the orthonormality of eigenfunctions
image.png

Considering take the first theorem, eigenfunction_mul, what needs to be added to this particular theorem to make this theorem well-documented
What needs to be added overall to make the entire section on orthonormality of oscillator eigenfunctions well-documented

Joseph Tooby-Smith (Aug 11 2025 at 08:37):

Ok, so IMHO the doc-string of the section could be improved to:

/-!

## Orthnormality

The eigenfunctions of the quantum harmonic oscillator are orthonormal to one another in the Hilbert space. The proof of this result follows from two helper lemmas:

- eigenfunction_mul
- eigenfunction_mul_self

which simplify the point-wise multiplication of two of the eigenfunctions. The first of these is used to help prove that the eigenfunctions are orthogonal in

- eigenfunction_orthogonal

and the second is used to prove that the eigenfunctions are normalized in

- eigenfunction_normalized

These two latter lemmas are combined in

- eigenfunction_orthogonal

to prove orthogonality.

-/

For the doc-string of eigenfunction_mul I would say an improvement is:

/-- The multiplication of two eigenfunctions of the quantum harmonic oscillator,
  simplified to combine the exponential factors and normalization factors appearing in
   the expressions for the eigenfunctions. This result is used to help prove
   orthogonality of the eigenfunctions.  -/

Shlok Vaibhav (Aug 11 2025 at 09:05):

Thanks! This seems a consice way.

Shlok Vaibhav (Aug 11 2025 at 09:22):

I will try this for tight binding file, whichever theorems don't have enough docstring there

Shlok Vaibhav (Aug 11 2025 at 15:48):

https://github.com/HEPLean/PhysLean/blob/b0f6665617508d2ed3cc19f8ee60f0e53cbdb1d4/PhysLean/CondensedMatter/TightBindingChain/Basic.lean#L105

Consider the above theorem which explains the application of hamiltonian on euclidean basis :

I thought of this comment structure:
First describe the theorem
Next describe the physlean defintions/lemmas used (This is taken from your text above where you did the opposite i.e locating the places where a certain lemma has been used, describing what all is being used for a certain lemma also lets you track the lemma dependencies backwards.)
Finally, i have described a summary of the tactics used, here i have given a very verbose description (I copied my own comments when i was understanding the tactics) which is neither sustainable nor useful. However i do feel that if the contributor higlights a brief summary of the strategy used, some non-trivial tactics etc, this will help people understand the code faster and will really help beginners like me a lot.

What does everyone think about this format??

/-- This lemma proves that Hamiltonian applied to the localized state `|n⟩`
  gives `T.E0 • |n⟩ - T.t • (|n + 1⟩ + |n - 1⟩)`.

  Physlean definitions/lemmas used: hamiltonian, localizedComp_apply_localizedState

  Brief description of strategy:
  a) simp tactics are used to simpify the effect of hamiltonian on the localized state |n⟩ with following tactics:
  hamiltonian: {T.E0 • ∑ i : Fin T.N, |i⟩⟨i| - T.t • ∑ i : Fin T.N, (|i⟩⟨i + 1| + |i + 1⟩⟨i|)}|n⟩
  LinearMap.sub_apply: {T.E0 • ∑ i : Fin T.N, |i⟩⟨i|}|n⟩ - {T.t • ∑ i : Fin T.N, (|i⟩⟨i + 1| + |i + 1⟩⟨i|)}|n⟩
  LinearMap.smul_apply: T.E0 • [{∑ i : Fin T.N, |i⟩⟨i|}|n⟩] - T.t • [{∑ i : Fin T.N, (|i⟩⟨i + 1| + |i + 1⟩⟨i|)}|n⟩]
  LinearMap.coeFn_sum: T.E0 • [{∑ i : Fin T.N, {|i⟩⟨i|}}|n⟩] - T.t • [{∑ i : Fin T.N, {(|i⟩⟨i + 1| + |i + 1⟩⟨i|)} |n⟩}]
  Finset.sum_apply: T.E0 • [∑ i : Fin T.N, {|i⟩⟨i| |n⟩}] - T.t • [∑ i : Fin T.N, {|i⟩⟨i + 1| |n⟩ + |i + 1⟩⟨i| |n⟩}]
  LinearMap.add_apply: T.E0 • [∑ i : Fin T.N, {|i⟩⟨i| |n⟩}] - T.t • [∑ i : Fin T.N, {|i⟩⟨i + 1| |n⟩} + ∑ i : Fin T.N, {|i + 1⟩⟨i| |n⟩}]
  smul_add: T.E0 • [∑ i : Fin T.N, {|i⟩⟨i| |n⟩}] - T.t • [∑ i : Fin T.N, {|i⟩⟨i + 1| |n⟩}] - T.t • [∑ i : Fin T.N, {|i + 1⟩⟨i| |n⟩}]

  b) Then congr tactic is used to break the equation into two subequations
  1. T.E0 • [∑ i : Fin T.N, {|i⟩⟨i| |n⟩}] = T.E0 • |n⟩ :
    - The `|n⟩` term is simplified using the lemma localizedComp_apply_localizedState.
  2. T.t • [∑ i : Fin T.N, {|i⟩⟨i + 1| |n⟩}] + T.t • [∑ i : Fin T.N, {|i + 1⟩⟨i| |n⟩}] = T.t • (|n + 1⟩ + |n - 1⟩)
      using simp, addition is distributed to get two terms and hence congr tactic is again used here to break the equation into two subequations
      The `|n + 1⟩` term and `|n - 1⟩` , both are simplified using the lemma we proved above - localizedComp_apply_localizedState
      and then Finset.sum_eq_single is used to simplify the summation.
  -/

Joseph Tooby-Smith (Aug 12 2025 at 07:37):

@Shlok Vaibhav and @Daniel Morrison , I moved the conversations regarding the documentation to this new thread.

Shlok, I like this plan :). I think it may be better if the Brief description of strategy didn't focus on the tactics, but rather gave an English description of the proof (maybe also with a note about the tactics which do certain steps in the proof). @Daniel Morrison opinions?

Daniel Morrison (Aug 12 2025 at 19:06):

I feel like that much information in a doc-string ends up being counter-productive in the long-term. I agree that level of detail is useful when trying to learn PhysLean but is excessive for a more seasoned user trying to get a quick idea of what the theorem says, and would significantly expand the length of files making everything harder to find. I think simple doc-strings for individual theorems (like eigenfunction_mul) are fine - there are other, better ways to help new users. For example, rather than listing the statements of every theorem used in the proof we teach new users how to search the documentation website for a specific theorem name to learn what it says. I think this is an instance of it's better for learning to give new users the tools to find the answer to their questions themselves rather than providing all that information directly. My worry is that using and understanding PhysLean does require knowledge of Lean, and trying to sidestep that is going to create more problems down the road.

Daniel Morrison (Aug 12 2025 at 19:16):

I think there are ways to retain some of that information in other ways. My thoughts are:

Make sure each statement has a doc-string, even if it is short
Expand (or create) the file/module doc-strings that summarize the file. Like Mathlib, we can have important results and links to resources. In PhysLean's case I would say that it would be good to include references to other files that are 'prerequisites' for understanding the file and a description of the formalization since it seems like we're running into a lot of cases where the usual understanding of the concept isn't amenable to formalization and we need to use something more complex.

Other resources outside the actual in-file documentation that might be helpful (which may or may not be within the scope of this effort)

List fundamental files that are a good place to start understanding how PhysLean works
List files/definitions/theorem which are highly technical and are meant to be used indirectly through other means (and therefore can be skipped by a new learner)
External notes as a place to describe the actual proofs

Matteo Cipollina (Aug 12 2025 at 19:25):

Something I have found useful is the overview file in mathlib and I would dream to have a similar file but possibly 5 times more detailed, to navigate mathlib, especially to gather in one doc API spread throughout the files (eg key API of a typical math domain is nornally spread through Data, Algebra, Topology, Analysis.Normed).
In this spirit It would be nice to have an overview file in PhysLean with the main API and the connections. It's likely too early as we have not yet reached that critical mass and complexity but it is the type of things that help, at least psychologically to orientate oneself. Also it would be nice to have some 'trajectories' of how API should be connected in the long run: 'medium term' TODOs alongside the immediate ones. Like in Relativity, how to complete Jaffe's notes and connect to CliffordAlgebra.
So I think I can try starting with these two tasks, and happy to collaborate :)

Daniel Morrison (Aug 12 2025 at 19:41):

I agree, that's what I was trying to get at but you said it better. :smile:

Joseph Tooby-Smith (Aug 13 2025 at 04:23):

Two comments:

The stats page shows how many definitions and lemmas have doc-strings: https://physlean.com/Stats
We have these notes: https://notes.physlean.com which can be taken in any direction people feel fit, if it helps. I haven't found the time to keep them anywhere near up-to date as I would have liked.

Joseph Tooby-Smith (Aug 13 2025 at 04:25):

(The definitions on that Stats page which do not have doc-strings correspond to instances, as the linters pick up 'def ...' which do not have them).

Joseph Tooby-Smith (Aug 18 2025 at 08:45):

Do people think it would help if we collated some examples of good documentation, which could be followed when trying to complete other docs. Similar to what @Shlok Vaibhav did above.

Joseph Tooby-Smith (Aug 26 2025 at 09:08):

I made this PhysLean#713 for tracking improvements to module documentation. In fact I think one can improve documentation without actually ever having to download Lean or clone the repo - it can be done entirely from the GitHub website. Anyone should feel welcome to edit that issue.

Joseph Tooby-Smith (Sep 15 2025 at 05:43):

Just to make a comment here:

I've been upgrading some of the documentation in PhysLean, and I think the following are on their way to being 'good examples'. Ideally if everything could be done in this way it would be great - but this would be a lot of work....

ZhiKai Pong (Dec 22 2025 at 11:34):

The updated docs in PhysLean#837 doesn't quite compile correctly for the latex parts, I'm guessing it's a multiline issue?

Joseph Tooby-Smith (Dec 22 2025 at 12:58):

This is annoying :(. I checked the equations are correct LaTeX expressions, so maybe this isn't enough.

ZhiKai Pong (Dec 22 2025 at 13:18):

I think @Shlok Vaibhav used the\begin{align} environment for the pendulum docs, maybe something similar might work here? (e.g. multiline)

Joseph Tooby-Smith (Dec 22 2025 at 13:36):

$$\frac{\partial^2 B_{ij}}{\partial t^2} = c^2 \sum_k \frac{\partial^2 B_{ij}}{\partial x_k^2} +
  \frac{1}{\epsilon_0} \left( \frac{\partial J_i}{\partial x_j} -
  \frac{\partial J_j}{\partial x_i} \right).$$

works but

$$\frac{\partial^2 B_{ij}}{\partial t^2} = c^2 \sum_k \frac{\partial^2 B_{ij}}{\partial x_k^2}
  + \frac{1}{\epsilon_0} \left( \frac{\partial J_i}{\partial x_j} -
  \frac{\partial J_j}{\partial x_i} \right).$$

does not. Only difference is the placement of the +.

(For my own future reference, I am using:

lake build PhysLean.Electromagnetism.Dynamics.IsExtrema:docs

to build the docs).

Joseph Tooby-Smith (Dec 22 2025 at 13:44):

Fixed here: PhysLean#840

Shlok Vaibhav (Dec 22 2025 at 14:22):

I have faced this issue before, the precaution is don't start a new line with + as it's counted as a symbol for the docgen text and not latex

Shlok Vaibhav (Dec 22 2025 at 14:23):

Don't start with * either

Shlok Vaibhav (Dec 22 2025 at 14:25):

I had listed it as one of the prompts that are to be given to llm before it can generate latex from informal equations in our original thread on lagrangian from l&l

Joseph Tooby-Smith (Dec 22 2025 at 14:28):

I wonder if this is something we could lint-against. Although seems a bit tricky, as it looks like it would require something like regex.

Last updated: Feb 28 2026 at 14:05 UTC