Zulip Chat Archive

Stream: general

Topic: Preprint: LeanLJ


Tyler Josephson ⚛️ (Apr 28 2025 at 14:06):

Hi everyone! We've been working for a while on formalizing molecular dynamics simulations. We've drafted our first manuscript on this, where we perform Lennard-Jones energy calculations in Lean, compare with NIST benchmarks, and accompany the calculations with proofs. Take a look! We're planning to post the preprint on ChemRxiv and submit the paper to a molecular physics journal this week. LeanLJ.pdf Code available here.

Joseph Tooby-Smith (Apr 28 2025 at 14:28):

Hi Tyler! This looks great, congrats!

One quick comment: The licensing on the repo is a bit confusing. The attached license is Apache License 2.0, but at the bottom of the README it says MIT license.

Yakov Pechersky (Apr 28 2025 at 16:43):

Very cool! I would suggest clarifying what noncomputable and executing mean when you say

Because real numbers are noncomputable, Lean requires this definition to be prefaced with the noncomputable keyword. This would be an issue if we were executing this function in our calculations

Perhaps reinterpreting it as "noncomputable: not compilable by Lean to generate code that the VM can run on the objects directly referenced in the function. That is why we intend to execute other versions of this function ..."
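
A minimal sketch of the distinction (the function bodies here are illustrative, not the paper's definitions):

import Mathlib

-- Arithmetic on ℝ has no executable representation, so Lean requires `noncomputable`.
noncomputable def ljReal (r : ℝ) : ℝ :=
  let s6 := (1 / r) ^ 6
  4 * (s6 * s6 - s6)

-- A Float counterpart is compiled by Lean and can be run by the VM.
def ljFloat (r : Float) : Float :=
  let s := 1 / r
  let s6 := s * s * s * s * s * s
  4 * (s6 * s6 - s6)

#eval ljFloat 1.5  -- executes in the VM; `#eval ljReal 1.5` would be rejected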

Eric Wieser (Apr 28 2025 at 18:07):

Some initial comments:

  1. The use of ⟨0, by decide⟩ to write (0 : Fin 3) seems to be doing Lean a bit of a disservice; it makes it look like the system gets in your way more than it actually has to.
  2. Regarding
    def lj_p {α : Type} [LE α] [DecidableLE α] [HDiv α α α] [HPow α α α] [HSub α α α] [HMul α α α] [OfNat α 2] [OfNat α 4] [OfNat α 6] [Zero α] (r r_c ε σ : α) : α :=
    it would probably make sense to bundle all of these together into a RealLike class, to keep things brief; that also lets you demonstrate defining a new typeclass (see the sketch after this list).

  3. In fig 5, "Bottom: the equivalent recursive implementation in Lean 4, which
    mirrors the same logic using functional programming." - you could use Id.run do and for notation here to almost exactly match the python code.
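
On point 2, one possible shape for such a class (the name RealLike, the choice of parents, and the body of lj_p' below are illustrative guesses, not the repository's code):

import Mathlib

-- Hypothetical bundled class collecting the operations lj_p needs.
-- Piggybacking on an existing Mathlib class keeps the definition short.
class RealLike (α : Type) extends LinearOrderedField α, Pow α α

-- lj_p restated against the bundled class; the body is a guess at the truncated LJ form.
def lj_p' {α : Type} [RealLike α] (r r_c ε σ : α) : α :=
  let s6 := (σ / r) ^ (6 : α)
  if r ≤ r_c then 4 * ε * (s6 ^ (2 : α) - s6) else 0

And on point 3, roughly how an Id.run do / for version of the pair sum might look (totalEnergyLoop, positions, and pairEnergy are placeholder names, not the paper's identifiers):

-- Imperative-style pair sum, mirroring a Python double loop.
def totalEnergyLoop (positions : Array (Fin 3 → Float))
    (pairEnergy : (Fin 3 → Float) → (Fin 3 → Float) → Float) : Float := Id.run do
  let mut total := 0.0
  for i in [0:positions.size] do
    for j in [i+1:positions.size] do
      total := total + pairEnergy positions[i]! positions[j]!
  return total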

Yaël Dillies (Apr 28 2025 at 21:20):

  1. Is there any chance you can move the whole project to a stable release, eg v4.16.0 or v4.17.0? This will make it more likely that people will be able to depend on your project, and is generally good practice.
  2. Can you make Figure 2 show a case where the periodic boundary condition and minimum image conventions differ? Currently, it feels like the figure is not conveying the point.
  3. Minor but in the caption of Figure 3, "This plot was generated using Python and matplotlib, since data visualization in Lean is still experimental." reads more like a footnote than a caption.
  4. The informal description preceding lj_real does not match its implementation: You are not defining r3 and instead you are defining r12 (which won't ever save any compute since r12 is used exactly once in the definition).
  5. In the statement of ljp_eq_le, {r | r > 0 ∧ r ≤ r_c } has got one too many spaces at the end.
  6. Just to hammer Eric's nail, "for- and while-loops, which are not supported by Lean" is factually incorrect. For loops exist in Lean, although they have a slightly different flavor than in imperative languages.

Yaël Dillies (Apr 28 2025 at 21:38):

  1. "the real version" should be "the Real version"
  2. "The proofs do not catch bugs like these, but the tests do.", probably worth mentioning CakeML at this point

Kevin Buzzard (Apr 29 2025 at 00:11):

Your code indentation is very inconsistent. If you were sticking with mathlib's conventions (which I think are the same conventions that core, batteries etc. use), then examples of errors are:

  • p8: thing_to_be_proved is indented 5 spaces, not 4.
  • p10: lj_real... in the first box is indented 0 spaces, not 4, and the first proof is indented 0 spaces, not 2.
  • p10: in the second box, lj_real is indented 1 space, not 4, and ContinuousOn is indented 2 spaces, not 4.

Basically you are indenting a random number of spaces, so it all looks very inconsistent; this goes on throughout the paper.

Tyler Josephson ⚛️ (Apr 29 2025 at 10:19):

These are great points, thanks everyone for the helpful feedback!

Re for/while loops - thanks for catching this! In an earlier draft, we actually mentioned do notation as a way to do imperative programming in Lean, but that was removed and soon the text didn’t reflect reality. We’ll clean that up.

It likely would have been easier if we had just used imperative style for the summations - but we made it recursive so we could do proofs. Would it be accurate to say that proving things with “for” and “do” in Lean is difficult? Do these come with termination proofs, like recursive functions do? Looking back, I notice we didn’t end up doing that much with the recursive function, as far as proofs are concerned. Our proofs are about other functions. Like we mentioned, it would have been nice to figure out a proof that this summation function loops over N*(N-1) pairs of interactions, but we didn’t figure that out.

Eric Wieser (Apr 29 2025 at 11:03):

Proving things about do and for isn't too bad, with caveats:

  • If your loop uses break or early returns then it is fiddlier
  • The simp-normal form around Id.run is a slight mess, I have an open draft PR to fix this

Tyler Josephson ⚛️ (Apr 29 2025 at 13:30):

Did I have a misconception about whether certain loop-like functions should be expressed with do and for vs. recursion, in order to have proofs about them? I went hunting to find where I got this idea from, and this summarizes how I've been thinking about this. I had the impression that Lean allows imperative-style code, but obtaining the guarantees with this is harder than when you write code in the functional style.

Tyler Josephson ⚛️ (Apr 29 2025 at 13:39):

We make a rather big point about this in the paper, trying to teach our audience about recursion, but that's because I thought there was a benefit to that. If no benefit, then it's likely better to skip that point; it's harder to write code that way (at least for people who learn imperative languages first), and it's better not to demonstrate harder ways of doing things when our goal is to make this a gentle introduction.

Eric Wieser (Apr 29 2025 at 14:04):

If your loop-like function is really a (filtered) sum, then you should be using functions about sums and filtering sequences

Eric Wieser (Apr 29 2025 at 14:06):

I think your function in question can literally be written using the notation for Finset.sum, which will look more similar to the mathematics than even the Python
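
Something in this direction, with positions and pairEnergy as stand-ins for the paper's definitions:

import Mathlib

-- Total pair energy written directly as nested Finset sums over pairs i < j.
noncomputable def totalEnergySum {N : ℕ} (positions : Fin N → Fin 3 → ℝ)
    (pairEnergy : (Fin 3 → ℝ) → (Fin 3 → ℝ) → ℝ) : ℝ :=
  ∑ i : Fin N, ∑ j ∈ Finset.univ.filter (fun j : Fin N => i < j),
    pairEnergy (positions i) (positions j)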

Tyler Josephson ⚛️ (Apr 29 2025 at 14:32):

That would be nice, indeed! Much closer to the mathematics. It would still be a learning curve / requiring translation from the traditional loop implementation.

Eric Wieser (Apr 29 2025 at 14:36):

But isn't the traditional loop implementation itself translated from LaTeX?

Tyler Josephson ⚛️ (Apr 29 2025 at 14:53):

Good question. I checked the two intro molecular simulation textbooks that I learned from (by Allen and Tildesley and by Frenkel and Smit), and both show typeset equations with summation, and both demonstrate in code (e.g. FORTRAN) using for loops. For instance, from Frenkel and Smit (this for loop code is about forces):
Screenshot 2025-04-29 at 10.40.03 AM.png
Screenshot 2025-04-29 at 10.42.15 AM.png

So, I suppose the convenience for scientists is that someone else has already translated the math into a FORTRAN loop implementation, and all they'd need to do is to translate that into their context (usually, another imperative language).

Eric Wieser (Apr 29 2025 at 16:02):

I'd argue that the convenience is that there is no translation, and you just transcribe $\sum_i x_i$ as ∑ i, x i

Eric Wieser (Apr 29 2025 at 16:04):

(I don't see where the $\mathbf{n}$ from that LaTeX went in the translation, or how "the term with $i = j$ is to be excluded when $\mathbf{n} = 0$" is represented there)

Tyler Josephson ⚛️ (Apr 29 2025 at 17:08):

The above summation is a summation over all pairs in a periodic system. This is illustrated in revised Figure 2 (thanks @Yaël Dillies for point #2 above)

So the central green atom interacts with every purple and green atom in the extended system (which goes out infinitely), except for itself. That's what summing over i, j, and n does. The sum is halved to avoid double-counting, since the interaction from i to j would otherwise also be counted from j to i.
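
For reference, the summation being discussed is the standard periodic pair sum, typically written along the lines of (standard textbook form, not copied verbatim from the paper):

$$U = \frac{1}{2} \sum_{\mathbf{n}} \sum_{i=1}^{N} {\sum_{j=1}^{N}}' \, u\!\left(\left|\mathbf{r}_{ij} + \mathbf{n}L\right|\right),$$

where the primed sum means the $i = j$ term is dropped when $\mathbf{n} = 0$.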

Tyler Josephson ⚛️ (Apr 29 2025 at 17:11):

MID.png

Tyler Josephson ⚛️ (Apr 29 2025 at 17:15):

But let me highlight a better equation - this is closer to our implementation:
Screenshot 2025-04-29 at 1.12.18 PM.png

Tyler Josephson ⚛️ (Apr 29 2025 at 17:16):

If the interaction is truncated at a distance < L/2, then you only need to keep track of each pair once. You no longer need the factor of 1/2 or the exclusion of i = j when the condition over the summation is i < j.
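
Concretely, with a cutoff $r_c < L/2$ the sum collapses to something like

$$U = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} u\!\left(r_{ij}\right), \qquad u(r) = 0 \ \text{for } r > r_c,$$

so each pair is counted exactly once (this is the standard truncated form, restated here rather than copied from the screenshot above).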

Eric Wieser (Apr 29 2025 at 23:55):

Right, this is summing over docs#List.sym2 or docs#Finset.sym2

Eric Wieser (Apr 29 2025 at 23:56):

Or two nested sums over Ico 0 n and Ioo i n
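
For concreteness (pairEnergy is a placeholder for the pairwise energy of indices i and j):

import Mathlib

-- The pair sum phrased with Finset.Ico / Finset.Ioo over natural-number indices.
noncomputable def totalEnergyIoo (n : ℕ) (pairEnergy : ℕ → ℕ → ℝ) : ℝ :=
  ∑ i ∈ Finset.Ico 0 n, ∑ j ∈ Finset.Ioo i n, pairEnergy i j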


Last updated: May 02 2025 at 03:31 UTC