Zulip Chat Archive

Stream: general

Topic: Formalizing Chemical Theory

Tyler Josephson (Jul 07 2022 at 02:40):

We are excited to share a new Lean project! Introducing LeanChemicalTheories. We are a research group in Chemical Engineering who have been learning Lean 3 and using it to formalize proofs in science and engineering.

So far, we've formalized some proofs in adsorption, chemical reaction kinetics, thermodynamics, and molecular mechanics. In our longest proof, in which we formalize the derivation of BET adsorption theory, we invoke infinite geometric series. We've also formalized some undergraduate physics proofs, in particular the derivation of the kinematic equations from an introductory Physics course (we worked through these to figure out how to work through engineering calculus problems in Lean). We've made some progress on proofs in introductory transport phenomena (uses PDEs) and quantum mechanics (Hilbert spaces, complex numbers), and would love to do proofs in statistical mechanics, as well (random numbers, N-dimensional integrals, etc.). @Max Bobbin has been the main driver of the project, with important contributions from @Parivash, @Samiha Sharlin, and @Catherine Wraback.

We're still cleaning up our proofs and writing a manuscript for a chemical science audience, but we have made enough progress to share this with the Zulip community. Let us know what you think!

Tyler Josephson (Jul 07 2022 at 02:41):

The Zulip community and all the online training materials have been invaluable! Thanks especially to @Eric Wieser, @Kevin Buzzard, @Patrick Massot , @Johan Commelin, @Bryan Gin-ge Chen, @Yaël Dillies, @Violeta Hernández, and @Reid Barton for helping us with our many questions so far!

Tyler Josephson (Jul 07 2022 at 03:18):

One question we have up front is: how can we format our GitHub so it looks nice like the mathlib documentation, with a title and typeset preamble on top, with proofs to follow, such as this example?

Huỳnh Trần Khanh (Jul 07 2022 at 03:36):

I am not an expert and this should not be construed as expert advice. The mathlib documentation is generated with https://github.com/leanprover-community/doc-gen

Eric Wieser (Jul 07 2022 at 08:14):

There's an example of how to set doc-gen up on your own project at https://github.com/ImperialCollegeLondon/M40001_lean/pull/5/files

Eric Wieser (Jul 07 2022 at 08:41):

I assume this line is the formalization of BET you refer to?

Patrick Massot (Jul 07 2022 at 10:29):

This BET statement is extremely weird. What's the point of introducing all those variables that are defined in terms of each other anyway? There are also several unneeded assumptions. One is very easy to spot, you can type #lint at the bottom of this file and it will tell you that your assumption $V_0 \neq 0$ is never user. What the linter doesn't see however is that your assumption hsum and hsum2 are simply saying: "I don't want to do some parts of the proof, so let me it put them as assumptions".

Patrick Massot (Jul 07 2022 at 10:30):

Putting aside those hsum and hsum2 "assumptions", the mathematical content of that proof is:

theorem BET₁ {x : ℝ} (C : ℝ) (hx1: x < 1) (hx2 : 0 < x) :
  (∑' k : ℕ, (k + 1 : ℝ)*(x^(k+1) * C))/(1 + ∑' k, x^(k+1)* C) = C*x/((1 - x)*(1 - x + C*x)) :=
sorry

Patrick Massot (Jul 07 2022 at 10:31):

And by the way hx2 : -1 < x would be good enough.

Patrick Massot (Jul 07 2022 at 10:32):

I think it greatly would greatly help to first prove that mathematical part and then derive your original statement from it, for instance as in

theorem BET
(θ : ℕ → ℝ ){ k1 k2 kads kdes a b C P q A V₀ Vads x y c : ℝ}
--(hsum : summable (λ (b : ℕ), ↑b * θ b)) -- This is automatic
--(hsum2 : summable (λ (b : ℕ), θ b)) -- This is automatic
-- Equations
(h1 : ∀ k, θ (k+1) = (x ^ (k+1)) * C * θ 0)
(hA : A = ∑' (k : ℕ), θ k)
(hVads : Vads =  V₀ * ∑' (k : ℕ), k * θ k)
(hq : q = Vads/A)
-- Definitions
(hC: C = y/x)
(hy: y = k1/k2*P)
(hx: x = kads/kdes*P)
(hc: c = k1/k2)
(ha: a = V₀*c)
(hb: b = kads/kdes)
-- Constraints
(hx1: x<1)
(hx2 : 0 < x)
-- (hV₀ : V₀ ≠ 0)  This assumption was useless
(hθ : θ 0 ≠ 0)
:
q = a*P/((1-b*P)*(1-b*P+c*P))
:=
begin
  have hsum2 : summable θ,
  { refine (summable_nat_add_iff 1).mp _,
    simp only [h1, pow_succ', mul_assoc],
    exact (summable_geometric_of_lt_1 hx2.le hx1).mul_right _ },
  have hxnorm : ∥x∥ < 1, by refine abs_lt.mpr ⟨_, _⟩ ; linarith,
  have hsum : summable (λ k : ℕ, ↑k * θ k),
  { refine (summable_nat_add_iff 1).mp _,
    simp only [h1, ← mul_assoc],
    refine summable.mul_right _ (summable.mul_right _ _),
    set u := λ k :ℕ, (k : ℝ) * x ^ k,
    change summable (λ (n : ℕ), u (n+1)),
    refine (summable_nat_add_iff 1).mpr _,
    simpa using summable_pow_mul_geometric_of_norm_lt_1 1 hxnorm },
  have hy' : y = C*x, from  (div_eq_iff hx2.ne.symm).mp hC.symm,
  rw [hq, hVads, hA, ha, hb, hc, ← hx, mul_assoc, ←hy, hy', mul_div_assoc, tsum_eq_zero_add hsum, tsum_eq_zero_add hsum2],
  have : ∀ k : ℕ, (k + 1 : ℝ) * (x ^ (k + 1) * C * θ 0) = ((k + 1 : ℝ) * (x ^ (k + 1) * C)) * θ 0,
  by simp [mul_assoc],
  simp only [nat.cast_zero, zero_mul, zero_add, nat.cast_one, one_mul, nat.cast_add, h1, this],
  rw [tsum_mul_right, tsum_mul_right,show ∀ z : ℝ, θ 0 + z * θ 0 = (1+z)*θ 0, by {intro z, rw [add_mul, one_mul] },
      mul_div_mul_right _ _ hθ, BET₁ C hx1 hx2, ← mul_div_assoc]
end

where I removed the unneeded assumptions.

Tyler Josephson (Jul 07 2022 at 11:18):

@Eric Wieser yep - that's the BET formalization.

@Patrick Massot Thanks for helping us shorten the proof! We wrestled a bit with the types and assumptions around hsum and hsum2 - nice to see a cleaner way to do it.

Part of the "weirdness" of the BET statement is that the variables that are defined have chemical meaning (better in-line comments and documentation will be coming). For example, A is the area of the surface and Vads is the amount of adsorbed gas, and q is the amount adsorbed per unit area. Some of these variables don't appear in the final expression, but are used when deriving it.

Patrick Massot (Jul 07 2022 at 11:21):

I understand they have chemical meaning of course. But they play no role in this statement. In isolation, this is only making things complicated. It would make sense if you had a full theory with several definitions and lemmas.

Patrick Massot (Jul 07 2022 at 11:25):

I also find it very suspicious that the formula for $\theta$ cannot be expressed in a single way including $\theta_0$ . Are you sure you formalized what you intended to formalize?

Patrick Massot (Jul 07 2022 at 11:28):

I mean your sequence satisfies $\theta_{k+1} = x\theta_k$ for all $k$ except for $k = 0$ where $\theta_{0 + 1} = xC\theta_0$ . Is that really what you intended?

Tyler Josephson (Jul 07 2022 at 12:03):

Brunauer38-BET-theory.pdf
The original paper is here, if you're curious. We're actually following a slightly different derivation in the current proof, but plan to modify this one to more closely follow the original (proving Eq. 26 first, and then Eq. 28).

Indeed, our sequence for $\theta$ is different for the first index compared to all others - this is part of the theory. For $k=0$, the layer of adsorbed molecules interact with the surface, and for $k > 0$, the molecules lay on top of other molecules. So, these layers have different interaction strengths, and require different constants (and a weird setup for the summation).

Tomas Skrivan (Jul 07 2022 at 12:44):

I am definitely cheering at your effort formalizing stuff from chemistry and physics. However, I'm really unsure about the style of your code.

I have randomly stumbled to the directory about the translational invariance of Lennard Jones potential.

The invariance is just a consequence that the the potential V around a particle is V(x) = f(|x-p|) where p is particle position. It can be generalized even more, but radially symmetric potentials are very common. Mathlib benefits greatly from keeping things very general such that it provides a good API to build on top of. The way your code/theorems are stated and formulated do not seem to support that

So for example, I would expect to define Lennard-Jones potential as a function and then state the properties of that function.

Now specifically about potentials and constitutive relations. For example, I would expect theorems like: if you have a function V(x, y) that is invariant under transformation V(R*x+t, R*y+t) for any R rotation matrix and any vector t. Then there is a function U s.t. V(x, y) = U(|x-y|). Of course this implication goes the other way and the special case it that LJ potential is translationaly invariant.

Even more generalization, this statement has to be some special case of a theorem from Classical Invariant theory. Unfortunately, I have always failed to learn this theory properly, so I can't point you to the exact theorem.

Tomas Skrivan (Jul 07 2022 at 12:47):

Also I really do not see the benefit of the theorem assumptions like hC := C = y/x. It is just a definition of C, isn't this what let bindings are for? That is the benefit of having it as an assumption?

Damiano Testa (Jul 07 2022 at 12:52):

Tomas Skrivan said:

Even more generalization, this statement has to be some special case of a theorem from Classical Invariant theory. Unfortunately, I have always failed to learn this theory properly, so I can't point you to the exact theorem.

Are you thinking of the First Fundamental Theorem? For instance, Theorem 10.2, p. 114 of this reference.

Tomas Skrivan (Jul 07 2022 at 12:59):

Yeah I think that would be it. At some point I probably understood the statement but now it is mostly gibberish to me :(

Damiano Testa (Jul 07 2022 at 13:02):

It is simply saying that a (polynomial) function that is invariant under the orthogonal group (i.e. the group of rotations) is a function of the scalar products of the involved arguments. It takes a bit of unpacking, but it is saying that the only meaningful quantities to talk about that are unchanged by rotations are scalar products. The difference x-y in your formula is a consequence of translations.

Tomas Skrivan (Jul 07 2022 at 13:08):

Yes that is roughly what I remember it saying on high level.

However I was working with a function like: F(A, x y) = F(R*A*S, R*x, S*y) for every rotation matrix R, S and I totally failed at applying the FFT to this case. Or maybe it was even more complicated, I don't remember not important here.

The point is that the way it is formulated it quite impenetrable to a physicist or a chemist.

Damiano Testa (Jul 07 2022 at 13:11):

Oh, I totally agree with the impenetrability: not just for physicists or chemists, but for anyone who has not learned the classical formalism of invariant theory. The brackets (i|j) are a great example of shrouding an easy concept with an inscrutable symbol! Who would think that it means "take the scalar products"?

Eric Wieser (Jul 07 2022 at 14:25):

I think I've put my finger on what's weird about this code; it doesn't contain a single definition! (At least according to a github search)

Tyler Josephson (Jul 07 2022 at 15:44):

Tomas Skrivan said:

I am definitely cheering at your effort formalizing stuff from chemistry and physics. However, I'm really unsure about the style of your code.

I have randomly stumbled to the directory about the translational invariance of Lennard Jones potential.

The invariance is just a consequence that the the potential V around a particle is V(x) = f(|x-p|) where p is particle position. It can be generalized even more, but radially symmetric potentials are very common. Mathlib benefits greatly from keeping things very general such that it provides a good API to build on top of. The way your code/theorems are stated and formulated do not seem to support that

So for example, I would expect to define Lennard-Jones potential as a function and then state the properties of that function.

Now specifically about potentials and constitutive relations. For example, I would expect theorems like: if you have a function V(x, y) that is invariant under transformation V(R*x+t, R*y+t) for any R rotation matrix and any vector t. Then there is a function U s.t. V(x, y) = U(|x-y|). Of course this implication goes the other way and the special case it that LJ potential is translationaly invariant.

Even more generalization, this statement has to be some special case of a theorem from Classical Invariant theory. Unfortunately, I have always failed to learn this theory properly, so I can't point you to the exact theorem.

Ahh, great suggestions for the LJ potential - approaching this from a "function"-based perspective makes a lot of sense! We started with this overly-simple proof as we've learned Lean, with an aim to generalize to both translation and rotation invariance of LJ, and eventually to "all pairwise interaction potentials." Treating these as functions will be much more powerful. We'd love to leverage invariance in mathlib, but understanding that (and getting our types and definitions straightened out) has been a barrier so far.

Tyler Josephson (Jul 07 2022 at 15:45):

Tomas Skrivan said:

Also I really do not see the benefit of the theorem assumptions like hC := C = y/x. It is just a definition of C, isn't this what let bindings are for? That is the benefit of having it as an assumption?

Also great suggestion! @Max Bobbin - let's learn more how let works and try that out.

Max Bobbin (Jul 07 2022 at 16:15):

Eric Wieser said:

I think I've put my finger on what's weird about this code; it doesn't contain a single definition! (At least according to a github search)

I've tried to use def a couple of times but could never figure out exactly how or why to use it or the power, but I understand that we have a lot of places we could use it. Tomas gave the suggestion of using def for our Lennard-Jones potential and then writing theorems about that. I've read the Lean textbook and the def chapter, but I think I'm still missing some things about it. Do you have any examples you could point me towards or any suggestions to use def in our Github? From there I could go forward and start understanding def and learning it better

Tyler Josephson (Jul 07 2022 at 16:23):

Eric Wieser said:

I think I've put my finger on what's weird about this code; it doesn't contain a single definition! (At least according to a github search)

Good catch! Indeed, we haven't used def anywhere, yet - will learn more about these and try to improve the structure. We've mostly been introducing terms as premises, and drawing conclusions from them. We can definitely see some cases where that would be a better model for setting these up (define the LJ potential as a function, and then prove statements about it, rather than give it as a premise). But other proofs, like for the kinematic equations, I don't quite see how a definition would be appropriate. We're not trying to define position and prove the properties it has; we want to show that position, velocity, and acceleration, so defined, lead to this equation or that equation. What do you think?

Eric Wieser (Jul 07 2022 at 18:12):

I now remember that I gave up on looking at the kinematics stuff because the filename is missing .lean and so GitHub does not highlight it!

Max Bobbin (Jul 07 2022 at 18:13):

(deleted)

Max Bobbin (Jul 07 2022 at 18:14):

Yes, I just fixed those problems. We made some of the files in Github and copied the code over and forgot to add the .lean. That should be fixed right now, sorry for that

Max Bobbin (Jul 07 2022 at 18:18):

Also, besides learning about def, I also noticed that a big chunk of our proofs is algebra, I've learned a lot about different simplifying tactics that help reduce code lines, but is there any other suggestions you all have to help us elevate the headache of algebraic simplification in Lean? BET is a good example where we have a rw line with 20 or so theorems used just to simplify the addition of two division structures.

Brandon Brown (Jul 08 2022 at 02:02):

In CSTR_Balance.lean is it necessary to define new axioms?

Max Bobbin (Jul 08 2022 at 02:04):

@Brandon Brown No it isn't. That is very old, one of the first things I did where I was playing around with stuff. Those axioms are useless and easily can lead to problems. I just never got around to changing that

Tomas Skrivan (Jul 08 2022 at 05:44):

I would recommend switching to let bindings. That way, just simply calling simp should reduce the statement to Patrick's reformulation:

theorem BET₁ {x : ℝ} (C : ℝ) (hx1: x < 1) (hx2 : 0 < x) :
  (∑' k : ℕ, (k + 1 : ℝ)*(x^(k+1) * C))/(1 + ∑' k, x^(k+1)* C) = C*x/((1 - x)*(1 - x + C*x)) :=
sorry

Which should be fairly easy to prove. The left side is (modulo some constants), derivative of geometric series divided by geometric series. You can eliminate those sums as these series have explicit formulas. After that is is just a matter of algebra and ring tactic or something similar might do the job.

Max Bobbin (Jul 08 2022 at 16:08):

@Tomas Skrivan I see what you mean by using Let and I remember trying to use it before, but I couldn't get it to work last time because it started to get in an application of Types that I didn't understand. Would you mind elaborating on the application of let? If I wanted to use let to write hC : C = y/x, like you suggested, would it be something like:

let hC := C = y/x

I was also looking at using set, ie

set! C:= y/x with hC

but that created a new instance of C

Tomas Skrivan (Jul 08 2022 at 16:13):

You can use let like this:

example (m n : ℕ) :
    let lhs := m + n in
    let rhs := n + m in
    lhs = rhs :=
by simp

Tomas Skrivan (Jul 08 2022 at 16:19):

So the statement would look something like:

theorem BET (V₀ x y c : ℝ} (hx1 : x < 1) (hx2 : 0 < x) :
   a := ... in
   b := ... in
   P := ... in
   q := ... in
   q = a*P/((1-b*P)*(1-b*P+c*P))
:= ...

Eric Wieser (Jul 08 2022 at 16:23):

I think you missed the let

Eric Wieser (Jul 08 2022 at 16:24):

That should probably be

theorem BET (V₀ x y c : ℝ} (hx1 : x < 1) (hx2 : 0 < x) :
   let a := ...,
       b := ...,
       P := ...,
       q := ... in
  q = a*P/((1-b*P)*(1-b*P+c*P))
:= ...

Max Bobbin (Jul 08 2022 at 16:24):

Ohhh, I see. I was thinking you meant use Let in the actual proof part, but we can use let in the objective statement. That is interesting, thank you

Max Bobbin (Jul 08 2022 at 16:53):

So when I have this

  let a := V₀*C,
      b := x/P,
      c := y/P,
      q := Vads/A in
  q = a*P/((1-b*P)*(1-b*P+c*P))

it gets rid of the relationship premises, which cleans up the statement a lot, but now I'm at a point where I have a goal

 P ≠ 0

With the old proof, I could use linarith to show that this is true because I have 0 < x and an explicit relation between P and x through those premises. Is there a way to show that relation from the let statements?

Max Bobbin (Jul 08 2022 at 16:58):

Nevermind, you can use intro :laughing:, not sure why I thought it was more complicated then that

Max Bobbin (Jul 08 2022 at 18:07):

There is a new version of BET on the github that uses the suggestions given in this chain

Eric Wieser (Jul 08 2022 at 19:23):

Presumably

(hA : A = ∑' (k : ℕ), θ k)
(hVads : Vads =  V₀ * ∑' (k : ℕ), k * θ k)

can also be moved into the let?

Max Bobbin (Jul 08 2022 at 19:38):

That is true, just added that

Max Bobbin (Jul 08 2022 at 19:39):

Latest form

theorem BET_regression_form
(θ : ℕ → ℝ ){P V₀ x y: ℝ}
-- Equations
(h1 : let C := y/x in ∀ k, θ (k+1) = (x ^ (k+1)) * C * θ 0)
-- Constraints
(hx1: x<1)
(hx2 : 0 < x)
(hθ : θ 0 ≠ 0)
(hP : P ≠ 0)
:
  let a := V₀*y/P,
      b := x/P,
      c := y/P,
      Vads :=  V₀ * ∑' (k : ℕ), k * θ k,
      A :=  ∑' (k : ℕ), θ k,
      q := Vads/A in
  q = a*P/((1-b*P)*(1-b*P+c*P))
:=

Tomas Skrivan (Jul 08 2022 at 20:40):

You can also define the function/series θ with let. The initial value θ 0 has to be a parameter though.

Max Bobbin (Jul 08 2022 at 21:31):

I was running intro trouble implementing theta with let, and I wanted to try it through def.

variables {x θ₀ : ℝ}

def seq :  ℕ → ℝ
|(0:ℕ)           := θ₀
|(nat.succ n:ℕ) := x^(nat.succ n)*θ₀

This is my first time experimenting with def, but I think I set it up correctly. Only problem is Lean throughs a "don't know how to synthesize placeholder" error in the proof statement when I attempt to say seq k

Andrew Yang (Jul 08 2022 at 21:55):

seq takes in x and θ₀ as variables, but it does not know where to find them. It would be better if you make the variables explicit (via variables (x θ₀ : ℝ)), and write seq x θ₀ k instead.

Bolton Bailey (Jul 09 2022 at 03:49):

I don't see the need for cases, couldn't you define it as

import data.real.basic

variables (x θ₀ : ℝ)

def seq (n : ℕ) : ℝ := x^(n)*θ₀

Samiha Sharlin (Jul 27 2022 at 18:00):

Eric Wieser said:

There's an example of how to set doc-gen up on your own project at https://github.com/ImperialCollegeLondon/M40001_lean/pull/5/files

Hi Eric, I was trying to setup our project using the steps you shared here. It is getting stuck at the first step trying to install elan.. Do you know what's this about?
Screen-Shot-2022-07-27-at-1.51.07-PM.png

Mauricio Collares (Jul 27 2022 at 18:04):

Try using https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh instead of https://raw.githubusercontent.com/Kha/elan/master/elan-init.sh

Ruben Van de Velde (Jul 27 2022 at 18:04):

Huh. Can you copy/paste the URL that curl mentions?

Mauricio Collares (Jul 27 2022 at 18:06):

The problem is that there is a version missing in the curl url (between the two slashes).

Mauricio Collares (Jul 27 2022 at 18:06):

https://github.com/leanprover/elan/releases/download/v1.4.1/elan-x86_64-unknown-linux-gnu.tar.gz is a valid URL, https://github.com/leanprover/elan/releases/download//elan-x86_64-unknown-linux-gnu.tar.gz isn't

Mauricio Collares (Jul 27 2022 at 18:07):

But the snippet that determines the latest release changed in https://github.com/leanprover/elan/commit/c58659942e5ce83b91b91a4a55bbf5722e90c423, which is present in https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh but not in https://raw.githubusercontent.com/Kha/elan/master/elan-init.sh

Samiha Sharlin (Jul 28 2022 at 15:55):

Mauricio Collares said:

Try using https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh instead of https://raw.githubusercontent.com/Kha/elan/master/elan-init.sh

Thank you so much @Mauricio Collares and @Ruben Van de Velde ! The modified link works and got things to work out quite further. I am now having issues with generating html docs. Got some errors pertaining to our lean files earlier (incomplete proofs and inclusion of #check). After fixing all those, it still gets stuck building the html docs. This time I am not seeing any file name where the error is. I suppose it’s not the lean files anymore?
Screen-Shot-2022-07-28-at-11.51.50-AM.png

Mauricio Collares (Jul 28 2022 at 16:17):

See PR comment

Eric Wieser (Jul 28 2022 at 16:33):

I think that template might not work so well any more

Eric Wieser (Jul 28 2022 at 16:34):

I made the PR to Kevin's repo a long time ago, and doc-gen has changed in the meantime

Mauricio Collares (Jul 28 2022 at 16:39):

@Samiha Sharlin Might still be worth a try applying the second fix in https://github.com/ImperialCollegeLondon/M40001_lean/pull/5/files, because I get an empty json file without it and a valid json file with it. But like Eric said, other fixes might be required.

Eric Wieser (Jul 28 2022 at 16:40):

@Mauricio Collares, I pushed a different fix which is the one that I use at https://github.com/pygae/lean-ga

Mauricio Collares (Jul 28 2022 at 16:40):

Ah, I missed that! Thanks for pointing it out.

Samiha Sharlin (Jul 28 2022 at 20:05):

Mauricio Collares said:

Samiha Sharlin Might still be worth a try applying the second fix in https://github.com/ImperialCollegeLondon/M40001_lean/pull/5/files, because I get an empty json file without it and a valid json file with it. But like Eric said, other fixes might be required.

Thank you! It works with this second fix! (but not completely). <https://atomslab.github.io/LeanChemicalTheories/Adsorption/BETInfinite.html>
Is there something we need to add to our lean files so it connects to GitHub repo when we click source in the webpage.. ? That's the only part that didn't work I guess. I am also going to give @Eric Wieser 's method for lean-ga a try and see if that works for me!

Eric Wieser (Jul 28 2022 at 20:11):

Yes, you can fix the source links; see my lean-ga version for the code to do so

Eric Wieser (Jul 28 2022 at 20:11):

But it probably won't work if you copy the whole thing

Eric Wieser (Jul 28 2022 at 20:13):

This line is the one that sets the source link: https://github.com/ATOMSLab/LeanChemicalTheories/blob/fe31998d5d9dd4addb0e86f18f03d1cf210542a7/.github/workflows/main.yml#L76

Last updated: May 02 2025 at 03:31 UTC