Zulip Chat Archive

Stream: Natural sciences

Topic: Examples of utility to empirical science

Castedo Ellerman (Dec 22 2025 at 20:49):

Thank y'all for this Zulip org/stream!

I am new to Lean and wondering about the ways Lean can be useful to scientists. By "scientist" I mean a researcher with empirical questions based on empirical data. By "empirical" I mean observed from the physical world external to our minds (and/or computers).

Does anybody have suggested reading on how the degree of formalization in Lean is useful to empirical science?

I can think of:

Formal verification of mathematical theorems and software used by scientists.
Computer assisted discovery of valid applied math theorems. Here an applied mathematician discovers that an applied math conjecture is a valid special case of some more general highly abstract proved theorem in pure math. For example, an applied mathematician might be only interested in 2D, but an existing theorem has been proved for a more general abstract space.
Using near-future query tools that enable a scientist to input mathematical statements/questions and get useful information to using that math towards empirical research. As a toy example imagine a scientist inputting something like "Is the mean of a {{foobar}} distribution equal to the median?" because the scientist is doing statistical inference on empirical data with the assumption of a {{foobar}} distribution. My assumption here is that Lean together with some Lean-aware query tool is the best tool for this type of query.
Communication and disambiguation of technical terms in applied mathematics by resolving written terms to formal definitions in a Lean library. A trivial toy example would be someone writing "natural numbers" and then referencing/linking to a formal definition that disambiguates whether zero is included. A useful example would be much less trivial.

Any other ideas on how formalizing math via Lean might be useful to answering empirical questions in empirical science?

Thank you,
Castedo Ellerman

Timothy Chow (Dec 23 2025 at 05:22):

You'll get different opinions depending on who you ask, but in my opinion, formalized mathematics (Lean or otherwise) has not yet matured to the point where it is "useful for answering empirical questions in empirical science." One can speculate about how it might become useful for empirical science in the future, and all your suggestions are reasonable possibilities, but we're not there yet.

What formalized mathematics brings to the table is an extremely high degree of mathematical rigor and reliability. You'll find plenty of scientists who will argue that that level of rigor is not necessary for, and may even get in the way of, empirical science. Even physics, which is probably the most mathematical of the empirical sciences, is full of derivations that mathematicians are currently unable to make rigorous even by "normal" mathematical standards, let alone the standards of formalized mathematics. Now, as a mathematician, I do think that rigor is important and has benefits for science, but they are mostly indirect benefits; moreover, we can reap most if not all of those benefits from "normal" mathematical rigor, without needing formalized mathematics.

As for formally verified software, that's nice of course, but the scientific world functions pretty well even though essentially no software is formally verified.

Some of your suggestions, such as (what I would call) "semantic search" for relevant mathematical theorems or definitions, are currently best handled by human experts. Even search engines and (gasp!) LLMs are going to do a better job than databases of formalized mathematics such as Lean's mathlib. Currently, it's sometimes a challenge even to find something in mathlib that one already strongly suspects is in there somewhere. That may change in the future; one day, maybe a database of formalized mathematics will form the basis of a semantic search engine par excellence, but again, that day has not yet arrived.

One other potential future benefit of Lean and mathlib is that the combination of machine learning with formalized mathematics has the potential to revolutionize mathematics. We are starting to get some glimmers of that potential today, but once again, the revolution has not yet happened, if it indeed it ever will.

Joseph Tooby-Smith (Dec 23 2025 at 10:19):

Does anybody have suggested reading on how the degree of formalization in Lean is useful to empirical science?

I've been collecting resources about the use of formalization in physics and adjacent fields in: https://github.com/HEPLean/ITPsInPhysicsArchive

There are also many discussions on this channel and on #PhysLean which discuss related topics.

Many of my personal thoughts are detailed in:
https://advanced.onlinelibrary.wiley.com/doi/10.1002/advs.202517294

However, this does not cover anything like biology, or medicine or related areas. The reason ITPs are useful in mathematics and (I would argue) physics is that lots of the important information is in the form of statements which can be rigorously mathematically formulated (in the case of physics this does not necessarily mean mathematically justified e.g. all the handwaving physicists do). If there is an area of e.g. Biology where the majority of the important information can be rigorously mathematically formulated then the use of ITPs in that area I believe will have benefits, even if just from an organization of information point of view. If not, then it seems to me that another class of languages would be more suited.

Timothy Chow (Dec 23 2025 at 13:11):

Joseph Tooby-Smith said:

I've been collecting resources about the use of formalization in physics and adjacent fields in: https://github.com/HEPLean/ITPsInPhysicsArchive

Interesting! One of the first things I looked for was special relativity. I see that there was some discussion of special relativity last year. It appears that the work of István Németi and his collaborators has been partially formalized in Isabelle and Coq/Rocq, but not (yet) in Lean. I remember being fascinated by their work when I first learned about it many years ago, because they were apparently the first to explicitly point out that "the laws of physics are scale-invariant" needs to be explicitly assumed, or else one can devise "a nonstandard special relativity" that has all the usual features of the standard theory except that (as some have put it) ants and elephants experience slightly different laws of physics.

This example illustrates that formalizing physics (with or without using an interactive theorem prover) can have the benefit of clarifying hidden assumptions. Now, whether such clarification is "useful to scientists" is a separate question. I don't think any physicists care about these kinds of logical nuances in the theory of special relativity. Németi and his collaborators have also tackled general relativity, which physicists recognize as an area of active research, but again, I suspect that physicists don't care too much about the formal correctness of the axiomatic foundations, compared to discovering new insights into solutions to Einstein's equations.

Joseph Tooby-Smith (Dec 23 2025 at 14:05):

I am a physicist and I care about formalization (or, as I like to call it digitalizing, to prevent confusion), I also know and work with many other physicists that also care. Though note that formalization is not the same as axiomization (which I personally care less about). They are two different things, for example there are many (high-level) results from special relativity as part of PhysLean, but not an axiomization. Same with areas like particle physics, etc.

I am slightly reluctant to rehash all the arguments here about why physicists should care about formalizing physics, because this is a conversation that has been discussed elsewhere on this Zulip, in open-meetings and in publications. I don't think it is fair to expect everyone to rewrite their arguments here regarding this.

Castedo Ellerman (Dec 23 2025 at 14:28):

Joseph Tooby-Smith said:

I am a physicist and I care, I also know and work with many other physicists that care about formalization from many different areas.

My original wording of "useful to scientists" is a poor choice of words, for many reasons. It's perhaps better to focus on utilitiy to XYZ questions instead of utility to XYZ people. As people, we can wear different hats, sometimes interested in empirical questions, sometimes non-empirical questions, and there is nothing wrong with non-empirical questions.

I am slightly reluctant to rehash arguments here, because this is a conversation that has been discussed elsewhere on this Zulip, in open-meetings and in publications. I don't think it is fair to expect everyone to rewrite their arguments here regarding this.

If anybody thinks it would be useful, I'm happy to write down and share notes online with pointers to publications and perhaps Zulip posts, and seek comments to update the notes.

I'm new to Lean and hoping to resolve some questions in my own mind, so I've only been able to read what is online (such as Joseph's publications and YouTube videos).

Joseph Tooby-Smith (Dec 23 2025 at 14:43):

I struggled with the wording of my last paragraph above, and maybe it is better rephrased as: There a lots of people in the science-lean community that have thoughts and opinions on the use of formalization in the sciences, and have expressed those thoughts publicly before now. They may not have the time to restate their thoughts in this particular conversation - especially because of the time of the year. Care should be taken not to just restate what has come before.

If this makes sense.

In any case, I think you are interested in subjects beyond physics, Castedo?

Joseph Tooby-Smith (Dec 23 2025 at 14:47):

If anybody thinks it would be useful, I'm happy to write down and share notes online with pointers to publications and perhaps Zulip posts, and seek comments to update the notes.

If these are things you think could be added to:

https://github.com/HEPLean/ITPsInPhysicsArchive

I would very much welcome PRs - I think broadening it to the natural sciences and adding important Zulip conversations would be worth while. The formatting of this repo is awful, so eventually that also needs some work.

Castedo Ellerman (Dec 23 2025 at 15:29):

Joseph Tooby-Smith said:

.... They may not have the time to restate their thoughts in this particular conversation - especially because of the time of the year. Care should be taken not to just restate what has come before.

If this makes sense.

More or less this makes sense to me. This kind of communication thread is very tricky. It probably would get rejected/shutdown on a StackExchange. And there are many reasons StackExchanges have policies to shutdown such high-level open-ended topics like the one I've started here.

In any case, I think you are interested in subjects beyond physics, Castedo?

Correct. I wish I knew more about physics because PhysLean sounds very exciting! Population genetics is the empirical science that I am studying. Population genetics is wonderfully mathematical (in the context of biology) and probably is similar to some parts of physics in terms of its relationship to math.

Separately, I find formal math and ITPs very appealing. I've mostly seen my interest in formal math and population genetics as independent. But seeing the progress that has happened in Lean this year now has me wondering.

Joseph Tooby-Smith (Dec 23 2025 at 15:43):

Do you know of examples of results from population genetics written into other functional programming languages like Haskell?

Joseph Tooby-Smith (Dec 23 2025 at 15:48):

IMO there are two separate questions, which is true for any area of study:

Are there stories which can be written into an ITP from that area of study. Which means, stories which can be expressed predominantly in terms of mathematics or functional programming definitions (which are really one-and-the-same, but the point of view is slightly different). For example, the derivation of a mathematical result from an interesting starting point, or the taking of a mathematical structure and asking questions about it.
If so, what are the benefits to the specific community in question about formalizing such results. These include many of the things already discussed above, and elsewhere.

Castedo Ellerman (Dec 23 2025 at 15:58):

Joseph Tooby-Smith said:

Do you know of examples of results from population genetics written into other functional programming languages like Haskell?

No, and I doubt there are any. However I think one can "layer" population genetics on top of increasingly mathematical layers of applied math/stats where resuts are written into Lean or similar. And some of these layers are shared with other empirical fields like parts of physics.

Apologies if I'm not making that much sense. I'm still trying to make sense of all this myself since I'm so new to Lean (and relatively new to population genetics).

Castedo Ellerman (Dec 23 2025 at 16:03):

I'm pondering ways to write down and share updatable answers to some of these questions I'm raising here and I'll share a link here. It'll take me a week or two ... or three :sweat_smile:

Castedo Ellerman (Dec 23 2025 at 16:12):

Timothy Chow said:

... a semantic search engine par excellence, but again, that day has not yet arrived.

A math semantic search engine par excellence seems to me a reasonable future prediction with utility to empirical science (via applied mathematicians researching math for use by non-mathematician scientists).

Now, as a mathematician, I do think that rigor is important and has benefits for science, but they are mostly indirect benefits; moreover, we can reap most if not all of those benefits from "normal" mathematical rigor, without needing formalized mathematics.

I'm realizing now there's a hypothetical class of benefits from fomalized math that depends on future computer applications. If these future computer application become real in the future, then these future benefits will become perceived as "needed".

Does a mathematician in 2025 living in a off-the-grid shack in the woods using only pencil and paper need to use computers and the Internet? I wager the answer to this question today will be similar to the answer in 2050 to the question "Do mathematicians and scientists need the existence of large systems of ITP-compatible formalized math?"

Tyler Josephson ⚛️ (Dec 26 2025 at 04:34):

As a chemical engineer, I see the applications including:

1) translating derivations in science and engineering into formal proofs, rooting out logic errors in our theories (https://pubs.rsc.org/en/content/articlehtml/2023/dd/d3dd00077j)

2) since science derivations are also mayh proofs, could this play a role in automated construction of new scientific theories? The formalization part would at least guarantee the conclusions follow from the premises, but that could only be part of a broader system that generates interesting and relevant and novel premises.

3) serving the foundation for formally-verified scientific computing (https://www.tandfonline.com/doi/full/10.1080/00268976.2025.2539421)

4) Imagine we have 1 and 3 and we link the resulting formally-verified software to things that interface with the real world (sensors, controllers, etc.). If we can model in Lean certain things about the real world, then we could also prove things about the behavior of the overall system, provided that our model is capturing the salient aspects of the real world.

Last updated: Feb 28 2026 at 14:05 UTC