Zulip Chat Archive

Stream: Natural sciences

Topic: should scilib and mathlib be separate libraries?


Bulhwi Cha (Jun 30 2023 at 01:13):

Should formalized theories of natural science and engineering be part of Mathlib? Or should they be put together into Scilib, which I hope will be what LeanChemicalTheories will grow to become?

What about formal philosophy, formal linguistics, and mathematical social sciences?

Jireh Loreaux (Jun 30 2023 at 01:22):

I would expect these to be separate. A mono-repo is only useful so long as the pieces are all interconnected. Largely disconnected projects should not be incorporated into mathlib.

Bulhwi Cha (Jun 30 2023 at 01:25):

Moritz Doll said:

I can comment a bit about Hamiltonian mechanics: the correct way of formalizing that is through symplectic geometry (in a sense symplectic geometry is Hamiltonian mechanics).

For example, symplectic geometry should be part of Mathlib. But does that mean classical mechanics, which includes Hamiltonian mechanics, should be part of Mathlib?

If you want to formalize some theories of natural or social sciences, I think you'd first have to mathematize them most of the time.

Jason Rute (Jun 30 2023 at 02:29):

@Tyler Josephson ⚛️ may have thoughts on this.

Jireh Loreaux (Jun 30 2023 at 03:04):

I would expect libraries like this to use mathlib as a dependency.

Tyler Josephson ⚛️ (Jun 30 2023 at 17:45):

SciLib will use mathlib as a dependency, but I don’t think the reverse makes sense

If scientists need a bit of mathematics that mathlib doesn’t have, they can prove such theorems and store them in SciLib. But mathlib won’t and shouldn’t depend on SciLib - if someone wants to add something to mathlib using a theorem in SciLib, they should first PR the theorem to mathlib.

I expect SciLib to have laxer standards than mathlib. For example, we’ll be happy to have theorems about real numbers that are also true of fields, without generalizing them to fields. We want scientists and engineers to be able to read and understand and add to SciLib, and that will often mean not making the theorems as general as possible (and potentially keeping multiple versions of theorems, too, like 1D mechanics and 3D mechanics, even though 3D generalizes 1D). Mathlib can have the theorems about fields.

Shreyas Srinivas (Jun 30 2023 at 18:15):

I like that argument about being able to read and understand theorems in scilib, and not going for the most general abstraction in which a theorem holds. I believe a similar approach is needed for whatever comes up for the algorithms and complexity community.

Shreyas Srinivas (Jun 30 2023 at 18:26):

For example, most algorithms people know/remember very little measure theory. We typically work with probability theory without any reference to it. EDIT: To give one more example, many graph theoretic proofs are easier with finite graphs. Track A folks don't care about infinite graphs. All graphs are assumed to be finite by default.

Shreyas Srinivas (Jun 30 2023 at 18:31):

As Feynman explains: https://youtu.be/obCjODeoLVw

Bolton Bailey (Jul 03 2023 at 17:11):

I've always found it interesting that we have docs3#tsirelson_inequality, not complaining about it but it seems more like physics than math to me.

Tyler Josephson ⚛️ (Jul 03 2023 at 22:06):

Oh that’s really neat!

Yury G. Kudryashov (Jul 03 2023 at 22:23):

IMHO, theorems from other sciences that can be formulated mathematically without introducing lots of domain specific definitions can go to mathlib.

Shreyas Srinivas (Jul 03 2023 at 23:51):

Yury G. Kudryashov said:

IMHO, theorems from other sciences that can be formulated mathematically without introducing lots of domain specific definitions can go to mathlib.

Would it be okay to add multiple versions of a theorem to mathlib? perhaps even with different proofs? My current impression based on reading the zulip chatter for half a year is that mathlib is meant to have theorems in their most general and abstract form. This also came up in the discussion on probability bounds.

If the answer to both is no, then I suspect that pursuing a separate Scilib (and even a separate AlgoLib) makes sense. The primary problem with trying to fit everything into mathlib is that in the sciences (at least this is true in CS and physics), we have a somewhat different culture w.r.t how we write and prove theorems. We have many proofs of the same theorem because different proofs yield insights and show a new approach to a problem. It is not enough to know that a theorem is true. How a theorem is true can yield useful insights, possibly even algorithms. Further as Tyler pointed out, it is important to have similar (or even the same) theorems in several different forms. Then if required we can show equivalences between the various statements. If mathlib finds them interesting then the mathlib admins might have to consider how to incorporate such theorems. However if everyone is adding to mathlib, there might be all sorts of incompatibilities that arise from culture clashes. For example a computer scientist might want to get on with a graph theorem for finite simple graphs, while mathematicians might insist on proving it in the most general case for infinite graphs or multigraphs and instantiating the vertex type with a fintype instance.

Scott Morrison (Jul 04 2023 at 00:25):

We have many proofs of the same theorem because different proofs yield insights and show a new approach to a problem. It is not enough to know that a theorem is true. How a theorem is true can yield useful insights, possibly even algorithms.

Saying that this is not true "in mathematics" --- as I read in your phrasing "somewhat different culture" --- seems to me completely wrong!

Yury G. Kudryashov (Jul 04 2023 at 00:32):

Any proof of a big theorem involves lots of small useful lemmas. Also, different proofs often prove slightly different statements (if you generalize as much as possible at every step).

Scott Morrison (Jul 04 2023 at 00:32):

I think I'd need a more concrete version of the question "would it be okay to add multiple versions of a theorem to mathlib" before really being able to answer it.

It's a bit of a straw man to say "mathlib only accepts theorems in their most general and abstract form" --- no theorem in the history of mathematics has even been in the most general or most abstract form. However mathlib has decided to avoid "needless" specialisation. (Here by "needless" I mean --- if the same theorem is true in a more general setting, with no more work, we should certainly generalise, even if that makes the statement less "accessible" to someone with less mathematical background: c.f. not having a vector_space class because module over a field is all you need.) Similarly, we do mildly discourage doing special cases of more general theories, at least when there is a strong prospect that the more general theory can and will be done in the short term.

Yury G. Kudryashov (Jul 04 2023 at 00:33):

E.g., we can prove the theorem 2fxy=2fyx\frac{\partial^2f}{\partial x\partial y}=\frac{\partial^2f}{\partial y\partial x} with different assumptions.

Scott Morrison (Jul 04 2023 at 00:34):

Yes, and clearly for an example like that mathlib wants to have a collection of theorems, possibly with overlapping sets of hypotheses, sufficient to cover essentially every occasion one would want to conclude that partial derivatives commute.

Scott Morrison (Jul 04 2023 at 00:35):

There won't just be a single theorem that suffices to cover every occasion. But we would prefer fewer rather than more!

Mario Carneiro (Jul 04 2023 at 00:35):

including potentially more "specialized" hypotheses if the more general form is difficult to apply in practice

Scott Morrison (Jul 04 2023 at 00:37):

And "more difficult to apply in practice" can have a pretty low threshold --- of course it's lovely when generalising a theorem only involved generalising a type class, and none of the call sites need to change, but if generalising adds an interesting side condition that was trivial in the special case, of course we need to leave the special case in place (but can replace its proof with appealing to the general case + discharging the trivial side condition).

Scott Morrison (Jul 04 2023 at 00:38):

But if the only argument for keeping the specialised proof is "it is easier to understand" (alternatively, sometimes a specialised proof actually proves more in the specialised case --- but then that should be extracted into additional lemmas that only apply in the specialised case, thereby justifying keeping the specialised proof), then I think that is probably outside the remit of mathlib.

Yury G. Kudryashov (Jul 04 2023 at 01:13):

As an example of "specialize or not": we have docs#fderiv for Fréchet derivatives of functions between normed spaces (there is a TODO about generalizing to topological vector spaces) but we specialize it as docs#deriv for functions from the base field to a vector space. However, we do not:

  • introduce a separate definition for real numbers;
  • introduce a separate definition for functions from the base field to itself

though we do specialize "chain rule" lemmas to different cases of scalars here or there.

Yury G. Kudryashov (Jul 04 2023 at 01:14):

Another example: we prove the inverse function theorem in higher dimension with very general assumptions, then specialize to C1C^1 over R\mathbb R or C\mathbb C.

Shreyas Srinivas (Jul 04 2023 at 08:06):

I mean so w.r.t perceived mathlib culture, not math culture. Feynman does make that comparison in that video, but I interpreted that as an explanation of Tyler's point about having multiple specialized versions of a theorem, instead of one that is in great generality, but cannot be understood by the intended users. In addition to the examples I pointed out already, I remember when I was told that we have the geometric version of some functional analysis theorem (hyperplane separation theorem?), which is equivalent to Farkas' lemma. For many in my field, they would prefer the matrix version with the corresponding proof, for the same reasons as Tyler suggests.

Shreyas Srinivas (Jul 04 2023 at 08:20):

To give a concrete example, would mathlib be happy to take CS versions of concentration inequalities? I recall someone was working on them and wanted to add them with the proper generality.

Rémy Degenne (Jul 04 2023 at 08:22):

What do you mean by a CS version of a concentration inequality? Can you give an example?

Shreyas Srinivas (Jul 04 2023 at 08:24):

Rémy Degenne said:

What do you mean by a CS version of a concentration inequality? Can you give an example?

Chebyshev's, Chernoff (all variants), Azuma, Hoeffding etc.

Shreyas Srinivas (Jul 04 2023 at 08:28):

All in discrete probability.

Rémy Degenne (Jul 04 2023 at 08:28):

These are concentration inequalities indeed, but what I don't see is how a "CS version" would differ from a "math version".
My main field of work is designing and analysing sequential learning algorithms and I use these inequalities all the time, but to me they don't come in CS/math variants.

Rémy Degenne (Jul 04 2023 at 08:29):

Shreyas Srinivas said:

All in discrete probability.

Oh I think I see your point now. Thanks for the clarification.

Shreyas Srinivas (Jul 04 2023 at 08:31):

I can see that ML researchers have a higher threshold for math abstraction, but even there, the abstractions are simplified. For example, when working with norms, many ML researchers tend to default to l1 or l2 norms

Yaël Dillies (Jul 04 2023 at 10:00):

You might be interested in seeing my summer project, Shreyas. In the prerequisites I have discrete Lp norms, discrete convolution, the Marcinkiewicz-Zygmund inequality (quantitative, for deterministic functions), and I believe all those belong in mathlib.

Notification Bot (Jul 05 2023 at 06:31):

20 messages were moved from this topic to #general > LeanAPAP by Johan Commelin.

Tyler Josephson ⚛️ (Jul 08 2023 at 13:51):

@Patrick Massot Shall we move this topic to the #Natural sciences stream, too?


Last updated: Dec 20 2023 at 11:08 UTC