Zulip Chat Archive

Stream: mathlib4

Topic: Citing the conceptual framework of Mathlib


Wrenna Robson (Nov 17 2023 at 19:04):

Hi all. I'm currently writing up my thesis and I'm currently in particular writing a chapter (which might become a paper) about my recent efforts recreating and extending Dan Bernstein's work formalising Classic McEliece's representation of permutations.

I want to talk about the mathlib concept of "API", which has a specific sort of meaning here as "the plumbing that is a necessary part of making something more than just a one-off proof that is hard to maintain", if you like. I've been contributing to mathlib for a while so I would say I've picked this up through folklore, but clearly it isn't my original concept. Has anyone written and published anything I can cite about mathlib's design methodology (especially more recent stuff, post-Lean 4 port)? I'm aware there was an initial paper on mathlib, but my recollection is that it doesn't necessarily get into that kind of detail. Essentially, I'm wondering how much of the understanding that people have as regular users of Lean and contributors to the library exists in a published and citable form?

Kyle Miller (Nov 17 2023 at 19:36):

(Not answering your question about what to cite, but I think it's worth discussing "API".)

I think API sort of means what it usually means, but it's much less formal in mathlib.

Just to recall, in programming there is a concept of a software library, which is a collection of code that documents some part of itself as being the official external interface. This interface is called the API, or "application programming interface." This is what programs using a library are supposed to use. While some programming languages have features to enforce the API boundary (to prevent you from relying on specifics of the internal implementation, which makes applications more robust to changes to the library), this is not a requirement in the definition.

In formalization, there is a similar concept. There is some low-level implementation of some type, then there is library code that builds up theory, and then those that use the object are supposed to use its theory and not rely on implementation details. This is not so different from in mathematics itself, where there is a culture of describing the axioms or universal properties of an object, proving there is such an object and that it's uniquely determined, and then never using any facts about that particular object ever again.

What makes "API" a bit muddy for mathlib is that (1) it's a library for itself mostly, and it's not clear where the boundaries are supposed to be, except through constant refinement and refactorings and (2) there's often not a documented API in a formal sense, and instead you're supposed to know some conventions (starting with "unfolding is bad" and knowing what is "defeq abuse").

I think it's (partly) wrong to use API to mean the plumbing lemmas that are not part of a theory's interface. I say partly because these plumbing lemmas are still building up an interface internal to the theory, so it's not really wrong, but still, I tend to find it to be an idiosyncratic use of "API."

These are just my thoughts from hanging around here for a while and seeing "API" thrown around loosely. (It's also a bit interesting that discussions about API -- even when someone claims that the API is such-and-such -- are still negotiations about the design of the APIs of mathlib's sub-theories.)

Wrenna Robson (Nov 17 2023 at 20:04):

Yes, I think the interesting thing is that we use the term in a way that is related to but adjacent to the standard technical meaning.

Rob Lewis (Nov 18 2023 at 00:21):

I do think there's a fascinating paper to be written about how software engineering concepts get adapted to math through proof assistants. Jeremy's paper on modularity and Johan and Adam's paper on abstraction boundaries are great, and I'd love to read (or find the time to process and write) something with a broader view.


Last updated: Dec 20 2023 at 11:08 UTC