Zulip Chat Archive

Stream: general

Topic: Refining definitions for formalization


Alena Gusakov (Mar 09 2025 at 22:07):

Hi all,

I'm writing a survey paper exploring how definitions get refined and selected for formalizations. Choosing the right definition is one of those hurdles that causes Lean to have a fairly steep learning curve for new users, so I'd like to see what I can do to distill observations about this process and do a few case studies (e.g. how the algebra library definitions are approached vs the graph theory library). I've seen a few papers that talk about this, and I'm looking through the Zulip for examples of this exact sort of conversation, but does anyone have any insights or specific personal experiences they'd like to talk about when it comes to this? Doesn't necessarily have to be formalizations included in mathlib, but mathlib is what I'll probably focus on more because of familiarity. Thanks!

Eric Wieser (Mar 09 2025 at 23:28):

Did you see @Oliver Nash's Lean Together talk on this topic? That had a nice example for root systems

Alena Gusakov (Mar 10 2025 at 01:47):

Ooh no, don't think I have. This looks like an excellent source, thank you!!

Jireh Loreaux (Mar 10 2025 at 17:05):

My recent paper with Anatole Dedecker focuses on the process of refining a particular definition in much the same way as Oliver's talk. It can be found at arXiv: 2501.15639

Ruben Van de Velde (Mar 10 2025 at 17:56):

Jireh Loreaux said:

My recent paper with Anatole Dedecker focuses on the process of refining a particular definition in much the same way as Oliver's talk. It can be found at arXiv: 2501.15639

Ah, I might finally figure out what the CFC is about. (Do you take typo reports?)

Chris Henson (Mar 10 2025 at 18:35):

There are some good examples of this in category theory. The first that comes to mind is how sometimes "redundant" variants of the typeclass fields in the definition of a category can make it easier to get definitionally equality when working with dual categories. Likewise it tends to be helpful to not try to define constructions by their dual. Good references for these are Experience Implementing a Performant Category-Theory Library in Coq and Formalizing category theory in Agda. There is also some discussion in these of typeclass bundling decisions and choosing definitions to avoid certain universe issues.

Jireh Loreaux (Mar 10 2025 at 22:08):

Ruben Van de Velde said:

Ah, I might finally figure out what the CFC is about. (Do you take typo reports?)

Sure. Probably won't update immediately, but happy to receive them via DM, or a new thread.

Winston Yin (尹維晨) (Mar 10 2025 at 22:28):

I find Lean formalisation tricky because it actually involves two steps:

  1. expressing the structure / statement in dependent type theory
  2. finding the best implementation of the type theoretic statement using the particular language features of Lean

Perhaps point 1 is what's been keeping a lot of differential geometry from being formalised, since you have to deal with subsets a lot. Jumping directly to 2 naively will work initially but it quickly pollutes downstream proofs.

There was the famous problem of nested subalgebras in early mathlib that was solved with IsScalarTower. That seems more like point 1 than point 2. Analogous solutions for other mathematical structures likely exist but have not been found.

Alena Gusakov (Mar 11 2025 at 04:27):

Winston Yin (尹維晨) said:

I find Lean formalisation tricky because it actually involves two steps:

  1. expressing the structure / statement in dependent type theory
  2. finding the best implementation of the type theoretic statement using the particular language features of Lean

Perhaps point 1 is what's been keeping a lot of differential geometry from being formalised, since you have to deal with subsets a lot. Jumping directly to 2 naively will work initially but it quickly pollutes downstream proofs.

There was the famous problem of nested subalgebras in early mathlib that was solved with IsScalarTower. That seems more like point 1 than point 2. Analogous solutions for other mathematical structures likely exist but have not been found.

Yeah, that's roughly what I've observed as well - there are usually some number of mathematically equivalent definitions to choose from (and it takes some trial and error to figure out which one to formalize, and the best way to express them in type theory), and then the implementation in Lean specifically usually takes a couple attempts to work out as well. The hope is that compiling some case studies and examining common pitfalls in early definition attempts can help define a kind of rule set for definitions, or at least things to keep in mind.

Chris Henson (Mar 11 2025 at 19:39):

@Alena Gusakov I would be curious to know if you find anywhere in your survey where someone has come up with a word/phase to describe this process. I've taken to using the phrase "user proof interface" myself, for this idea of the back and forth between mathematically equivalent definitions and the experience of actually formalizing them as a practical matter.

Alena Gusakov (Mar 11 2025 at 20:24):

@Chris Henson I don't think I've really seen anyone use a specific term for it yet, I've just been thinking of it as refining or weeding out definitions.

Alena Gusakov (May 01 2025 at 19:51):

Hey everyone, I wanted to ask a follow-up question. Does anyone have examples of weird or hacky workarounds for definitions that end up being the only thing that work in Lean's type theory for that definition? Things along the lines of @Peter Nelson 's thread about formalizing matroids #mathlib4 > The design of matroids. I'm writing a section on how occasionally, trying to avoid type theory problems can restrict one's ability to properly express mathematical concepts, and I want to write up some case studies on those kinds of problems and what some of the community-approved solutions can look like.

I assume there are probably multiple examples like this but I haven't been super active here lately so I don't know where to look. Thanks!

Patrick Massot (May 01 2025 at 20:07):

Hopefully this is extremely rare. The ordinary situation is that Lean forces to clean up definitions and the Lean definitions end up being nicer than the paper ones.

Alena Gusakov (May 01 2025 at 21:22):

Patrick Massot said:

Hopefully this is extremely rare. The ordinary situation is that Lean forces to clean up definitions and the Lean definitions end up being nicer than the paper ones.

I'll try to stress that it's rare in the survey then - most of my paper so far talks about how people typically clean up definitions and what things to keep in mind, and this section where I talk about the matroid theory formalization difficulties is at the end.


Last updated: May 02 2025 at 03:31 UTC