Lean mathlib notes

Various implementation details are noted in the mathlib source, and referenced later on. We collect these notes here.

Design choices about smooth algebraic structures

  1. All smooth algebraic structures on G are Prop-valued classes that extend smooth_manifold_with_corners I G. This way we save users from adding both [smooth_manifold_with_corners I G] and [has_smooth_mul I G] to the assumptions. While many API lemmas hold true without the smooth_manifold_with_corners I G assumption, we're not aware of a mathematically interesting monoid on a topological manifold such that (a) the space is not a smooth_manifold_with_corners; (b) the multiplication is smooth at (a, b) in the charts ext_chart_at I a, ext_chart_at I b, ext_chart_at I (a * b).

  2. Because of model_prod we can't assume, e.g., that a lie_group is modelled on 𝓘(𝕜, E). So, we formulate the definitions and lemmas for any model.

  3. While smoothness of an operation implies its continuity, lemmas like has_continuous_mul_of_smooth can't be instances because otherwise Lean would have to search for has_smooth_mul I G with unknown 𝕜, E, H, and I : model_with_corners 𝕜 E H. If a user needs [has_continuous_mul G] in a proof about a smooth monoid, then they need to either add [has_continuous_mul G] as an assumption (worse) or use haveI in the proof (better).
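
As a point of reference, such a class takes roughly the following shape in mathlib (simplified; the exact typeclass assumptions on 𝕜, E, H and G may differ between versions):

class has_smooth_mul {𝕜 : Type*} [nontrivially_normed_field 𝕜]
  {E : Type*} [normed_add_comm_group E] [normed_space 𝕜 E]
  {H : Type*} [topological_space H] (I : model_with_corners 𝕜 E H)
  (G : Type*) [has_mul G] [topological_space G] [charted_space H G]
  extends smooth_manifold_with_corners I G : Prop :=
(smooth_mul : smooth (I.prod I) I (λ p : G × G, p.1 * p.2))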

IMO geometry formalization conventions

We apply the following conventions for formalizing IMO geometry problems. A problem is assumed to take place in the plane unless that is clearly not intended, so it is not required to prove that the points are coplanar (whether or not that in fact follows from the other conditions). Angles in problem statements are taken to be unoriented. A reference to an angle ∠XYZ is taken to imply that X and Z are not equal to Y, since choices of junk values play no role in informal mathematics, and those implications are included as hypotheses for the problem whether or not they follow from the other hypotheses. Similarly, a reference to XY as a line is taken to imply that X does not equal Y and that is included as a hypothesis, and a reference to XY being parallel to something is considered a reference to it as a line. However, such an implicit hypothesis about two points being different is included only once for any given two points (even if it follows from more than one reference to a line or an angle), if X ≠ Y is included then Y ≠ X is not included separately, and such hypotheses are not included in the case where there is also a reference in the problem to a triangle including those two points, or to strict betweenness of three points including those two. If betweenness is stated, it is taken to be strict betweenness. However, segments and sides are taken to include their endpoints (unless this makes a problem false), although those degenerate cases might not necessarily have been considered when the problem was formulated and contestants might not have been expected to deal with them. A reference to a point being on a side or a segment is expressed directly with wbtw rather than more literally with affine_segment.

Manifold type tags

For technical reasons we introduce two type tags:

  • model_prod H H' is the same as H × H';
  • model_pi H is the same as Π i, H i, where H : ι → Type* and ι is a finite type.

In both cases the reason is the same, so we explain it only in the case of the product. A charted space M with model H is a set of local charts from M to H covering the space. Every space is registered as a charted space over itself, using the only chart id, in manifold_model_space. You can also define a product of charted spaces M and M' (with model space H × H') by taking the products of the charts. Now, on H × H', there are two charted space structures with model space H × H' itself: the one coming from manifold_model_space, and the one coming from the product of the two manifold_model_space structures on each component. They are equal, but not defeq (because the product of id and id is not defeq to id), which is bad as we know. This expedient of renaming H × H' solves this problem.
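
For reference, the tag itself is (definitionally) just the product; roughly:

/-- A type tag for `H × H'`, used as the model space of a product of charted spaces. -/
def model_prod (H : Type*) (H' : Type*) := H × H'

The product of two charted spaces M and M' is then registered as a charted_space (model_prod H H') (M × M') instance, which cannot clash with the manifold_model_space instance on H × H'.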

addition on function coercions

Terms containing @has_add.add (has_coe_to_fun.F ...) pi.has_add seem to cause leanchecker to crash due to an out-of-memory condition. As a workaround, we add a type annotation: (f + g : V₁ → V₂)

bundled maps over different rings

Frequently, we find ourselves wanting to express a bilinear map M →ₗ[R] N →ₗ[R] P or an equivalence between maps (M →ₗ[R] N) ≃ₗ[R] (M' →ₗ[R] N') where the maps have an associated ring R. Unfortunately, using definitions like these requires that R satisfy comm_semiring R, and not just semiring R. Using M →ₗ[R] N →+ P and (M →ₗ[R] N) ≃+ (M' →ₗ[R] N') avoids this problem, but throws away structure that is useful for when we do have a commutative (semi)ring.

To avoid making this compromise, we instead state these definitions as M →ₗ[R] N →ₗ[S] P or (M →ₗ[R] N) ≃ₗ[S] (M' →ₗ[R] N') and require smul_comm_class S R on the appropriate modules. When the caller has comm_semiring R, they can set S = R and smul_comm_class_self will populate the instance. If the caller only has semiring R they can still set either R = ℕ or S = ℕ, and add_comm_monoid.nat_smul_comm_class or add_comm_monoid.nat_smul_comm_class' will populate the typeclass, which is still sufficient to recover a ≃+ or →+ structure.

An example of where this is used is linear_map.prod_equiv.
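
As a small illustration of the pattern (the variable names here are ours; the typeclass assumptions listed are just the ones needed to state such a map):

variables {R S M N P : Type*} [semiring R] [semiring S]
  [add_comm_monoid M] [add_comm_monoid N] [add_comm_monoid P]
  [module R M] [module S N] [module S P] [module R P] [smul_comm_class S R P]

-- a "bilinear" map whose two arguments are linear over possibly different rings
variables (B : M →ₗ[R] N →ₗ[S] P)

example (m : M) (n : N) : P := B m n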

category_theory universes

The typeclass category C describes morphisms associated to objects of type C : Type u.

The universe levels of the objects and morphisms are independent, and will often need to be specified explicitly, as category.{v} C.

Typically any concrete example will either be a small_category, where v = u, which can be introduced as

universes u
variables {C : Type u} [small_category C]

or a large_category, where u = v+1, which can be introduced as

universes u
variables {C : Type (u+1)} [large_category C]

In order for the library to handle these cases uniformly, we generally work with the unconstrained category.{v u}, for which objects live in Type u and morphisms live in Type v.

Because the universe parameter u for the objects can be inferred from C when we write category C, while the universe parameter v for the morphisms can not be automatically inferred, through the category theory library we introduce universe parameters with morphism levels listed first, as in

universes v u

or

universes v₁ v₂ u₁ u₂

when multiple independent universes are needed.

This has the effect that we can simply write category.{v} C (that is, only specifying a single parameter) while u will be inferred.
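
For example, a declaration at this level of generality is typically set up as:

universes v u

variables (C : Type u) [category.{v} C]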

Often, however, it's not even necessary to include the .{v}. (Although it was in earlier versions of Lean.) If it is omitted a "free" universe will be used.

change elaboration strategy with `by apply`

Some definitions may be extremely slow to elaborate, when the target type to be constructed is complicated and when the type of the term given in the definition is also complicated and does not obviously match the target type. In this case, instead of just giving the term, prefixing it with by apply may speed up things considerably as the types are not elaborated in the same order.
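
A trivial illustration of the syntax only (the benefit appears when the types involved are large and elaboration is expensive):

example (p q : Prop) (hp : p) (hq : q) : p ∧ q :=
by apply and.intro hp hq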

coercion into rings

Coercions such as nat.cast_coe that go from a concrete structure such as ℕ to an arbitrary ring R should be set up as follows:

@[priority 900] instance : has_coe_t ℕ R := ⟨...⟩

It needs to be has_coe_t instead of has_coe because otherwise type-class inference would loop when constructing the transitive coercion ℕ → ℕ → ℕ → .... The reduced priority is necessary so that it doesn't conflict with instances such as has_coe_t R (option R).

For this to work, we reduce the priority of the coe_base and coe_trans instances because we want the instances for has_coe_t to be tried in the following order:

  1. has_coe_t instances declared in mathlib (such as has_coe_t R (with_top R), etc.)
  2. coe_base, which contains instances such as has_coe (fin n) ℕ
  3. nat.cast_coe : has_coe_t ℕ R etc.
  4. coe_trans

If coe_trans is tried first, then nat.cast_coe doesn't get a chance to apply.

comp_of_eq lemmas

Lean's elaborator has trouble elaborating applications of lemmas that state that the composition of two functions satisfies some property at a point, like continuous_at.comp / cont_diff_at.comp and cont_mdiff_within_at.comp. The reason is that a lemma like this looks like continuous_at g (f x) → continuous_at f x → continuous_at (g ∘ f) x. Since Lean's elaborator elaborates the arguments from left to right, when you write hg.comp hf, the elaborator will try to figure out both f and g from the type of hg. It tries to figure out f just from the point where g is continuous. For example, if hg : continuous_at g (a, x) then the elaborator will assign f to the function prod.mk a, since in that case f x = (a, x). This is undesirable in most cases where f is not a variable.

There are some ways to work around this, for example by giving f explicitly, or by forcing Lean to elaborate hf before elaborating hg, but this is annoying. A better solution is to reformulate composition lemmas to have the following shape: continuous_at g y → continuous_at f x → f x = y → continuous_at (g ∘ f) x. This is useful even if the proof of f x = y is rfl. The reason this works better is that the type of hg doesn't mention f. Only after elaborating the two continuous_at arguments does Lean try to unify f x with y, which is often easy after having chosen the correct functions for f and g. Here is an example that shows the difference:

example {x₀ : α} (f : α → α → β) (hf : continuous_at (function.uncurry f) (x₀, x₀)) :
  continuous_at (λ x, f x x) x₀ :=
-- hf.comp (continuous_at_id.prod continuous_at_id) -- type mismatch
-- hf.comp_of_eq (continuous_at_id.prod continuous_at_id) rfl -- works

continuity lemma statement

The library contains many lemmas stating that functions/operations are continuous. There are many ways to formulate the continuity of operations. Some are more convenient than others. Note: for the most part this note also applies to other properties (measurable, differentiable, continuous_on, ...).

The traditional way #

As an example, let's look at addition (+) : M → M → M. We can state that this is continuous in different definitionally equal ways (omitting some typing information), for example:

  • continuous (λ p : M × M, p.1 + p.2)
  • continuous (function.uncurry (+))
  • continuous ↿(+) (where ↿ is notation for recursively uncurrying a function)

However, lemmas with this conclusion are not nice to use in practice because

  1. They confuse the elaborator. The following two examples fail, because of limitations in the elaboration process.
variables {M : Type*} [has_add M] [topological_space M] [has_continuous_add M]
example : continuous (λ x : M, x + x) :=
continuous_add.comp _

example : continuous (λ x : M, x + x) :=
continuous_add.comp (continuous_id.prod_mk continuous_id)

The second is a valid proof, which is accepted if you write it as continuous_add.comp (continuous_id.prod_mk continuous_id : _)

  2. If the operation has more than 2 arguments, they are impractical to use, because in your application the arguments in the domain might be in a different order or associated differently.

The convenient way #

A much more convenient way to write continuity lemmas is like continuous.add:

continuous.add {f g : X → M} (hf : continuous f) (hg : continuous g) : continuous (λ x, f x + g x)

The conclusion can be continuous (f + g), which is definitionally equal. This has the following advantages

  • It supports projection notation, so is shorter to write.
  • continuous.add _ _ is recognized correctly by the elaborator and gives useful new goals.
  • It works generally, since the domain is a variable.
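
For instance, such lemmas compose via projection notation; a small usage example:

example {X M : Type*} [topological_space X] [topological_space M]
  [has_add M] [has_continuous_add M] {f g : X → M}
  (hf : continuous f) (hg : continuous g) :
  continuous (λ x, f x + (g x + f x)) :=
hf.add (hg.add hf)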

As an example of a unary operation, we have continuous.neg.

continuous.neg {f : α → G} (hf : continuous f) : continuous (λ x, -f x)

For unary functions, the elaborator is not confused when applying the traditional lemma (like continuous_neg), but it's still convenient to have the short version available (compare hf.neg.neg.neg with continuous_neg.comp $ continuous_neg.comp $ continuous_neg.comp hf).

As a harder example, consider an operation of the following type:

def strans {x : F} (γ γ' : path x x) (t₀ : I) : path x x

The precise definition is not important, only its type. The correct continuity principle for this operation is something like this:

{f : X → F} {γ γ' : ∀ x, path (f x) (f x)} {t₀ s : X → I}
  (hγ : continuous ↿γ) (hγ' : continuous ↿γ')
  (ht : continuous t₀) (hs : continuous s) :
  continuous (λ x, strans (γ x) (γ' x) (t₀ x) (s x))

Note that all arguments of strans are indexed over X, even the basepoint x, and the last argument s that arises since path x x has a coercion to I → F. The paths γ and γ' (which are unary functions from I) become binary functions in the continuity lemma.

Summary #

  • Make sure that your continuity lemmas are stated in the most general way, and in a convenient form. That means that:
    • The conclusion has a variable X as domain (not something like Y × Z);
    • Wherever possible, all point arguments c : Y are replaced by functions c : X → Y;
    • All n-ary function arguments are replaced by n+1-ary functions (f : Y → Z becomes f : X → Y → Z);
    • All (relevant) arguments have continuity assumptions, and perhaps there are additional assumptions needed to make the operation continuous;
    • The function in the conclusion is fully applied.
  • These remarks are mostly about the format of the conclusion of a continuity lemma. In assumptions it's fine to state that a function with more than 1 argument is continuous using ↿ or function.uncurry.

Functions with discontinuities #

In some cases, you want to work with discontinuous functions, and in certain expressions they are still continuous. For example, consider the fractional part of a number, fract : ℝ → ℝ. In this case, you want to add conditions to when a function involving fract is continuous, so you get something like this: (assumption hf could be weakened, but the important thing is the shape of the conclusion)

lemma continuous_on.comp_fract {X Y : Type*} [topological_space X] [topological_space Y]
  {f : X → ℝ → Y} {g : X → ℝ} (hf : continuous ↿f) (hg : continuous g) (h : ∀ s, f s 0 = f s 1) :
  continuous (λ x, f x (fract (g x)))

With continuous_at you can be even more precise about what to prove in case of discontinuities, see e.g. continuous_at.comp_div_cases.

custom simps projection

You can specify custom projections for the @[simps] attribute. You can do this for the projection my_structure.original_projection by adding a declaration my_structure.simps.my_projection that is definitionally equal to my_structure.original_projection but has the projection in the desired (simp-normal) form. Then you can call

initialize_simps_projections (original_projection → my_projection, ...)

to register this projection. See initialize_simps_projections_cmd for more information.
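
For instance, mathlib registers custom projections for equiv roughly as follows (simplified):

def equiv.simps.symm_apply {α β : Sort*} (e : α ≃ β) : β → α := e.symm

initialize_simps_projections equiv (to_fun → apply, inv_fun → symm_apply)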

You can also specify custom projections that are definitionally equal to a composite of multiple projections. This is often desirable when extending structures (without old_structure_cmd).

has_coe_to_fun and notation class (like has_mul) instances will be automatically used, if they are definitionally equal to a projection of the structure (but not when they are equal to the composite of multiple projections).

decidable arguments

As mathlib is primarily classical, if the type signature of a def or lemma does not require any decidable instances to state, it is preferable not to introduce any decidable instances that are needed in the proof as arguments, but rather to use the classical tactic as needed.
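
A minimal sketch: the type below mentions no decidability instances, and the classical tactic supplies the decidable_pred p instance that finset.filter needs inside the term.

example {α : Type*} (p : α → Prop) (s : finset α) : finset α :=
by { classical, exact s.filter p }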

In the other direction, when decidable instances do appear in the type signature, it is better to use explicitly introduced ones rather than allowing Lean to automatically infer classical ones, as these may cause instance mismatch errors later.

decidable namespace

In most of mathlib, we use the law of excluded middle (LEM) and the axiom of choice (AC) freely. The decidable namespace contains versions of lemmas from the root namespace that explicitly attempt to avoid the axiom of choice, usually by adding decidability assumptions on the inputs.

You can check if a lemma uses the axiom of choice by using #print axioms foo and seeing if classical.choice appears in the list.
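
For example:

-- `classical.em` is derived from choice, so `classical.choice` appears in the output
#print axioms classical.em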

dsimp, simp

Many proofs in the category theory library use the dsimp, simp pattern, which typically isn't necessary elsewhere.

One would usually hope that the same effect could be achieved simply with simp.

The essential issue is that composition of morphisms involves dependent types. When you have a chain of morphisms being composed, say f : X ⟶ Y and g : Y ⟶ Z, then simp can operate successfully on the morphisms (e.g. if f is the identity it can strip that off).

However if we have an equality of objects, say Y = Y', then simp can't operate because it would break the typing of the composition operations. We rarely have interesting equalities of objects (because that would be "evil" --- anything interesting should be expressed as an isomorphism and tracked explicitly), except of course that we have plenty of definitional equalities of objects.

dsimp can apply these safely, even inside a composition.

After dsimp has cleared up the object level, simp can resume work on the morphism level --- but without the dsimp step, because simp looks at expressions syntactically, the relevant lemmas might not fire.

There's no bound on how many times you potentially could have to switch back and forth, if the simp introduced new objects we again need to dsimp. In practice this does occur, but only rarely, because simp tends to shorten chains of compositions (i.e. not introduce new objects at all).

fact non-instances

In most cases, we should not have global instances of fact; typeclass search only reads the head symbol and then tries any instances, which means that adding any such instance will cause slowdowns everywhere. We instead state them as lemmas and make them local instances as required.
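
A small sketch of the pattern (the lemma name here is ours):

lemma fact_prime_two : fact (nat.prime 2) := ⟨nat.prime_two⟩

-- ... and in a section that needs it:
local attribute [instance] fact_prime_two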

forgetful inheritance

Suppose that one can put two mathematical structures on a type, a rich one R and a poor one P, and that one can deduce the poor structure from the rich structure through a map F (called a forgetful functor) (think R = metric_space and P = topological_space). A possible implementation would be to have a type class rich containing a field R, a type class poor containing a field P, and an instance from rich to poor. However, this creates diamond problems, and a better approach is to let rich extend poor and have a field saying that F R = P.
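
Schematically, with hypothetical classes poor and rich whose data is simply a set, the preferred design looks like this:

class poor (α : Type*) := (P : set α)

set_option old_structure_cmd true

-- `rich` extends `poor` and carries a proof that the poor datum is the one
-- deduced from the rich datum (here the "forgetful map" is just the identity)
class rich (α : Type*) extends poor α :=
(R : set α)
(P_eq_R : P = R)

A construction of a rich instance then fills in the poor field explicitly, with the definitionally convenient value, together with the compatibility proof.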

To illustrate this, consider the pair metric_space / topological_space. Consider the topology on a product of two metric spaces. With the first approach, it could be obtained by going first from each metric space to its topology, and then taking the product topology. But it could also be obtained by considering the product metric space (with its sup distance) and then the topology coming from this distance. These would be the same topology, but not definitionally, which means that from the point of view of Lean's kernel, there would be two different topological_space instances on the product. This is not compatible with the way instances are designed and used: there should be at most one instance of a kind on each type. This approach has created an instance diamond that does not commute definitionally.

The second approach solves this issue. Now, a metric space contains both a distance, a topology, and a proof that the topology coincides with the one coming from the distance. When one defines the product of two metric spaces, one uses the sup distance and the product topology, and one has to give the proof that the sup distance induces the product topology. Following both sides of the instance diamond then gives rise (definitionally) to the product topology on the product space.

Another approach would be to have the rich type class take the poor type class as an instance parameter. It would solve the diamond problem, but it would lead to a blow up of the number of type classes one would need to declare to work with complicated classes, say a real inner product space, and would create exponential complexity when working with products of such complicated spaces, that are avoided by bundling things carefully as above.

Note that this description of this specific case of the product of metric spaces is oversimplified compared to mathlib, as there is an intermediate typeclass between metric_space and topological_space called uniform_space. The above scheme is used at both levels, embedding a topology in the uniform space structure, and a uniform structure in the metric space structure.

Note also that, when P is a proposition, there is no such issue as any two proofs of P are definitionally equivalent in Lean.

To avoid boilerplate, there are some designs that can automatically fill the poor fields when creating a rich structure if one doesn't want to do something special about them. For instance, in the definition of metric spaces, default tactics fill the uniform space fields if they are not given explicitly. One can also have a helper function creating the rich structure from a structure with fewer fields, where the helper function fills the remaining fields. See for instance uniform_space.of_core or real_inner_product.of_core.

For more details on this question, called the forgetful inheritance pattern, see Competing inheritance paths in dependent type theory: a case study in functional analysis.

function coercion

Many structures such as bundled morphisms coerce to functions so that you can transparently apply them to arguments. For example, if e : α ≃ β and a : α then you can write e a and this is elaborated as ⇑e a. This type of coercion is implemented using the has_coe_to_fun type class. There is one important consideration:

If a type coerces to another type which in turn coerces to a function, then it must implement has_coe_to_fun directly:

structure sparkling_equiv (α β) extends α ≃ β

-- if we add a `has_coe` instance,
instance {α β} : has_coe (sparkling_equiv α β) (α ≃ β) :=
⟨sparkling_equiv.to_equiv⟩

-- then a `has_coe_to_fun` instance **must** be added as well:
instance {α β} : has_coe_to_fun (sparkling_equiv α β) :=
⟨λ _, α → β, λ f, f.to_equiv.to_fun⟩

(Rationale: if we do not declare the direct coercion, then ⇑e a is not in simp-normal form. The lemma coe_fn_coe_base will unfold it to ⇑↑e a. This often causes loops in the simplifier.)

implicit instance arguments

There are places where typeclass arguments are specified with implicit {} brackets instead of the usual [] brackets. This is done when the instances can be inferred because they are implicit arguments to the type of one of the other arguments. When they can be inferred from these other arguments, it is faster to use this method than to use type class inference.

For example, when writing lemmas about (f : α →+* β), it is faster to specify the fact that α and β are semirings as {rα : semiring α} {rβ : semiring β} rather than the usual [semiring α] [semiring β].

is_R_or_C instance

This instance generates a type-class problem with a metavariable ?m that should satisfy is_R_or_C ?m. Since this can only be satisfied by ℝ or ℂ, this does not cause problems.

likely generated binder names

In surface Lean, we can write anonymous Π binders (i.e. binders where the argument is not named) using the function arrow notation:

inductive test : Type
| intro : unit → test

After elaboration, however, every binder must have a name, so Lean generates one. In the example, the binder in the type of intro is anonymous, so Lean gives it the name ᾰ:

test.intro : ∀ (ᾰ : unit), test

When there are multiple anonymous binders, they are named ᾰ_1, ᾰ_2 etc.

Thus, when we want to know whether the user named a binder, we can check whether the name follows this scheme. Note, however, that this is not reliable. When the user writes (for whatever reason)

inductive test : Type
| intro : ∀ (ᾰ : unit), test

we cannot tell that the binder was, in fact, named.

The function name.is_likely_generated_binder_name checks if a name is of the form ᾰ, ᾰ_1, etc.

lower instance priority

Certain instances always apply during type-class resolution. For example, the instance add_comm_group.to_add_group {α} [add_comm_group α] : add_group α applies to all type-class resolution problems of the form add_group _, and type-class inference will then do an exhaustive search to find a commutative group. These instances take a long time to fail. Other instances will only apply if the goal has a certain shape. For example int.add_group : add_group ℤ or add_group.prod {α β} [add_group α] [add_group β] : add_group (α × β). Usually these instances will fail quickly, and when they apply, they are almost always the desired instance. For this reason, we want the instances of the second type (that only apply in specific cases) to always have higher priority than the instances of the first type (that always apply). See also #1561.

Therefore, if we create an instance that always applies, we set the priority of these instances to 100 (or something similar, which is below the default value of 1000).
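
A sketch with hypothetical classes, illustrating the convention:

class has_thing (α : Type*) := (thing : α)
class has_better_thing (α : Type*) := (better_thing : α)

-- matches *every* goal of the form `has_thing _`, so give it priority below 1000
@[priority 100]
instance has_better_thing.to_has_thing {α : Type*} [has_better_thing α] :
  has_thing α :=
⟨has_better_thing.better_thing α⟩

-- only matches goals of the shape `has_thing ℕ`, so the default priority is fine
instance : has_thing ℕ := ⟨0⟩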

nolint_ge

Currently, the linter forbids the use of > and ≥ in definitions and statements, as they cause problems in rewrites. They are still allowed in statements such as bounded (≥) or ∀ ε > 0 or ⨆ n ≥ m, and the linter allows that. If you write a pattern where you bind two or more variables, like ∃ n m > 0, the linter will flag this as illegal, but it is also allowed. In this case, add the line

@[nolint ge_or_gt] -- see Note [nolint_ge]

nonarchimedean non instances

The nonarchimedean subgroup basis lemmas cannot be instances because some instances (such as measure_theory.ae_eq_fun.add_monoid or topological_add_group.to_has_continuous_add) cause the search for @topological_add_group β ?m1 ?m2, i.e. a search for a topological group where the topology and group structure are unknown.

open expressions

Some declarations work with open expressions, i.e. an expr that has free variables. Terms with free variables are not well-typed, and one should not use them in tactics like infer_type or unify. You can still do syntactic analysis/manipulation on them. The reason for working with open types is for performance: instantiating variables requires iterating through the expression. In one performance test pi_binders was more than 6x quicker than mk_local_pis (when applied to the type of all imported declarations 100x).

operator precedence of big operators

There is no established mathematical convention for the operator precedence of big operators like ∏ and ∑. We will have to make a choice.

Online discussions, such as https://math.stackexchange.com/q/185538/30839 seem to suggest that ∏ and ∑ should have the same precedence, and that this should be somewhere between * and +. The latter have precedence levels 70 and 65 respectively, and we therefore choose the level 67.

In practice, this means that parentheses should be placed as follows:

∑ k in K, (a k + b k) = ∑ k in K, a k + ∑ k in K, b k →
  ∏ k in K, a k * b k = (∏ k in K, a k) * (∏ k in K, b k)

(Example taken from page 490 of Knuth's Concrete Mathematics.)

out-param inheritance

Diamond inheritance cannot depend on out_params in the following circumstances:

  • there are three classes top, middle, bottom
  • all of these classes have a parameter (α : out_param _)
  • all of these classes have an instance parameter [root α] that depends on this out_param
  • the root class has two child classes: left and right, these are siblings in the hierarchy
  • the instance bottom.to_middle takes a [left α] parameter
  • the instance middle.to_top takes a [right α] parameter
  • there is a leaf class that inherits from both left and right.

In that case, given instances bottom α and leaf α, Lean cannot synthesize a top α instance, even though the hypotheses of the instances bottom.to_middle and middle.to_top are satisfied.

There are two workarounds:

  • You could replace the bundled inheritance implemented by the instance middle.to_top with unbundled inheritance implemented by adding a [top α] parameter to the middle class. This is the preferred option since it is also more compatible with Lean 4, at the cost of being more work to implement and more verbose to use.
  • You could weaken the bottom.to_middle instance by making it depend on a subclass of middle.to_top's parameter, in this example replacing [left α] with [leaf α].

partially-applied ext lemmas

When possible, ext lemmas are stated without a full set of arguments. As an example, for bundled homs f, g, and of, f.comp of = g.comp of → f = g is a better ext lemma than (∀ x, f (of x) = g (of x)) → f = g, as the former allows a second type-specific extensionality lemma to be applied to f.comp of = g.comp of. If the domain of of is ℕ or ℤ and of is a ring_hom, such a lemma could then make the goal f (of 1) = g (of 1).

For bundled morphisms, there is an ext lemma that always applies, of the form (∀ x, ⇑f x = ⇑g x) → f = g. When adding type-specific ext lemmas like the one above, we want these to be tried first. This happens automatically since the type-specific lemmas are inevitably defined later.

pointwise nat action

Pointwise monoids (set, finset, filter) have derived pointwise actions of the form has_smul α β → has_smul α (set β). When α is ℕ or ℤ, this action conflicts with the nat or int action coming from set β being a monoid or div_inv_monoid. For example, 2 • {a, b} can both be {2 • a, 2 • b} (pointwise action, pointwise repeated addition, set.has_smul_set) and {a + a, a + b, b + a, b + b} (nat or int action, repeated pointwise addition, set.has_nsmul).

Because the pointwise action can easily be spelled out in such cases, we give higher priority to the nat and int actions.

range copy pattern

For many categories (monoids, modules, rings, ...) the set-theoretic image of a morphism f is a subobject of the codomain. When this is the case, it is useful to define the range of a morphism in such a way that the underlying carrier set of the range subobject is definitionally set.range f. In particular this means that the types ↥(set.range f) and ↥f.range are interchangeable without proof obligations.

A convenient candidate definition for range which is mathematically correct is map ⊤ f, just as set.range could have been defined as f '' set.univ. However, this lacks the desired definitional convenience, in that it both fails to match set.range and introduces a redundant x ∈ ⊤ term which clutters proofs. In such a case one may resort to the copy pattern. A copy function converts the definitional problem for the carrier set of a subobject into a one-off propositional proof obligation which one discharges while writing the definition of the definitionally convenient range (the parameter hs in the example below).

A good example is the case of a morphism of monoids. A convenient definition for monoid_hom.mrange would be (⊤ : submonoid M).map f. However since this lacks the required definitional convenience, we first define submonoid.copy as follows:

protected def copy (S : submonoid M) (s : set M) (hs : s = S) : submonoid M :=
{ carrier  := s,
  one_mem' := hs.symm ▸ S.one_mem',
  mul_mem' := hs.symm ▸ S.mul_mem' }

and then finally define:

def mrange (f : M →* N) : submonoid N :=
((⊤ : submonoid M).map f).copy (set.range f) set.image_univ.symm

reducible non-instances

Some definitions that define objects of a class cannot be instances, because they have an explicit argument that does not occur in the conclusion. An example is preorder.lift that has a function f : α → β as an explicit argument to lift a preorder on β to a preorder on α.

If these definitions are used to define instances of this class and this class is an argument to some other type-class so that type-class inference will have to unfold these instances to check for definitional equality, then these definitions should be marked @[reducible].

For example, preorder.lift is used to define units.preorder and partial_order.lift is used to define units.partial_order. In some cases it is important that type-class inference can recognize that units.preorder and units.partial_order give rise to the same has_le instance. For example, you might have another class that takes [has_le α] as an argument, and this argument sometimes comes from units.preorder and sometimes from units.partial_order. Therefore, preorder.lift and partial_order.lift are marked @[reducible].
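
For reference, preorder.lift is roughly the following definition (the exact mathlib version may differ slightly):

@[reducible]
def preorder.lift {α β : Type*} [preorder β] (f : α → β) : preorder α :=
{ le := λ x y, f x ≤ f y,
  le_refl := λ a, le_refl _,
  le_trans := λ a b c, le_trans,
  lt := λ x y, f x < f y,
  lt_iff_le_not_le := λ a b, lt_iff_le_not_le }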

simp-normal form

This note gives you some tips to debug any errors that the simp-normal form linter raises.

The reason that a lemma is considered faulty is that its left-hand side is not in simp-normal form. These lemmas are hence never used by the simplifier.

This linter gives you a list of other simp lemmas: look at them!

Here are some tips depending on the error raised by the linter:

  1. 'the left-hand side reduces to XYZ': you should probably use XYZ as the left-hand side.

  2. 'simp can prove this': This typically means that the lemma is a duplicate, or is shadowed by another lemma:

    2a. Always put more general lemmas after specific ones:

    @[simp] lemma zero_add_zero : 0 + 0 = 0 := rfl
    @[simp] lemma add_zero : x + 0 = x := rfl
    

    And not the other way around! The simplifier always picks the last matching lemma.

    2b. You can also use @[priority] instead of moving simp-lemmas around in the file.

    Tip: the default priority is 1000. Use @[priority 1100] instead of moving a lemma down, and @[priority 900] instead of moving a lemma up.

    2c. Conditional simp lemmas are tried last. If they are shadowed just remove the simp attribute.

    2d. If two lemmas are duplicates, the linter will complain about the first one. Try to fix the second one instead! (You can find it among the other simp lemmas the linter prints out!)

  3. 'try_for tactic failed, timeout': This typically means that there is a loop of simp lemmas. Try to apply squeeze_simp to the right-hand side (removing this lemma from the simp set) to see what lemmas might be causing the loop.

    Another trick is to set_option trace.simplify.rewrite true and then apply try_for 10000 { simp } to the right-hand side. You will see a periodic sequence of lemma applications in the trace message.

slow-failing instance priority

Certain instances trigger further searches when they are considered as candidate instances; these instances should be assigned a priority lower than the default of 1000 (for example, 900).

The conditions for this rule are as follows:

  • a class C has instances instT : C T and instT' : C T'
  • types T and T' are both specializations of another type S
  • the parameters supplied to S to produce T are not (fully) determined by instT; instead they have to be found by instance search

If those conditions hold, the instance instT should be assigned lower priority.

For example, suppose the search for an instance of decidable_eq (multiset α) tries the candidate instance con.quotient.decidable_eq (c : con M) : decidable_eq c.quotient. Since multiset and con.quotient are both quotient types, unification will check that the relations list.perm and c.to_setoid.r unify. However, c.to_setoid depends on a has_mul M instance, so this unification triggers a search for has_mul (list α); this will traverse all subclasses of has_mul before failing. On the other hand, the search for an instance of decidable_eq (con.quotient c) for c : con M can quickly reject the candidate instance multiset.has_decidable_eq because the type of list.perm : list ?m_1 → list ?m_1 → Prop does not unify with M → M → Prop. Therefore, we should assign con.quotient.decidable_eq a lower priority because it fails slowly. (In terms of the rules above, C := decidable_eq, T := con.quotient, instT := con.quotient.decidable_eq, T' := multiset, instT' := multiset.has_decidable_eq, and S := quot.)

If the type involved is a free variable (rather than an instantiation of some type S), the instance priority should be even lower, see Note [lower instance priority].

the algebraic hierarchy

The algebraic hierarchy #

In any theorem proving environment, there are difficult decisions surrounding the design of the "algebraic hierarchy".

There is a danger of exponential explosion in the number of gadgets, especially once interactions between algebraic and order/topological/etc structures are considered.

In mathlib, we try to avoid this by only introducing new algebraic typeclasses either

  1. when there is "real mathematics" to be done with them, or
  2. when there is a meaningful gain in simplicity by factoring out a common substructure.

(As examples, at this point we don't have loop, or unital_magma, but we do have lie_submodule and topological_field! We also have group_with_zero, as an exemplar of point 2.)

Generally in mathlib we use the extension mechanism (so comm_ring extends ring) rather than mixins (e.g. with separate ring and comm_mul classes), in part because of the potential blow-up in term sizes described at https://www.ralfj.de/blog/2019/05/15/typeclasses-exponential-blowup.html. However, there is tension here, as it results in considerable duplication in the API, particularly in the interaction with order structures.

This library note is not intended as a design document justifying and explaining the history of mathlib's algebraic hierarchy! Instead it is intended as a developer's guide, for contributors wanting to extend (either new leaves, or new intermediate classes) the algebraic hierarchy as it exists.

(Ideally we would have both a tour guide to the existing hierarchy, and an account of the design choices. See https://arxiv.org/abs/1910.09336 for an overview of mathlib as a whole, with some attention to the algebraic hierarchy and https://leanprover-community.github.io/mathlib-overview.html for a summary of what is in mathlib today.)

Instances #

When adding a new typeclass Z to the algebraic hierarchy one should attempt to add the following constructions and results, when applicable:

  • Instances transferred elementwise to products, like prod.monoid. See algebra.group.prod for more examples.
    instance prod.Z [Z M] [Z N] : Z (M × N) := ...
    
  • Instances transferred elementwise to pi types, like pi.monoid. See algebra.group.pi for more examples.
    instance pi.Z [∀ i, Z $ f i] : Z (Π i : I, f i) := ...
    
  • Instances transferred to mul_opposite M, like mul_opposite.monoid. See algebra.opposites for more examples.
    instance mul_opposite.Z [Z M] : Z (mul_opposite M) := ...
    
  • Instances transferred to ulift M, like ulift.monoid. See algebra.group.ulift for more examples.
    instance ulift.Z [Z M] : Z (ulift M) := ...
    
  • Definitions for transferring the proof fields of instances along injective or surjective functions that agree on the data fields, like function.injective.monoid and function.surjective.monoid. We make these definitions @[reducible], see note [reducible non-instances]. See algebra.group.inj_surj for more examples.
    @[reducible]
    def function.injective.Z [Z M₂] (f : M₁ → M₂) (hf : injective f)
      (one : f 1 = 1) (mul : ∀ x y, f (x * y) = f x * f y) : Z M₁ := ...
    
    @[reducible]
    def function.surjective.Z [Z M₁] (f : M₁ → M₂) (hf : surjective f)
      (one : f 1 = 1) (mul : ∀ x y, f (x * y) = f x * f y) : Z M₂ := ...
    
  • Instances transferred elementwise to finsupps, like finsupp.semigroup. See data.finsupp.pointwise for more examples.
    instance finsupp.Z [Z β] : Z (α →₀ β) := ...
    
  • Instances transferred elementwise to sets, like set.monoid. See algebra.pointwise for more examples.
    instance set.Z [Z α] : Z (set α) := ...
    
  • Definitions for transferring the entire structure across an equivalence, like equiv.monoid. See data.equiv.transfer_instance for more examples. See also the transport tactic.
    def equiv.Z (e : α ≃ β) [Z β] : Z α := ...
    /- When there is a new notion of `Z`-equiv: -/
    def equiv.Z_equiv (e : α ≃ β) [Z β] : by { letI := equiv.Z e, exact α ≃Z β } := ...
    

Subobjects #

When a new typeclass Z adds new data fields, you should also create a new sub_Z structure with a carrier field.

This can be a lot of work; for now try to closely follow the existing examples (e.g. submonoid, subring, subalgebra). We would very much like to provide some automation here, but a prerequisite will be making all the existing APIs more uniform.

If Z extends Y, then sub_Z should usually extend sub_Y.

When Z adds only new proof fields to an existing structure Y, you should provide instances transferring Z α to Z (sub_Y α), like submonoid.to_comm_monoid. Typically this is done using the function.injective.Z definition mentioned above.

instance sub_Y.to_Z [Z α] : Z (sub_Y α) :=
coe_injective.Z coe ...

Morphisms and equivalences #

Category theory #

For many algebraic structures, particularly ones used in representation theory, algebraic geometry, etc., we also define "bundled" versions, which carry category instances.

These bundled versions are usually named in camel case, so for example we have AddCommGroup as a bundled add_comm_group, and TopCommRing (which bundles together comm_ring, topological_space, and topological_ring).

These bundled versions have many appealing features:

  • a uniform notation for morphisms X ⟶ Y
  • a uniform notation (and definition) for isomorphisms X ≅ Y
  • a uniform API for subobjects, via the partial order subobject X
  • interoperability with unbundled structures, via coercions to Type (so if G : AddCommGroup, you can treat G as a type, and it automatically has an add_comm_group instance) and lifting maps AddCommGroup.of G, when G is a type with an add_comm_group instance.

If, for example you do the work of proving that a typeclass Z has a good notion of tensor product, you are strongly encouraged to provide the corresponding monoidal_category instance on a bundled version. This ensures that the API for tensor products is complete, and enables use of general machinery. Similarly if you prove universal properties, or adjunctions, you are encouraged to state these using categorical language!

One disadvantage of the bundled approach is that we can only speak of morphisms between objects living in the same type-theoretic universe. In practice this is rarely a problem.

Making a pull request #

With so many moving parts, how do you actually go about changing the algebraic hierarchy?

We're still evolving how to handle this, but the current suggestion is:

  • If you're adding a new "leaf" class, the requirements are lower, and an initial PR can just add whatever is immediately needed.
  • A new "intermediate" class, especially low down in the hierarchy, needs to be careful about leaving gaps.

In a perfect world, there would be a group of simultaneous PRs that basically cover everything! (Or at least an expectation that PRs may not be merged immediately while waiting on other PRs that fill out the API.)

However "perfect is the enemy of good", and it would also be completely reasonable to add a TODO list in the main module doc-string for the new class, briefly listing the parts of the API which still need to be provided. Hopefully this document makes it easy to assemble this list.

Another alternative to a TODO list in the doc-strings is adding github issues.

use has_coe_t

We use the class has_coe_t instead of has_coe if the first argument is a variable, or if the second argument is a variable not occurring in the first. Using has_coe would cause looping of type-class inference. See https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/remove.20all.20instances.20with.20variable.20domain

user attribute parameters

For performance reasons, it is inadvisable to use user_attribute.get_param. The parameter is stored as a reflected expression. When calling get_param, the stored parameter is evaluated using eval_expr, which first compiles the expression into VM bytecode. The unevaluated expression is available using user_attribute.get_param_untyped.

In particular, user_attribute.get_param MUST NEVER BE USED in the implementation of an attribute cache. This is because calling eval_expr disables the attribute cache.

There are several possible workarounds:

  1. Set a different attribute depending on the parameter.
  2. Use your own evaluation function instead of eval_expr, such as e.g. expr.to_nat.
  3. Write your own has_reflect Param instance (using a more efficient serialization format). The user_attribute code unfortunately checks whether the expression has the correct type, but you can use `(id %%e : Param) to pretend that your expression e has type Param.