mathlib documentation

linear_algebra.matrix.transvection

Transvections #

Transvections are matrices of the form 1 + std_basis_matrix i j c, where std_basis_matrix i j c is the basic matrix with a c at position (i, j). Multiplying by such a transvection on the left (resp. on the right) amounts to adding c times the j-th row to to the i-th row (resp c times the i-th column to the j-th column). Therefore, they are useful to present algorithms operating on rows and columns.

Transvections are a special case of elementary matrices (according to most references, these also contain the matrices exchanging rows, and the matrices multiplying a row by a constant).

We show that, over a field, any matrix can be written as L ⬝ D ⬝ L', where L and L' are products of transvections and D is diagonal. In other words, one can reduce a matrix to diagonal form by operations on its rows and columns, a variant of Gauss' pivot algorithm.

Main definitions and results #

Implementation details #

The proof of the reduction results is done inductively on the size of the matrices, reducing an (r + 1) × (r + 1) matrix to a matrix whose last row and column are zeroes, except possibly for the last diagonal entry. This step is done as follows.

If all the coefficients on the last row and column are zero, there is nothing to do. Otherwise, one can put a nonzero coefficient in the last diagonal entry by a row or column operation, and then subtract this last diagonal entry from the other entries in the last row and column to make them vanish.

This step is done in the type fin r ⊕ unit, where fin r is useful to choose arbitrarily some order in which we cancel the coefficients, and the sum structure is useful to use the formalism of block matrices.

To proceed with the induction, we reindex our matrices to reduce to the above situation.

def matrix.transvection {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) (c : R) :
matrix n n R

The transvection matrix transvection i j c is equal to the identity plus c at position (i, j). Multiplying by it on the left (as in transvection i j c ⬝ M) corresponds to adding c times the j-th line of M to its i-th line. Multiplying by it on the right corresponds to adding c times the i-th column to the j-th column.

Equations
@[simp]
theorem matrix.transvection_zero {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) :
theorem matrix.update_row_eq_transvection {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) [fintype n] (c : R) :
1.update_row i (1 i + c 1 j) = matrix.transvection i j c

A transvection matrix is obtained from the identity by adding c times the j-th row to the i-th row.

theorem matrix.transvection_mul_transvection_same {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) [fintype n] (h : i j) (c d : R) :
@[simp]
theorem matrix.transvection_mul_apply_same {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) [fintype n] (b : n) (c : R) (M : matrix n n R) :
(matrix.transvection i j c M) i b = M i b + c * M j b
@[simp]
theorem matrix.mul_transvection_apply_same {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) [fintype n] (a : n) (c : R) (M : matrix n n R) :
(M matrix.transvection i j c) a j = M a j + c * M a i
@[simp]
theorem matrix.transvection_mul_apply_of_ne {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) [fintype n] (a b : n) (ha : a i) (c : R) (M : matrix n n R) :
(matrix.transvection i j c M) a b = M a b
@[simp]
theorem matrix.mul_transvection_apply_of_ne {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) [fintype n] (a b : n) (hb : b j) (c : R) (M : matrix n n R) :
(M matrix.transvection i j c) a b = M a b
@[simp]
theorem matrix.det_transvection_of_ne {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) [fintype n] (h : i j) (c : R) :
@[nolint]
structure matrix.transvection_struct (n : Type u_1) (R : Type u₂) [decidable_eq n] [comm_ring R] :
Type (max u_1 u₂)
  • i : n
  • j : n
  • hij : self.i self.j
  • c : R

A structure containing all the information from which one can build a nontrivial transvection. This structure is easier to manipulate than transvections as one has a direct access to all the relevant fields.

@[instance]
def matrix.transvection_struct.to_matrix {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (t : matrix.transvection_struct n R) :
matrix n n R

Associating to a transvection_struct the corresponding transvection matrix.

Equations
@[simp]
theorem matrix.transvection_struct.to_matrix_mk {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (i j : n) (hij : i j) (c : R) :
{i := i, j := j, hij := hij, c := c}.to_matrix = matrix.transvection i j c
@[simp]
theorem matrix.transvection_struct.det {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] [fintype n] (t : matrix.transvection_struct n R) :
@[simp]
theorem matrix.transvection_struct.inv_j {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (t : matrix.transvection_struct n R) :
t.inv.j = t.j

The inverse of a transvection_struct, designed so that t.inv.to_matrix is the inverse of t.to_matrix.

Equations
@[simp]
theorem matrix.transvection_struct.inv_i {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (t : matrix.transvection_struct n R) :
t.inv.i = t.i
@[simp]
theorem matrix.transvection_struct.inv_c {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] (t : matrix.transvection_struct n R) :
t.inv.c = -t.c
theorem matrix.transvection_struct.inv_mul {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] [fintype n] (t : matrix.transvection_struct n R) :
theorem matrix.transvection_struct.mul_inv {n : Type u_1} {R : Type u₂} [decidable_eq n] [comm_ring R] [fintype n] (t : matrix.transvection_struct n R) :
def matrix.transvection_struct.sum_inl {n : Type u_1} (p : Type u_2) {R : Type u₂} [decidable_eq n] [decidable_eq p] [comm_ring R] (t : matrix.transvection_struct n R) :

Given a transvection_struct on n, define the corresponding transvection_struct on n ⊕ p using the identity on p.

Equations
def matrix.transvection_struct.reindex_equiv {n : Type u_1} {p : Type u_2} {R : Type u₂} [decidable_eq n] [decidable_eq p] [comm_ring R] (e : n p) (t : matrix.transvection_struct n R) :

Given a transvection_struct on n and an equivalence between n and p, define the corresponding transvection_struct on p.

Equations

Reducing matrices by left and right multiplication by transvections #

In this section, we show that any matrix can be reduced to diagonal form by left and right multiplication by transvections (or, equivalently, by elementary operations on lines and columns). The main step is to kill the last row and column of a matrix in fin r ⊕ unit with nonzero last coefficient, by subtracting this coefficient from the other ones. The list of these operations is recorded in list_transvec_col M and list_transvec_row M. We have to analyze inductively how these operations affect the coefficients in the last row and the last column to conclude that they have the desired effect.

Once this is done, one concludes the reduction by induction on the size of the matrices, through a suitable reindexing to identify any fintype with fin r ⊕ unit.

def matrix.pivot.list_transvec_col {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) :
list (matrix (fin r unit) (fin r unit) 𝕜)

A list of transvections such that multiplying on the left with these transvections will replace the last column with zeroes.

Equations
def matrix.pivot.list_transvec_row {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) :
list (matrix (fin r unit) (fin r unit) 𝕜)

A list of transvections such that multiplying on the right with these transvections will replace the last row with zeroes.

Equations
theorem matrix.pivot.list_transvec_col_mul_last_row_drop {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) (i : fin r unit) {k : } (hk : k r) :

Multiplying by some of the matrices in list_transvec_col M does not change the last row.

theorem matrix.pivot.list_transvec_col_mul_last_row {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) (i : fin r unit) :

Multiplying by all the matrices in list_transvec_col M does not change the last row.

theorem matrix.pivot.list_transvec_col_mul_last_col {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) (hM : M (sum.inr ()) (sum.inr ()) 0) (i : fin r) :

Multiplying by all the matrices in list_transvec_col M kills all the coefficients in the last column but the last one.

theorem matrix.pivot.mul_list_transvec_row_last_col_take {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) (i : fin r unit) {k : } (hk : k r) :

Multiplying by some of the matrices in list_transvec_row M does not change the last column.

theorem matrix.pivot.mul_list_transvec_row_last_col {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) (i : fin r unit) :

Multiplying by all the matrices in list_transvec_row M does not change the last column.

theorem matrix.pivot.mul_list_transvec_row_last_row {𝕜 : Type u_3} [field 𝕜] {r : } (M : matrix (fin r unit) (fin r unit) 𝕜) (hM : M (sum.inr ()) (sum.inr ()) 0) (i : fin r) :

Multiplying by all the matrices in list_transvec_row M kills all the coefficients in the last row but the last one.

Multiplying by all the matrices either in list_transvec_col M and list_transvec_row M kills all the coefficients in the last row but the last one.

Multiplying by all the matrices either in list_transvec_col M and list_transvec_row M kills all the coefficients in the last column but the last one.

Multiplying by all the matrices either in list_transvec_col M and list_transvec_row M turns the matrix in block-diagonal form.

There exist two lists of transvection_struct such that multiplying by them on the left and on the right makes a matrix block-diagonal, when the last coefficient is nonzero.

There exist two lists of transvection_struct such that multiplying by them on the left and on the right makes a matrix block-diagonal.

Inductive step for the reduction: if one knows that any size r matrix can be reduced to diagonal form by elementary operations, then one deduces it for matrices over fin r ⊕ unit.

Reduction to diagonal form by elementary operations is invariant under reindexing.

Any matrix can be reduced to diagonal form by elementary operations. Formulated here on Type 0 because we will make an induction using fin r. See exists_list_transvec_mul_mul_list_transvec_eq_diagonal for the general version (which follows from this one and reindexing).

Any matrix can be reduced to diagonal form by elementary operations.

Any matrix can be written as the product of transvections, a diagonal matrix, and transvections.

theorem matrix.diagonal_transvection_induction {n : Type u_1} {𝕜 : Type u_3} [field 𝕜] [decidable_eq n] [fintype n] (P : matrix n n 𝕜 → Prop) (M : matrix n n 𝕜) (hdiag : ∀ (D : n → 𝕜), (matrix.diagonal D).det = M.detP (matrix.diagonal D)) (htransvec : ∀ (t : matrix.transvection_struct n 𝕜), P t.to_matrix) (hmul : ∀ (A B : matrix n n 𝕜), P AP BP (A B)) :
P M

Induction principle for matrices based on transvections: if a property is true for all diagonal matrices, all transvections, and is stable under product, then it is true for all matrices. This is the useful way to say that matrices are generated by diagonal matrices and transvections.

We state a slightly more general version: to prove a property for a matrix M, it suffices to assume that the diagonal matrices we consider have the same determinant as M. This is useful to obtain similar principles for SLₙ or GLₙ.

theorem matrix.diagonal_transvection_induction_of_det_ne_zero {n : Type u_1} {𝕜 : Type u_3} [field 𝕜] [decidable_eq n] [fintype n] (P : matrix n n 𝕜 → Prop) (M : matrix n n 𝕜) (hMdet : M.det 0) (hdiag : ∀ (D : n → 𝕜), (matrix.diagonal D).det 0P (matrix.diagonal D)) (htransvec : ∀ (t : matrix.transvection_struct n 𝕜), P t.to_matrix) (hmul : ∀ (A B : matrix n n 𝕜), A.det 0B.det 0P AP BP (A B)) :
P M

Induction principle for invertible matrices based on transvections: if a property is true for all invertible diagonal matrices, all transvections, and is stable under product of invertible matrices, then it is true for all invertible matrices. This is the useful way to say that invertible matrices are generated by invertible diagonal matrices and transvections.