1  Matrix essentials

1.1 Overview

In statistics we will frequently make use of matrix calculations and matrix notation.

Throughout we mostly work with real matrices, i.e. we assume all matrix elements are real numbers. However, one important matrix decomposition, the eigenvalue decomposition, can yield complex-valued matrices even when applied to real matrices. Thus occasionally we will also need to deal with complex numbers.

For further details on matrix theory please consult the lecture notes of related modules (e.g. linear algebra).

1.2 Matrix basics

Matrix notation

In matrix notation we distinguish between scalars, vectors, and matrices:

Scalar: \(x\), \(X\), lower or upper case, plain type.

Vector: \(\boldsymbol x\), lower case, bold type. In handwriting an arrow \(\vec{x}\) indicates a vector.

In component notation we write \(\boldsymbol x= \begin{pmatrix} x_1 \\ \vdots\\ x_d\end{pmatrix}\). By default, a vector is a column vector, i.e. the elements are arranged in a column and the index of the components \(x_i\) refers to the row.

The transpose of a vector (indicated by the superscript \(T\)) turns it into a row vector. To save space we can write the column vector \(\boldsymbol x\) as \(\boldsymbol x= (x_1, \ldots, x_d)^T\) so that \(\boldsymbol x^T\) is a row vector.

Matrix: \(\boldsymbol X\), upper case, bold type. In handwriting an underscore \(\underline{X}\) indicates a matrix.

In component notation we write \(\boldsymbol X= (x_{ij})\). By convention, the first index (here \(i\)) of the scalar elements \(x_{ij}\) denotes the row and the second index (here \(j\)) the column of the matrix. For \(n\) the number of rows and \(d\) the number of columns we can view the matrix \(\boldsymbol X= (\boldsymbol x_1, \ldots, \boldsymbol x_d)\) either as being composed of \(d\) column vectors \(\boldsymbol x_j = \begin{pmatrix} x_{1j} \\ \vdots \\ x_{nj}\\ \end{pmatrix}\) or \(\boldsymbol X= \begin{pmatrix} \boldsymbol z_1^T \\ \vdots \\ \boldsymbol z_n^T\end{pmatrix}\) being composed of \(n\) row vectors \(\boldsymbol z_i^T = (x_{i1}, \ldots, x_{id})\).

A (column) vector of dimension \(d\) is a matrix of size \(d\times 1\). A row vector of dimension \(d\) is a matrix of size \(1\times d\). A scalar is of dimension \(1\) and is a matrix of size \(1 \times 1\).

Notation for random vectors and matrices

A random matrix (vector) is a matrix (vector) whose elements are random variables.

It is common practice in univariate statistics to distinguish random variables and their fixed realisations by using upper case versus lower case. However, this convention breaks down when working with matrices and vectors.

Therefore, when working with multivariate random quantities it is best practice (see e.g. Mardia et al. 1979)1 to always state explicitly whether a matrix, vector or scalar is a random variable, especially when this is not obvious from the context.

Special matrices

\(\boldsymbol I_d\) is the identity matrix. It is a square matrix of size \(d \times d\) with the diagonal filled with 1 and off-diagonals filled with 0. \[\boldsymbol I_d = \begin{pmatrix} 1 & 0 & 0 & \dots & 0\\ 0 & 1 & 0 & \dots & 0\\ 0 & 0 & 1 & & 0\\ \vdots & \vdots & & \ddots & \\ 0 & 0 & 0 & & 1 \\ \end{pmatrix}\]

\(\mathbf 1\) is a matrix that contains only ones. Most often it is used in the form of a column vector with \(d\) rows: \[\mathbf 1_d = \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \\ \end{pmatrix}\]

Similarly, \(\mathbf 0\) is a matrix that contains only zeros. Most often it is used in the form of a column vector with \(d\) rows: \[\mathbf 0_d = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ \end{pmatrix}\]

A diagonal matrix is a matrix where all off-diagonal elements are zero. By \(\text{Diag}(\boldsymbol A)\) we access the diagonal elements of a matrix as a vector and by \(\text{Diag}(a_1, \ldots, a_d)\) we specify a diagonal matrix by listing the diagonal elements.

Any matrix can be partitioned into blocks or sub-matrices. A block-structured matrix or block matrix partitioning rows and columns into two groups has the form \[ \boldsymbol A= \begin{pmatrix} \boldsymbol A_{11} & \boldsymbol A_{12} \\ \boldsymbol A_{21} & \boldsymbol A_{22} \\ \end{pmatrix} \, , \] where \(\boldsymbol A_{11}\), \(\boldsymbol A_{22}\), \(\boldsymbol A_{12}\) and \(\boldsymbol A_{21}\) are themselves matrices. If \(\boldsymbol A\) is symmetric (hence square) then \(\boldsymbol A_{11}\) and \(\boldsymbol A_{22}\) must also be symmetric and \(\boldsymbol A_{21} = \boldsymbol A_{12}^T\).

A block diagonal matrix is a block matrix in which all off-diagonal blocks vanish; if the matrix is symmetric the sub-matrices along the diagonal are themselves symmetric.

A triangular matrix is a square matrix whose elements either below or above the diagonal are all zero (upper vs. lower triangular matrix).
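These special matrices are straightforward to construct numerically. As an illustration, here is a small sketch in Python with numpy (assuming that library is available; all variable names are ad hoc):

```python
import numpy as np

d = 3
I_d = np.eye(d)                  # identity matrix I_d
ones_d = np.ones((d, 1))         # column vector of ones, 1_d
zeros_d = np.zeros((d, 1))       # column vector of zeros, 0_d

A = np.arange(1.0, 10.0).reshape(3, 3)
diag_of_A = np.diag(A)           # Diag(A): extract the diagonal elements as a vector
D = np.diag([1.0, 2.0, 3.0])     # Diag(a_1, ..., a_d): build a diagonal matrix

# block matrix assembled from four sub-matrices
A11, A12 = np.eye(2), np.zeros((2, 1))
A21, A22 = np.zeros((1, 2)), np.array([[5.0]])
B = np.block([[A11, A12], [A21, A22]])

L = np.tril(A)                   # lower triangular part of A
U = np.triu(A)                   # upper triangular part of A
print(I_d, diag_of_A, D, B, L, U, sep="\n")
```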

1.3 Simple matrix operations

Matrix addition and multiplication

Matrices behave much like common numbers. For example, we can add matrices \(\boldsymbol C= \boldsymbol A+ \boldsymbol B\) and multiply matrices \(\boldsymbol C= \boldsymbol A\boldsymbol B\).

For matrix addition \(\boldsymbol C= \boldsymbol A+ \boldsymbol B\) we add the corresponding elements, \(c_{ij} = a_{ij} + b_{ij}\). This requires that \(\boldsymbol A\) and \(\boldsymbol B\) have the same dimensions, i.e. the same number of rows and columns.

The dot product, or scalar product, of two vectors \(\boldsymbol a\) and \(\boldsymbol b\) is a scalar given by \(\boldsymbol a\cdot \boldsymbol b= \langle \boldsymbol a, \boldsymbol b\rangle = \boldsymbol a^T \boldsymbol b= \boldsymbol b^T \boldsymbol a= \sum_{i=1}^d a_{i} b_{i}\).

Matrix multiplication \(\boldsymbol C= \boldsymbol A\boldsymbol B\) is obtained by setting \(c_{ij} = \sum_{k=1}^m a_{ik} b_{kj}\) where \(m\) is the number of columns of \(\boldsymbol A\) and the number of rows in \(\boldsymbol B\). Thus, \(\boldsymbol C\) contains all possible dot products of the row vectors in \(\boldsymbol A\) with the column vectors in \(\boldsymbol B\). For matrix multiplication the number of columns in \(\boldsymbol A\) must match the number of rows in \(\boldsymbol B\). Note that matrix multiplication in general does not commute, i.e. typically \(\boldsymbol A\boldsymbol B\neq \boldsymbol B\boldsymbol A\) (indeed \(\boldsymbol B\boldsymbol A\) need not even be defined).
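As a small numerical illustration of addition, the dot product and (non-commuting) matrix multiplication, here is a Python/numpy sketch (names are ad hoc):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

C_add = A + B                    # elementwise addition, c_ij = a_ij + b_ij

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
dot = a @ b                      # dot product a^T b = sum_i a_i b_i

C_mul = A @ B                    # matrix product, c_ij = sum_k a_ik b_kj
print(np.allclose(A @ B, B @ A)) # False: matrix multiplication does not commute
```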

Matrix transpose

The matrix transpose \(\boldsymbol A^T\), indicated by the superscript \(T\), interchanges rows and columns of a matrix. The transpose is a linear operation, \((\boldsymbol A+ \boldsymbol B)^T = \boldsymbol A^T + \boldsymbol B^T\), and applied to a matrix product it reverses the ordering, i.e. \((\boldsymbol A\boldsymbol B)^T =\boldsymbol B^T \boldsymbol A^T\).

If \(\boldsymbol A= \boldsymbol A^T\) then \(\boldsymbol A\) is symmetric (and square).

By construction, for a rectangular \(\boldsymbol A\) the matrices \(\boldsymbol A^T \boldsymbol A\) and \(\boldsymbol A\boldsymbol A^T\) are symmetric with non-negative diagonal.

1.4 Matrix summaries

Row, column and grand sum

Assume a matrix \(\boldsymbol A\) of size \(n \times m\).

The sum over the \(m\) entries of row \(i\) is \(\sum_{j=1}^m a_{ij}\). In matrix notation the \(n\) row sums are given by \(\boldsymbol A\, \mathbf 1_m\).

The sum over the \(n\) entries of column \(j\) is \(\sum_{i=1}^n a_{ij}\). In matrix notation the \(m\) column sums are \(\boldsymbol A^T \mathbf 1_n\).

The grand sum of all matrix entries of \(\boldsymbol A\) is obtained by \[ \sum_{i=1}^n \sum_{j=1}^m a_{ij} = \mathbf 1_n^T \, \boldsymbol A\, \mathbf 1_m \]
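The matrix expressions for the row, column and grand sums can be checked numerically, for example with the following Python/numpy sketch (names are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.normal(size=(n, m))

ones_n = np.ones(n)
ones_m = np.ones(m)

row_sums = A @ ones_m            # n row sums, A 1_m
col_sums = A.T @ ones_n          # m column sums, A^T 1_n
grand_sum = ones_n @ A @ ones_m  # grand sum 1_n^T A 1_m

print(np.allclose(row_sums, A.sum(axis=1)))   # True
print(np.allclose(col_sums, A.sum(axis=0)))   # True
print(np.isclose(grand_sum, A.sum()))         # True
```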

Matrix trace

The trace of a matrix is the sum of its diagonal elements, \(\text{Tr}(\boldsymbol A) = \sum_i a_{ii}\).

The trace is invariant against transposition, i.e. \[ \text{Tr}(\boldsymbol A) = \text{Tr}(\boldsymbol A^T ) \]

A useful identity for the matrix trace of the product of two matrices is \[ \text{Tr}(\boldsymbol A\boldsymbol B) = \text{Tr}( \boldsymbol B\boldsymbol A) \]

Intriguingly, the trace of a matrix equals the sum of the eigenvalues of the matrix (see further below).

Row, column and grand sum of squares

The sum over the \(m\) squared entries of row \(i\) is \(\sum_{j=1}^m a_{ij}^2\). In matrix notation the \(n\) row sums of squares are given by \(\text{Diag}( \boldsymbol A\boldsymbol A^T)\).

The sum over the \(n\) squared entries of column \(j\) is \(\sum_{i=1}^n a_{ij}^2\). In matrix notation the \(m\) column sums of squares are \(\text{Diag}( \boldsymbol A^T \boldsymbol A)\).

The grand sum of all squared elements of \(\boldsymbol A\) is obtained by \[ \sum_{i=1}^n \sum_{j=1}^m a_{ij}^2 = \text{Tr}(\boldsymbol A^T \boldsymbol A) = \text{Tr}(\boldsymbol A\boldsymbol A^T) \] This is also known as the squared Frobenius norm of \(\boldsymbol A\) (see below).

Sum of squared diagonal entries

The sum of the squared entries on the diagonal is in matrix notation \[ \text{Diag}(\boldsymbol A)^T \text{Diag}(\boldsymbol A) = \sum_{i=1}^{\min(n,m)} a_{ii}^2 \]

Frobenius inner product

The Frobenius inner product between two rectangular matrices of the same dimension is the scalar \[ \begin{split} \langle \boldsymbol A, \boldsymbol B\rangle &= \text{Tr}(\boldsymbol A\boldsymbol B^T) = \text{Tr}(\boldsymbol B\boldsymbol A^T)\\ &= \text{Tr}(\boldsymbol A^T \boldsymbol B) = \text{Tr}(\boldsymbol B^T \boldsymbol A)\\ &= \sum_{i,j} a_{ij} b_{ij} \,. \end{split} \] This generalises the dot product between two vectors. Note that the dot product can therefore also be written as the trace of a matrix \[ \langle \boldsymbol a, \boldsymbol b\rangle = \text{Tr}( \boldsymbol a\boldsymbol b^T ) = \text{Tr}( \boldsymbol b\boldsymbol a^T) \,. \]

Euclidean norm

The squared Euclidean norm or the squared length of the vector \(\boldsymbol a\) is the dot product \(||\boldsymbol a||^2_2 = \boldsymbol a\cdot \boldsymbol a= \langle \boldsymbol a, \boldsymbol a\rangle = \boldsymbol a^T \boldsymbol a= \sum_{i=1}^d a_i^2\).

The squared Frobenius norm is a generalisation of the squared Euclidean vector norm to a rectangular matrix and is the sum of the squares of all its elements. Using the trace it can be written as \[ \begin{split} ||\boldsymbol A||_F^2 &= \langle \boldsymbol A, \boldsymbol A\rangle \\ &= \text{Tr}(\boldsymbol A^T \boldsymbol A) = \text{Tr}(\boldsymbol A\boldsymbol A^T) \\ &= \sum_{i,j} a_{ij}^2 \,. \end{split} \]

A useful identity for the squared Frobenius norm of the difference of two matrices is \[ \begin{split} ||\boldsymbol A- \boldsymbol B||_F^2 &= ||\boldsymbol A||_F^2 + ||\boldsymbol B||_F^2 - 2 \langle \boldsymbol A, \boldsymbol B\rangle \\ &= \text{Tr}(\boldsymbol A^T \boldsymbol A) +\text{Tr}(\boldsymbol B^T \boldsymbol B) - 2 \text{Tr}(\boldsymbol A^T \boldsymbol B) \\ &= \sum_{i,j} (a_{ij}-b_{ij})^2 \,. \end{split} \]
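The Frobenius inner product and norm identities above, together with the row and column sums of squares via \(\text{Diag}\), are easy to verify numerically; a Python/numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 3))

# row and column sums of squares via Diag(A A^T) and Diag(A^T A)
print(np.allclose(np.diag(A @ A.T), (A**2).sum(axis=1)))   # True
print(np.allclose(np.diag(A.T @ A), (A**2).sum(axis=0)))   # True

# Frobenius inner product <A, B> = Tr(A^T B) = sum_ij a_ij b_ij
print(np.isclose(np.trace(A.T @ B), (A * B).sum()))        # True

# squared Frobenius norm ||A||_F^2 = Tr(A^T A)
print(np.isclose(np.trace(A.T @ A), np.linalg.norm(A, "fro")**2))  # True

# ||A - B||_F^2 = ||A||_F^2 + ||B||_F^2 - 2 <A, B>
lhs = np.linalg.norm(A - B, "fro")**2
rhs = np.linalg.norm(A, "fro")**2 + np.linalg.norm(B, "fro")**2 - 2 * np.trace(A.T @ B)
print(np.isclose(lhs, rhs))                                 # True
```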

The Frobenius norm of a matrix \(||\boldsymbol A||_F\) is not to be confused with the induced \(2\)-norm of a matrix \(||\boldsymbol A||_2\). The latter equals the largest singular value of the matrix (for a symmetric matrix this is the maximum absolute eigenvalue), with \(||\boldsymbol A||_2 \leq ||\boldsymbol A||_F\).

Determinant of a matrix

If \(\boldsymbol A\) is a square matrix the determinant \(\det(\boldsymbol A)\) is a scalar measuring the volume spanned by the column vectors in \(\boldsymbol A\) with the sign determined by the orientation of the vectors.

If \(\det(\boldsymbol A) \neq 0\) the matrix \(\boldsymbol A\) is non-singular or non-degenerate. Conversely, if \(\det(\boldsymbol A) =0\) the matrix \(\boldsymbol A\) is singular or degenerate.

Intriguingly, the determinant of \(\boldsymbol A\) is the product of the eigenvalues of \(\boldsymbol A\) (see further below).

One way to compute the determinant of a matrix \(\boldsymbol A\) is the Laplace cofactor expansion, which proceeds recursively based on the determinants of the sub-matrices \(\boldsymbol A_{-i,-j}\) obtained by deleting row \(i\) and column \(j\) from \(\boldsymbol A\). Specifically, at each level we

  1. compute the cofactor expansion either
    1. along the \(i\)-th row (pick any row \(i\)): \[\det(\boldsymbol A) = \sum_{j=1}^d a_{ij} (-1)^{i+j} \det(\boldsymbol A_{-i,-j}) \text{ , or}\]
    2. along the \(j\)-th column (pick any column \(j\)): \[\det(\boldsymbol A) = \sum_{i=1}^d a_{ij} (-1)^{i+j} \det(\boldsymbol A_{-i,-j}) \, ,\]
  2. and then repeat until the sub-matrix is a scalar \(a\), with \(\det(a)=a\) (see the sketch below).
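A minimal recursive implementation of the cofactor expansion, written as a Python/numpy sketch for small matrices only given the \(O(d!)\) cost (the function name det_cofactor is ad hoc):

```python
import numpy as np

def det_cofactor(A):
    """Determinant via Laplace cofactor expansion along the first row (O(d!), illustration only)."""
    A = np.asarray(A, dtype=float)
    d = A.shape[0]
    if d == 1:                       # determinant of a scalar is the scalar itself
        return A[0, 0]
    total = 0.0
    for j in range(d):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # sub-matrix A_{-1,-j}
        total += A[0, j] * (-1) ** j * det_cofactor(minor)      # (-1)**j equals (-1)^{1+j} in 1-based indexing
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(det_cofactor(A), np.linalg.det(A))   # both approximately 8.0
```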

The recursive nature of this algorithm leads to a complexity of order \(O(d!)\) so it is not practical except for very small \(d\). Therefore, in practice other more efficient algorithms for computing determinants are used but these still have algorithmic complexity in the order of \(O(d^3)\) so for large dimensions obtaining determinants is very expensive.

However, some specially structured matrices do allow for very fast calculation.

The determinant of a triangular matrix (and thus also of a diagonal matrix) \[ \boldsymbol A= \begin{pmatrix} a_{11} & 0 & \cdots & 0\\ a_{21} & a_{22} & \cdots & 0\\ \vdots & \vdots & \ddots & 0 \\ a_{d1} & a_{d2} & \cdots & a_{dd} \\ \end{pmatrix} \] is the product of its diagonal elements, i.e. \(\det(\boldsymbol A) = \prod_{i=1}^d a_{ii}\).

For a \(2 \times 2\) matrix \(\boldsymbol A= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\\end{pmatrix}\) the determinant is \(\det(\boldsymbol A) = a_{11} a_{22} - a_{12} a_{21}\).

For a block-structured square matrix \[ \boldsymbol A= \begin{pmatrix} \boldsymbol A_{11} & \boldsymbol A_{12} \\ \boldsymbol A_{21} & \boldsymbol A_{22} \\ \end{pmatrix} \, , \] where the matrices on the diagonal \(\boldsymbol A_{11}\) and \(\boldsymbol A_{22}\) are themselves square but \(\boldsymbol A_{12}\) and \(\boldsymbol A_{21}\) may be rectangular, the determinant is \[ \det(\boldsymbol A) = \det(\boldsymbol A_{22}) \det(\boldsymbol C_1) = \det(\boldsymbol A_{11}) \det(\boldsymbol C_2) \] with the Schur complement of \(\boldsymbol A_{22}\) \[ \boldsymbol C_1 = \boldsymbol A_{11} - \boldsymbol A_{12} \boldsymbol A_{22}^{-1} \boldsymbol A_{21} \] and the Schur complement of \(\boldsymbol A_{11}\) \[ \boldsymbol C_2 = \boldsymbol A_{22} - \boldsymbol A_{21} \boldsymbol A_{11}^{-1} \boldsymbol A_{12} \] (assuming the respective inverses exist). Note that \(\boldsymbol C_1\) and \(\boldsymbol C_2\) are square matrices.
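A quick numerical check of the block determinant formula via the Schur complements, sketched in Python/numpy with randomly generated blocks:

```python
import numpy as np

rng = np.random.default_rng(2)
A11 = rng.normal(size=(2, 2))
A12 = rng.normal(size=(2, 3))
A21 = rng.normal(size=(3, 2))
A22 = rng.normal(size=(3, 3))
A = np.block([[A11, A12], [A21, A22]])

C1 = A11 - A12 @ np.linalg.inv(A22) @ A21   # Schur complement of A22
C2 = A22 - A21 @ np.linalg.inv(A11) @ A12   # Schur complement of A11

print(np.isclose(np.linalg.det(A), np.linalg.det(A22) * np.linalg.det(C1)))  # True
print(np.isclose(np.linalg.det(A), np.linalg.det(A11) * np.linalg.det(C2)))  # True
```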

For a block-diagonal matrix \(\boldsymbol A\) with \(\boldsymbol A_{12} = 0\) and \(\boldsymbol A_{21} = 0\) the determinant is \(\det(\boldsymbol A) = \det(\boldsymbol A_{11}) \det(\boldsymbol A_{22})\), i.e. the product of the determinants of the sub-matrices along the diagonal.

Determinants have a multiplicative property, \[\det(\boldsymbol A\boldsymbol B) = \det(\boldsymbol B\boldsymbol A) = \det(\boldsymbol A) \det(\boldsymbol B) \,.\] In the above \(\boldsymbol A\) and \(\boldsymbol B\) are both square and of the same dimension.

For rectangular \(\boldsymbol A\) (\(n \times m\)) and rectangular \(\boldsymbol B\) (\(m \times n\)) with \(m \geq n\) this generalises to the Cauchy-Binet formula \[ \det(\boldsymbol A\boldsymbol B) = \sum_{w} \det(\boldsymbol A_{,w}) \det(\boldsymbol B_{w,}) \] where the summation is over all \(\binom{m}{n}\) index subsets \(w\) of size \(n\) taken from \(\{1, \ldots, m\}\) keeping the ordering and \(\boldsymbol A_{,w}\) and \(\boldsymbol B_{w,}\) are the corresponding square \(n \times n\) sub-matrices. If \(m < n\) then \(\det(\boldsymbol A\boldsymbol B) = 0\).

For a scalar \(a\), \(\det(a \boldsymbol B) = a^d \det(\boldsymbol B)\), where \(d\) is the dimension of \(\boldsymbol B\).

Another important identity is \[\det(\boldsymbol I_n + \boldsymbol A\boldsymbol B) = \det(\boldsymbol I_m + \boldsymbol B\boldsymbol A)\] where \(\boldsymbol A\) is an \(n \times m\) and \(\boldsymbol B\) an \(m \times n\) rectangular matrix. This is called the Weinstein-Aronszajn determinant identity (also credited to Sylvester).
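Both the Cauchy-Binet formula and the Weinstein-Aronszajn identity can be verified numerically, for example with the following Python/numpy sketch:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n, m = 2, 4
A = rng.normal(size=(n, m))
B = rng.normal(size=(m, n))

# Cauchy-Binet: det(AB) = sum over all n-subsets w of {1,...,m} of det(A[:,w]) det(B[w,:])
cb = sum(np.linalg.det(A[:, list(w)]) * np.linalg.det(B[list(w), :])
         for w in combinations(range(m), n))
print(np.isclose(np.linalg.det(A @ B), cb))      # True

# Weinstein-Aronszajn: det(I_n + AB) = det(I_m + BA)
lhs = np.linalg.det(np.eye(n) + A @ B)
rhs = np.linalg.det(np.eye(m) + B @ A)
print(np.isclose(lhs, rhs))                      # True
```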

1.5 Matrix inverse

Inversion of square matrix

If \(\boldsymbol A\) is a square matrix then the inverse matrix \(\boldsymbol A^{-1}\) is a matrix such that \[\boldsymbol A^{-1} \boldsymbol A= \boldsymbol A\boldsymbol A^{-1}= \boldsymbol I\, .\] Only non-singular matrices with \(\det(\boldsymbol A) \neq 0\) are invertible.

As \(\det(\boldsymbol A^{-1} \boldsymbol A) = \det(\boldsymbol I) = 1\) the determinant of the inverse matrix equals the inverse determinant, \[\det(\boldsymbol A^{-1}) = \det(\boldsymbol A)^{-1} \,.\]

The transpose of the inverse is the inverse of the transpose as \[ \begin{split} (\boldsymbol A^{-1})^T &= (\boldsymbol A^{-1})^T \, \boldsymbol A^T (\boldsymbol A^{T})^{-1} \\ &= (\boldsymbol A\boldsymbol A^{-1})^T \, (\boldsymbol A^{T})^{-1} = (\boldsymbol A^{T})^{-1} \,. \\ \end{split} \]

The inverse of a matrix product \((\boldsymbol A\boldsymbol B)^{-1} = \boldsymbol B^{-1} \boldsymbol A^{-1}\) is the product of the individual matrix inverses in reverse order.

There are many different algorithms to compute the inverse of a matrix (which is essentially a problem of solving a system of equations). The computational complexity of matrix inversion is of the order \(O(d^3)\) where \(d\) is the dimension of \(\boldsymbol A\). Therefore matrix inversion is very costly in higher dimensions.

Example 1.1 Inversion of a \(2 \times 2\) matrix:

The inverse of the matrix \(\boldsymbol A= \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) is \(\boldsymbol A^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\), provided \(\det(\boldsymbol A) = ad-bc \neq 0\).

Inversion of structured matrices

However, for specially structured matrices inversion can be done efficiently:

  • The inverse of a diagonal matrix is another diagonal matrix obtained by inverting the diagonal elements.
  • More generally, the inverse of a block-diagonal matrix is obtained by individually inverting the blocks along the diagonal.

The Woodbury matrix identity simplifies the inversion of matrices that can be written as \(\boldsymbol A+ \boldsymbol U\boldsymbol B\boldsymbol V\) where \(\boldsymbol A\) and \(\boldsymbol B\) are both square and \(\boldsymbol U\) and \(\boldsymbol V\) are suitable rectangular matrices: \[ (\boldsymbol A+ \boldsymbol U\boldsymbol B\boldsymbol V)^{-1} = \boldsymbol A^{-1} - \boldsymbol A^{-1} \boldsymbol U(\boldsymbol B^{-1} + \boldsymbol V\boldsymbol A^{-1} \boldsymbol U)^{-1} \boldsymbol V\boldsymbol A^{-1} \] Typically, the inverse \(\boldsymbol A^{-1}\) is either already known or can be easily obtained and the dimension of \(\boldsymbol B\) is much lower than that of \(\boldsymbol A\).
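A numerical check of the Woodbury matrix identity, sketched in Python/numpy (here a diagonal \(\boldsymbol A\) stands in for an easily inverted matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 5, 2
A = np.diag(rng.uniform(1.0, 2.0, size=d))   # easily inverted d x d matrix
B = rng.normal(size=(k, k)) + 3 * np.eye(k)  # small k x k matrix
U = rng.normal(size=(d, k))
V = rng.normal(size=(k, d))

Ainv = np.diag(1.0 / np.diag(A))             # inverse of a diagonal matrix
lhs = np.linalg.inv(A + U @ B @ V)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(B) + V @ Ainv @ U) @ V @ Ainv
print(np.allclose(lhs, rhs))                 # True
```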

The class of matrices that can be most easily inverted are orthogonal matrices whose inverse is obtained simply by transposing the matrix.

1.6 Orthogonal matrices

Properties

An orthogonal matrix \(\boldsymbol Q\) is a square matrix with the property that \(\boldsymbol Q^T = \boldsymbol Q^{-1}\), i.e. the transpose is also the inverse. This implies that \(\boldsymbol Q\boldsymbol Q^T = \boldsymbol Q^T \boldsymbol Q= \boldsymbol I\).

Both the column and the row vectors in \(\boldsymbol Q\) all have length 1. This implies that each element \(q_{ij}\) of \(\boldsymbol Q\) can only take a value in the interval \([-1, 1]\).

The identity matrix \(\boldsymbol I\) is the simplest example of an orthogonal matrix.

The squared Euclidean and Frobenius norm is preserved when a vector \(\boldsymbol a\) or matrix \(\boldsymbol A\) is multiplied with an orthogonal matrix \(\boldsymbol Q\): \[ || \boldsymbol Q\boldsymbol a||^2_2 = (\boldsymbol Q\boldsymbol a)^T \boldsymbol Q\boldsymbol a= \boldsymbol a^T \boldsymbol a= || \boldsymbol a||^2_2 \] and \[ || \boldsymbol Q\boldsymbol A||^2_F = \text{Tr}\left((\boldsymbol Q\boldsymbol A)^T \boldsymbol Q\boldsymbol A\right) = \text{Tr}\left(\boldsymbol A^T \boldsymbol A\right) = || \boldsymbol A||^2_F \]

Multiplication of \(\boldsymbol Q\) with a vector results in a new vector of the same length but with a change in direction (unless \(\boldsymbol Q=\boldsymbol I\)). An orthogonal matrix \(\boldsymbol Q\) can thus be interpreted geometrically as an operator performing rotation, reflection and/or permutation.

The product \(\boldsymbol Q_3 = \boldsymbol Q_1 \boldsymbol Q_2\) of two orthogonal matrices \(\boldsymbol Q_1\) and \(\boldsymbol Q_2\) yields another orthogonal matrix as \(\boldsymbol Q_3 \boldsymbol Q_3^T = \boldsymbol Q_1 \boldsymbol Q_2 (\boldsymbol Q_1 \boldsymbol Q_2)^T = \boldsymbol Q_1 \boldsymbol Q_2 \boldsymbol Q_2^T \boldsymbol Q_1^T = \boldsymbol I\).

The determinant \(\det(\boldsymbol Q)\) of an orthogonal matrix is either +1 or -1, because \(\boldsymbol Q\boldsymbol Q^T = \boldsymbol I\) and thus \(\det(\boldsymbol Q)\det(\boldsymbol Q^T) = \det(\boldsymbol Q)^2 = \det(\boldsymbol I) = 1\).

The set of all orthogonal matrices of dimension \(d\) together with multiplication form a group called the orthogonal group \(O(d)\). The subset of orthogonal matrices with \(\det(\boldsymbol Q)=1\) are called rotation matrices and form with multiplication the special orthogonal group \(SO(d)\). Orthogonal matrices with \(\det(\boldsymbol Q)=-1\) are rotation-reflection matrices.

Semi-orthogonal matrices

A rectangular \(d \times k\) matrix \(\boldsymbol Q\) is semi-orthogonal if for \(k < d\) the \(k\) column vectors are orthonormal and hence \(\boldsymbol Q^T \boldsymbol Q= \boldsymbol I_k\), or if for \(k > d\) the \(d\) row vectors are orthonormal with \(\boldsymbol Q\boldsymbol Q^T = \boldsymbol I_d\).

The set of all (semi)-orthogonal matrices \(\boldsymbol Q\) with \(k \leq d\) column vectors is known as the Stiefel manifold \(\text{St}(d, k)\).

Generating orthogonal matrices

In two dimensions \((d=2)\) all orthogonal matrices \(\boldsymbol R\) representing rotations with \(\det(\boldsymbol R)=1\) are given by \[ \boldsymbol R(\theta) = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix} \] and those representing rotation-reflections \(\boldsymbol G\) with \(\det(\boldsymbol G)=-1\) by \[ \boldsymbol G(\theta) = \begin{pmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{pmatrix}\,. \] Every orthogonal matrix of dimension \(d=2\) can be represented as the product of at most two rotation-reflection matrices because \[ \boldsymbol R(\theta) = \boldsymbol G(\theta)\, \boldsymbol G(0) = \begin{pmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\,. \] Thus, the matrix \(\boldsymbol G\) is a generator of two-dimensional orthogonal matrices. Note that \(\boldsymbol G(\theta)\) is symmetric, orthogonal and has determinant -1.

More generally, and applicable in arbitrary dimension, the role of generator is taken by the Householder reflection matrix \[ \boldsymbol Q_{HH}(\boldsymbol v) = \boldsymbol I- 2 \boldsymbol v\boldsymbol v^T \] where \(\boldsymbol v\) is a vector of unit length (with \(\boldsymbol v^T \boldsymbol v=1\)) orthogonal to the reflection hyperplane. Note that \(\boldsymbol Q_{HH}(\boldsymbol v) = \boldsymbol Q_{HH}(-\boldsymbol v)\). By construction the matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) is symmetric, orthogonal and has determinant -1.

It can be shown that any \(d\)-dimensional orthogonal matrix \(\boldsymbol Q\) can be represented as the product of at most \(d\) Householder reflection matrices. The two-dimensional generator \(\boldsymbol G(\theta)\) is recovered as the Householder matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) with \(\boldsymbol v= \begin{pmatrix} -\sin \frac{\theta}{2} \\ \cos \frac{\theta}{2} \end{pmatrix}\) or \(\boldsymbol v= \begin{pmatrix} \sin \frac{\theta}{2} \\ -\cos \frac{\theta}{2} \end{pmatrix}\).
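The following Python/numpy sketch constructs Householder matrices (the helper householder is ad hoc), recovers the two-dimensional generator \(\boldsymbol G(\theta)\), and confirms orthogonality and determinant \(-1\):

```python
import numpy as np

def householder(v):
    """Householder reflection Q_HH(v) = I - 2 v v^T for a unit-length vector v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)                 # standardise to unit length
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

theta = 0.7
G = np.array([[np.cos(theta),  np.sin(theta)],
              [np.sin(theta), -np.cos(theta)]])
v = np.array([-np.sin(theta / 2), np.cos(theta / 2)])

Q = householder(v)
print(np.allclose(Q, G))                      # True: recovers G(theta)
print(np.allclose(Q @ Q.T, np.eye(2)))        # True: orthogonal
print(np.isclose(np.linalg.det(Q), -1.0))     # True: determinant -1

# a 5-dimensional Householder matrix has eigenvalues -1 (once) and +1 (4 times)
v5 = np.random.default_rng(5).normal(size=5)
print(np.sort(np.linalg.eigvalsh(householder(v5))))
```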

Permutation matrix

A special type of an orthogonal matrix is a permutation matrix \(\boldsymbol P\) created by permuting rows and/or columns of the identity matrix \(\boldsymbol I\). Thus, each row and column of \(\boldsymbol P\) contains exactly one entry of 1, but not necessarily on the diagonal.

If a permutation matrix \(\boldsymbol P\) is multiplied with a matrix \(\boldsymbol A\) it acts as an operator permuting the columns (\(\boldsymbol A\boldsymbol P\)) or the rows (\(\boldsymbol P\boldsymbol A\)). For a set of \(d\) elements there exist \(d!\) permutations. Thus, for dimension \(d\) there are \(d!\) possible permutation matrices (including the identity matrix).

The determinant of a permutation matrix is either +1 or -1. The product of two permutation matrices yields another permutation matrix.

Symmetric permutation matrices correspond to self-inverse permutations (i.e. the permutation matrix is its own inverse), and are also called permutation involutions. They can have determinant +1 or -1.

A transposition is a permutation where only two elements are exchanged. Thus, in a transposition matrix \(\boldsymbol T\) exactly two rows and/or columns are exchanged compared to the identity matrix \(\boldsymbol I\). Transpositions are self-inverse, and transposition matrices are symmetric. There are \(\frac{d (d-1)}{2}\) different transposition matrices. The determinant of a transposition matrix is \(\det(\boldsymbol T)= -1\).

Note that the transposition matrix is an instance of a Householder matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) with vector \(\boldsymbol v\) filled with zeros except for two elements that have value \(\frac{\sqrt{2}}{2}\) and \(-\frac{\sqrt{2}}{2}\).

Any permutation of \(d\) elements can be generated by a series of at most \(d-1\) transpositions. Correspondingly, any permutation matrix \(\boldsymbol P\) can be constructed by multiplication of the identity matrix with at most \(d-1\) transposition matrices. If the number of transpositions is even then \(\det(\boldsymbol P) = 1\), otherwise, for an odd number, \(\det(\boldsymbol P) = -1\). This is called the sign or signature of the permutation.
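A Python/numpy sketch constructing permutation and transposition matrices (the helper permutation_matrix is ad hoc) and checking orthogonality and the sign via the determinant:

```python
import numpy as np

def permutation_matrix(perm):
    """Permutation matrix P such that P @ A permutes the rows of A according to perm."""
    d = len(perm)
    P = np.zeros((d, d))
    P[np.arange(d), perm] = 1.0
    return P

perm = [2, 0, 3, 1]                     # a permutation of 4 elements
P = permutation_matrix(perm)

A = np.arange(16.0).reshape(4, 4)
print(np.allclose(P @ A, A[perm, :]))   # True: P A permutes the rows
print(np.allclose(P @ P.T, np.eye(4)))  # True: P is orthogonal
print(np.linalg.det(P))                 # +1 or -1, the sign of the permutation

# a transposition matrix exchanging elements 0 and 2
T = permutation_matrix([2, 1, 0, 3])
print(np.isclose(np.linalg.det(T), -1.0))   # True
```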

The set of all permutations forms the symmetric group \(S_d\), and the subset of even permutations (with positive sign and \(\det(\boldsymbol P)=1\)) forms the alternating group \(A_d\).

1.7 Eigenvalues and eigenvectors

Definition

Assume a square matrix \(\boldsymbol A\) of size \(d \times d\). A vector \(\boldsymbol u\neq 0\) is called an eigenvector of the matrix \(\boldsymbol A\) and \(\lambda\) the corresponding eigenvalue if
\[\boldsymbol A\boldsymbol u= \boldsymbol u\lambda \, .\] This is called the eigenvalue equation or eigenequation.

Finding eigenvalues and vectors

To find the eigenvalues and eigenvectors the eigenequation is rewritten as \[(\boldsymbol A-\boldsymbol I\lambda ) \; \boldsymbol u= \mathbf 0\,.\] For this equation to hold for an eigenvector \(\boldsymbol u\neq 0\) with eigenvalue \(\lambda\) the matrix \(\boldsymbol A-\boldsymbol I\lambda\) must be singular. Correspondingly, its determinant must vanish, \[\det(\boldsymbol A-\boldsymbol I\lambda ) =0 \,.\] This is called the characteristic equation of the matrix \(\boldsymbol A\), and its solution yields the \(d\) eigenvalues \(\lambda_1, \ldots, \lambda_d\). Note that the eigenvalues need not be distinct and they may be complex even if the matrix \(\boldsymbol A\) is real.

If there are complex eigenvalues, for a real matrix those eigenvalues come in conjugate pairs. Hence, for a complex \(\lambda_1 = r e^{i \phi}\) there will also be a corresponding complex eigenvalue \(\lambda_2 = r e^{-i \phi}\).

Given the eigenvalues we then solve the eigenequation for the corresponding non-zero eigenvectors \(\boldsymbol u_1, \ldots, \boldsymbol u_d\). Note that eigenvectors of real matrices can have complex components. Also the eigenvector is only defined by the eigenequation up to a scalar. By convention eigenvectors are therefore typically standardised to unit length but this still leaves a sign ambiguity for real eigenvectors and implies that complex eigenvectors are defined only up to a factor with modulus 1.

Eigenequation in matrix notation

With the matrix \[\boldsymbol U= (\boldsymbol u_1, \ldots, \boldsymbol u_d)\] containing the standardised eigenvectors in the columns and the diagonal matrix \[\boldsymbol \Lambda= \begin{pmatrix} \lambda_{1} & \dots & 0\\ \vdots & \ddots & \vdots \\ 0 & \dots & \lambda_{d} \end{pmatrix}\] containing the eigenvalues (typically sorted in order of magnitude) the eigenvalue equation can be written as \[\boldsymbol A\boldsymbol U= \boldsymbol U\boldsymbol \Lambda\,.\]

Permutation of eigenvalues

If eigenvalues are not in order, we may apply a permutation matrix \(\boldsymbol P\) to arrange them in order. With \(\boldsymbol \Lambda^{\text{sort}} = \boldsymbol P^T \boldsymbol \Lambda\boldsymbol P\) as the sorted eigenvalues and \(\boldsymbol U^{\text{sort}} = \boldsymbol U\boldsymbol P\) as the corresponding eigenvectors the eigenequation becomes \[\boldsymbol A\boldsymbol U^{\text{sort}} = \boldsymbol A\boldsymbol U\boldsymbol P= \boldsymbol U\boldsymbol \Lambda\boldsymbol P= \boldsymbol U\boldsymbol P\boldsymbol P^T \boldsymbol \Lambda\boldsymbol P= \boldsymbol U^{\text{sort}} \boldsymbol \Lambda^{\text{sort}} \,.\]
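The eigenequation in matrix form and the sorting of eigenvalues by a permutation can be illustrated numerically (a Python/numpy sketch):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0]])

lam, U = np.linalg.eig(A)        # eigenvalues and (column) eigenvectors
Lam = np.diag(lam)
print(np.allclose(A @ U, U @ Lam))          # True: A U = U Lambda

# sort the eigenvalues (and eigenvectors) in decreasing order via a permutation matrix
order = np.argsort(lam)[::-1]
P = np.eye(len(lam))[:, order]              # permutation matrix
Lam_sort = P.T @ Lam @ P
U_sort = U @ P
print(np.allclose(A @ U_sort, U_sort @ Lam_sort))  # True
print(np.allclose(np.diag(Lam_sort), lam[order]))  # True
```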

Similar matrices

Two matrices \(\boldsymbol A\) and \(\boldsymbol B\) are called similar if one can be obtained from the other by a similarity transformation; similar matrices share the same eigenvalues.

From \(\boldsymbol A\) with eigenvalues \(\boldsymbol \Lambda\) and eigenvectors \(\boldsymbol U\) we can construct a similar \(\boldsymbol B\) via the similarity transformation \(\boldsymbol B= \boldsymbol M\boldsymbol A\boldsymbol M^{-1}\) where \(\boldsymbol M\) is an invertible matrix.

Then \(\boldsymbol \Lambda\) are the eigenvalues of \(\boldsymbol B\) and \(\boldsymbol V= \boldsymbol M\boldsymbol U\) its eigenvectors as \[ \boldsymbol B\boldsymbol V= \boldsymbol M\boldsymbol A\boldsymbol M^{-1} \boldsymbol M\boldsymbol U= \boldsymbol M\boldsymbol A\boldsymbol U= \boldsymbol M\boldsymbol U\boldsymbol \Lambda= \boldsymbol V\boldsymbol \Lambda\,. \]

Defective matrix

In most cases the eigenvectors \(\boldsymbol u_i\) will be linearly independent so that they form a basis to span a \(d\) dimensional space.

However, if this is not the case and the matrix \(\boldsymbol A\) does not have a complete basis of eigenvectors, then the matrix is called defective. In this case the matrix \(\boldsymbol U\) containing the eigenvectors is singular and \(\det(\boldsymbol U)=0\).

An example of a defective matrix is \(\begin{pmatrix} 1 &1 \\ 0 & 1 \\ \end{pmatrix}\) which has determinant 1 so that it can be inverted and its column vectors do form a complete basis but has only one distinct eigenvector \((1,0)^T\) so that the eigenvector basis is incomplete.

Eigenvalues of a diagonal or triangular matrix

In the special case that \(\boldsymbol A\) is a diagonal or a triangular matrix the eigenvalues are easily determined. This follows from the simple form of their determinants as the product of the diagonal elements. Hence for these matrices the characteristic equation becomes \(\prod_{i=1}^d (a_{ii} -\lambda) = 0\) and has solutions \(\lambda_i=a_{ii}\), i.e. the eigenvalues are equal to the diagonal elements.

Eigenvalues and vectors of a symmetric matrix

If \(\boldsymbol A\) is symmetric, i.e. \(\boldsymbol A= \boldsymbol A^T\), then its eigenvalues and eigenvectors have special properties:

  1. all eigenvalues of \(\boldsymbol A\) are real,
  2. the eigenvectors are orthogonal, i.e. \(\boldsymbol u_i^T \boldsymbol u_j = 0\) for \(i \neq j\), and real. Thus, the matrix \(\boldsymbol U\) containing the standardised orthonormal eigenvectors is orthogonal.
  3. \(\boldsymbol A\) is never defective as \(\boldsymbol U\) forms a complete basis.

Furthermore, for a symmetric matrix \(\boldsymbol A\) with diagonal elements \(p_1 \geq \ldots \geq p_d\) and eigenvalues \(\lambda_1 \geq \ldots \geq \lambda_d\) (note both written in decreasing order) the sum of the largest \(k\) eigenvalues forms an upper bound for the sum of the largest \(k\) diagonal elements: \[ \sum_{i=1}^k \lambda_i \geq \sum_{i=1}^k p_i \] This theorem is due to Schur (1923)2. Equality holds for \(k=d\) (as the trace of \(\boldsymbol A\) equals the sum of its eigenvalues) and for any \(k\) if \(\boldsymbol A\) is diagonal (as in this case the diagonal elements equal the eigenvalues).
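A Python/numpy sketch illustrating these properties for a randomly generated symmetric matrix, including Schur's inequality for the partial sums:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(5, 5))
A = (X + X.T) / 2                      # a random symmetric matrix

lam, U = np.linalg.eigh(A)             # real eigenvalues (ascending) and orthonormal eigenvectors
print(np.allclose(U @ U.T, np.eye(5))) # True: U is orthogonal
print(np.allclose(U @ np.diag(lam) @ U.T, A))  # True: orthogonal eigendecomposition

# Schur (1923): partial sums of sorted eigenvalues dominate those of sorted diagonal elements
lam_desc = np.sort(lam)[::-1]
p_desc = np.sort(np.diag(A))[::-1]
for k in range(1, 6):
    assert lam_desc[:k].sum() >= p_desc[:k].sum() - 1e-12
print(np.isclose(lam.sum(), np.trace(A)))      # True: equality for k = d
```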

Eigenvalues of orthogonal matrices

The eigenvalues of an orthogonal matrix \(\boldsymbol Q\) are not necessarily real but they all have modulus 1 and lie on the unit circle. Thus, the eigenvalues of \(\boldsymbol Q\) all have the form \(\lambda = e^{i \phi} = \cos \phi + i \sin \phi\).

In any real matrix complex eigenvalues come in conjugate pairs. Hence if an orthogonal matrix \(\boldsymbol Q\) has the complex eigenvalue \(e^{i \phi}\) it also has a complex eigenvalue \(e^{-i \phi} =\cos \phi - i \sin \phi\). The product of these two conjugate eigenvalues is 1. Thus, an orthogonal matrix of odd dimension has at least one real eigenvalue (+1 or -1).

The eigenvalues of a Householder matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) are all real (recall that it is symmetric!). In fact, in dimension \(d\) its eigenvalues are -1 (once) and 1 (\(d-1\) times). Since a transposition matrix \(\boldsymbol T\) is a special Householder matrix it has the same eigenvalues.

Positive definite matrices

If all eigenvalues of a square matrix \(\boldsymbol A\) are real and \(\lambda_i \geq 0\) then \(\boldsymbol A\) is called positive semi-definite. If all eigenvalues are strictly positive \(\lambda_i > 0\) then \(\boldsymbol A\) is called positive definite.

Note that a matrix does not need to be symmetric to be positive definite, e.g. \(\begin{pmatrix} 2 & 3 \\ 1 & 4 \\ \end{pmatrix}\) has positive eigenvalues 5 and 1. It also has a complete set of eigenvectors and is diagonalisable.

A symmetric matrix \(\boldsymbol A\) is positive definite if the quadratic form \(\boldsymbol x^T \boldsymbol A\boldsymbol x> 0\) for any non-zero \(\boldsymbol x\), and it is positive semi-definite if \(\boldsymbol x^T \boldsymbol A\boldsymbol x\geq 0\). This also holds the other way around: a symmetric positive definite matrix (with positive eigenvalues) has a positive quadratic form, and a symmetric positive semi-definite matrix (with non-negative eigenvalues) has a non-negative quadratic form.

A symmetric positive definite matrix always has a positive diagonal (this can be seen by setting \(\boldsymbol x\) above to a unit vector with 1 at a single position, and 0 at all other elements). However, just requiring a positive diagonal is too weak to ensure positive definiteness of a symmetric matrix, for example \(\begin{pmatrix} 1 &10 \\ 10 & 1 \\ \end{pmatrix}\) has a negative eigenvalue of -9. On the other hand, a symmetric matrix is indeed positive definite if it is strictly diagonally dominant, i.e. if all its diagonal elements are positive and each diagonal element is larger than the sum of the absolute values of the other elements in the corresponding row (or, equivalently for a symmetric matrix, column). However, diagonal dominance is too restrictive a criterion to characterise all symmetric positive definite matrices, since there are many symmetric matrices that are positive definite but not diagonally dominant, such as \(\begin{pmatrix} 1 & 2 \\ 2 & 5 \\ \end{pmatrix}\).
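The examples above can be checked numerically (a Python/numpy sketch; the helper is_pos_def is ad hoc):

```python
import numpy as np

def is_pos_def(A):
    """Check positive definiteness of a symmetric matrix via its eigenvalues."""
    return np.all(np.linalg.eigvalsh(A) > 0)

A1 = np.array([[1.0, 10.0], [10.0, 1.0]])   # positive diagonal but not positive definite
A2 = np.array([[1.0, 2.0], [2.0, 5.0]])     # positive definite but not diagonally dominant

print(np.linalg.eigvalsh(A1))               # [-9., 11.]: one negative eigenvalue
print(is_pos_def(A1), is_pos_def(A2))       # False True

# the quadratic form x^T A x of a positive definite matrix is positive for any non-zero x
rng = np.random.default_rng(7)
x = rng.normal(size=2)
print(x @ A2 @ x > 0)                       # True
```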

Finally, the sum of a symmetric positive semi-definite matrix \(\boldsymbol A\) and a symmetric positive definite matrix \(\boldsymbol B\) is itself symmetric positive definite because the corresponding quadratic form \(\boldsymbol x^T ( \boldsymbol A+\boldsymbol B) \boldsymbol x= \boldsymbol x^T \boldsymbol A\boldsymbol x+ \boldsymbol x^T \boldsymbol B\boldsymbol x> 0\) is positive for all non-zero \(\boldsymbol x\). Similarly, the sum of two symmetric positive (semi)-definite matrices is itself symmetric positive (semi)-definite.

1.8 Matrix decompositions

Diagonalisation and eigenvalue decomposition

If \(\boldsymbol A\) is a square non-defective matrix then the eigensystem \(\boldsymbol U\) is invertible and we can rewrite the eigenvalue equation to \[\boldsymbol A= \boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1} \,.\] This is called the eigendecomposition, or spectral decomposition, of \(\boldsymbol A\) and equivalently \[\boldsymbol \Lambda= \boldsymbol U^{-1} \boldsymbol A\boldsymbol U\] is the diagonalisation of \(\boldsymbol A\).

If \(\boldsymbol A\) is defective (i.e. \(\boldsymbol U\) is singular) one can still bring \(\boldsymbol A\) into near-diagonal form, as there always exists a similarity transformation \(\boldsymbol J= \boldsymbol M\boldsymbol A\boldsymbol M^{-1}\) where \(\boldsymbol M\) is an invertible matrix and \(\boldsymbol J\) has Jordan canonical form, i.e. \(\boldsymbol J\) is upper triangular with the (potentially complex) eigenvalues on the diagonal and some entries equal to 1 immediately above the main diagonal.

Orthogonal eigenvalue decomposition

For symmetric \(\boldsymbol A\) with real eigenvalues and orthogonal matrix \(\boldsymbol U\) the spectral decomposition becomes \[\boldsymbol A= \boldsymbol U\boldsymbol \Lambda\boldsymbol U^T\] and \[\boldsymbol \Lambda= \boldsymbol U^T \boldsymbol A\boldsymbol U\,.\] This special case is known as the orthogonal diagonalisation of \(\boldsymbol A\).

The orthogonal decomposition for symmetric \(\boldsymbol A\) is unique apart from the signs of the eigenvectors (columns of \(\boldsymbol U\)). Thus, in computer applications the signs may vary depending on the specific numerical implementation of the eigenvalue decomposition.

Singular value decomposition

The singular value decomposition (SVD) is a generalisation of the orthogonal eigenvalue decomposition for symmetric matrices.

Any (!) rectangular matrix \(\boldsymbol A\) of size \(n\times d\) can be factored into the product \[\boldsymbol A= \boldsymbol U\boldsymbol D\boldsymbol V^T\] where \(\boldsymbol U\) is an \(n \times n\) orthogonal matrix, \(\boldsymbol V\) is a second \(d \times d\) orthogonal matrix and \(\boldsymbol D\) is a diagonal but rectangular matrix of size \(n\times d\) with \(m=\min(n,d)\) real diagonal elements \(d_1, \ldots, d_m\). The \(d_i\) are called singular values, and appear along the diagonal in \(\boldsymbol D\) in order of magnitude.

The SVD is unique apart from the signs of the column vectors in \(\boldsymbol U\), \(\boldsymbol V\) and \(\boldsymbol D\) (you can freely specify the column signs of any two of the three matrices). By convention the signs are chosen such that the singular values in \(\boldsymbol D\) are all non-negative, which leaves ambiguity in the column signs of \(\boldsymbol U\) and \(\boldsymbol V\). Alternatively, one may fix the column signs of \(\boldsymbol U\) and \(\boldsymbol V\), e.g. by requiring a positive diagonal, which then determines the signs of the singular values (thus allowing for negative singular values as well).

If \(\boldsymbol A\) is symmetric then the SVD and the orthogonal eigenvalue decomposition coincide (apart from different sign conventions for singular values, eigenvalues and eigenvectors).

Since \(\boldsymbol A^T \boldsymbol A= \boldsymbol V\boldsymbol D^T \boldsymbol D\boldsymbol V^T\) and \(\boldsymbol A\boldsymbol A^T = \boldsymbol U\boldsymbol D\boldsymbol D^T \boldsymbol U^T\) the squared singular values correspond to the eigenvalues of \(\boldsymbol A^T \boldsymbol A\) and \(\boldsymbol A\boldsymbol A^T\). It also follows that \(\boldsymbol A^T \boldsymbol A\) and \(\boldsymbol A\boldsymbol A^T\) are both positive semi-definite symmetric matrices, and that \(\boldsymbol V\) and \(\boldsymbol U\) contain the respective sets of eigenvectors.
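A Python/numpy sketch computing the SVD and verifying that the squared singular values are the eigenvalues of \(\boldsymbol A^T \boldsymbol A\):

```python
import numpy as np

rng = np.random.default_rng(8)
n, d = 5, 3
A = rng.normal(size=(n, d))

U, s, Vt = np.linalg.svd(A)                  # full SVD: U (n x n), s (min(n,d),), Vt = V^T (d x d)
D = np.zeros((n, d))
D[:d, :d] = np.diag(s)                       # rectangular diagonal matrix of singular values
print(np.allclose(A, U @ D @ Vt))            # True: A = U D V^T

# squared singular values are the eigenvalues of A^T A (and of A A^T)
print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))))  # True
```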

Polar decomposition

Any square matrix \(\boldsymbol A\) can be factored into the product \[ \boldsymbol A= \boldsymbol Q\boldsymbol B \] of an orthogonal matrix \(\boldsymbol Q\) and a symmetric positive semi-definite matrix \(\boldsymbol B\).

This follows from the SVD of \(\boldsymbol A\) given as \[ \begin{split} \boldsymbol A&= \boldsymbol U\boldsymbol D\boldsymbol V^T \\ &= ( \boldsymbol U\boldsymbol V^T ) ( \boldsymbol V\boldsymbol D\boldsymbol V^T ) \\ &= \boldsymbol Q\boldsymbol B\\ \end{split} \] with non-negative \(\boldsymbol D\). Note that this decomposition is unique (for invertible \(\boldsymbol A\)) as the sign ambiguities in the columns of \(\boldsymbol U\) and \(\boldsymbol V\) cancel out in \(\boldsymbol Q\) and \(\boldsymbol B\).
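A Python/numpy sketch constructing the polar decomposition from the SVD:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(4, 4))

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                       # orthogonal factor
B = Vt.T @ np.diag(s) @ Vt       # symmetric positive semi-definite factor

print(np.allclose(A, Q @ B))                     # True: A = Q B
print(np.allclose(Q @ Q.T, np.eye(4)))           # True: Q is orthogonal
print(np.allclose(B, B.T), np.all(np.linalg.eigvalsh(B) >= 0))  # True True
```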

Cholesky decomposition

A symmetric positive definite matrix \(\boldsymbol A\) can be decomposed into a product of a triangular matrix \(\boldsymbol L\) with its transpose \[ \boldsymbol A= \boldsymbol L\boldsymbol L^T \,. \] Here, \(\boldsymbol L\) is a lower triangular matrix with positive diagonal elements.

This decomposition is unique and is called Cholesky factorisation. It is often used to check whether a symmetric matrix is positive definite as it is algorithmically less demanding than eigenvalue decomposition.

Note that some implementations of the Cholesky decomposition (e.g. in R) use upper triangular matrices \(\boldsymbol K\) with positive diagonal so that \(\boldsymbol A= \boldsymbol K^T \boldsymbol K\) and \(\boldsymbol L= \boldsymbol K^T\).
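A Python/numpy sketch of the Cholesky factorisation (numpy's cholesky returns the lower triangular factor \(\boldsymbol L\)); a failed factorisation can be used to detect that a symmetric matrix is not positive definite:

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(4, 4))
A = X @ X.T + 4 * np.eye(4)          # a symmetric positive definite matrix

L = np.linalg.cholesky(A)            # lower triangular factor with positive diagonal
print(np.allclose(A, L @ L.T))       # True: A = L L^T
print(np.allclose(L, np.tril(L)))    # True: L is lower triangular

# a failed factorisation signals that the matrix is not positive definite
try:
    np.linalg.cholesky(np.array([[1.0, 10.0], [10.0, 1.0]]))
except np.linalg.LinAlgError as err:
    print("not positive definite:", err)
```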

1.9 Matrix summaries based on eigenvalues and singular values

Trace and determinant computed from eigenvalues

The eigendecomposition \(\boldsymbol A=\boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1}\) allows us to establish a link between the trace and the determinant of a matrix and its eigenvalues.

Specifically, \[ \begin{split} \text{Tr}(\boldsymbol A) & = \text{Tr}(\boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1} ) = \text{Tr}( \boldsymbol \Lambda\boldsymbol U^{-1} \boldsymbol U) \\ &= \text{Tr}( \boldsymbol \Lambda) = \sum_{i=1}^d \lambda_i \\ \end{split} \] thus the trace of a square matrix \(\boldsymbol A\) is equal to the sum of its eigenvalues. Likewise, \[ \begin{split} \det(\boldsymbol A) & = \det(\boldsymbol U) \det(\boldsymbol \Lambda) \det(\boldsymbol U^{-1} ) \\ &=\det( \boldsymbol \Lambda) = \prod_{i=1}^d \lambda_i \\ \end{split} \] therefore the determinant of \(\boldsymbol A\) is the product of the eigenvalues.

The relationship between the eigenvalues of a square matrix and the trace and the determinant of that matrix is shown above for diagonalisable matrices. However, it holds more generally for any square matrix, i.e. also for defective matrices. For the latter the Jordan canonical form \(\boldsymbol J\) replaces \(\boldsymbol \Lambda\) (in both cases the eigenvalues are simply the entries on the diagonal).

If any of the eigenvalues are equal to zero then \(\det(\boldsymbol A) = 0\) and hence \(\boldsymbol A\) is singular and not invertible.

The trace and determinant of a real matrix are always real even though the individual eigenvalues may be complex.

Eigenvalues of a squared matrix

From the eigendecomposition \(\boldsymbol A=\boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1}\) it is easy to see that the eigenvalues of \(\boldsymbol A^2\) are simply the squared eigenvalues of \(\boldsymbol A\) as \[ \boldsymbol A^2 = \boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1} \boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1} = \boldsymbol U\boldsymbol \Lambda^2 \boldsymbol U^{-1} \,. \] As a result we can compute the trace of \(\boldsymbol A^2\) as the sum of the squared eigenvalues of \(\boldsymbol A\), i.e. \(\text{Tr}(\boldsymbol A^2) = \sum_{i=1}^d \lambda_i^2\), and the determinant as the product of the squared eigenvalues, i.e. \(\det(\boldsymbol A^2) = \prod_{i=1}^d \lambda_i^2\).

If \(\boldsymbol A\) is symmetric then \(\text{Tr}(\boldsymbol A^2) = \text{Tr}(\boldsymbol A\boldsymbol A^T) = ||\boldsymbol A||^2_F = \sum_{i=1}^d \sum_{j=1}^d a_{ij}^2\). This leads to the identity \[ \sum_{i=1}^d \lambda_i^2 = \sum_{i=1}^d \sum_{j=1}^d a_{ij}^2 \] between the sum of the squared eigenvalues and the sum of all squared entries of a symmetric matrix \(\boldsymbol A\).

Rank and condition number

The rank is the dimension of the space spanned by the column vectors, which equals the dimension of the space spanned by the row vectors. A rectangular matrix of dimension \(n \times d\) has rank at most \(m = \min(n, d)\), and if this maximum is achieved the matrix has full rank.

The condition number describes how well- or ill-conditioned a full rank matrix is. For example, for a square matrix a large condition number implies that the matrix is close to being singular and thus ill-conditioned. If the condition number is infinite then the matrix is not full rank.

The rank and condition of a matrix can both be determined from the \(m\) singular values \(d_1, \ldots, d_m\) of a matrix obtained by SVD:

  1. The rank is the number of non-zero singular values.
  2. The condition number is the ratio of the largest singular value divided by the smallest singular value (in absolute value if negative singular values are allowed).

If a square matrix \(\boldsymbol A\) is singular then the condition number is infinite, and it will not have full rank. On the other hand, a non-singular square matrix, such as a positive definite matrix, has full rank.
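Rank and condition number can be read off the singular values, as the following Python/numpy sketch illustrates:

```python
import numpy as np

rng = np.random.default_rng(11)
B = rng.normal(size=(5, 2))
A = B @ B.T                          # 5 x 5 matrix of rank at most 2

s = np.linalg.svd(A, compute_uv=False)
print(np.round(s, 10))               # two non-zero singular values, three numerically zero
print(np.linalg.matrix_rank(A))      # 2: rank = number of non-zero singular values

C = rng.normal(size=(4, 4))
s = np.linalg.svd(C, compute_uv=False)
print(np.isclose(np.linalg.cond(C), s.max() / s.min()))   # True: condition number
```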

1.10 Functions of symmetric matrices

We focus on symmetric square matrices \(\boldsymbol A=\boldsymbol U\boldsymbol \Lambda\boldsymbol U^T\) which are always diagonalisable with real eigenvalues \(\boldsymbol \Lambda\) and orthogonal eigenvectors \(\boldsymbol U\).

Definition of a matrix function

Assume a real-valued function \(f(a)\) of a real number \(a\). Then the corresponding matrix function \(f(\boldsymbol A)\) is defined as \[ f(\boldsymbol A) = \boldsymbol Uf(\boldsymbol \Lambda) \boldsymbol U^T = \boldsymbol U\begin{pmatrix} f(\lambda_{1}) & \dots & 0\\ \vdots & \ddots & \vdots \\ 0 & \dots & f(\lambda_{d}) \end{pmatrix} \boldsymbol U^T \] where the function \(f(a)\) is applied to the eigenvalues of \(\boldsymbol A\). By construction \(f(\boldsymbol A)\) is real, symmetric and has real eigenvalues \(f(\lambda_i)\).

Examples:

Example 1.2 Matrix power: \(f(a) = a^p\) (with \(p\) a real number)

Special cases of the matrix power include:

  • Matrix inversion: \(f(a) = a^{-1}\)
    Note that if the matrix \(\boldsymbol A\) is singular, i.e. contains one or more eigenvalues \(\lambda_i=0\), then \(\boldsymbol A^{-1}\) is not defined and therefore \(\boldsymbol A\) is not invertible.

However, a so-called pseudoinverse can still be computed, by inverting the non-zero eigenvalues, and keeping the zero eigenvalues as zero.

  • Matrix square root: \(f(a) = a^{1/2}\)
    Since there are multiple solutions to the square root there are also multiple matrix square roots. The principal matrix square root is obtained by using the positive square roots of all the eigenvalues. Thus the principal matrix square root of a positive semi-definite matrix is also positive semi-definite and it is unique.

Example 1.3 Matrix exponential: \(f(a) = \exp(a)\)
Note that because \(\exp(a) > 0\) for all real \(a\) the matrix \(\exp(\boldsymbol A)\) is positive definite. Thus, the matrix exponential can be used to generate positive definite matrices.

If \(\boldsymbol A\) and \(\boldsymbol B\) commute, i.e. if \(\boldsymbol A\boldsymbol B= \boldsymbol B\boldsymbol A\), then \(\exp(\boldsymbol A+\boldsymbol B) = \exp(\boldsymbol A) \exp(\boldsymbol B)\). However, this is not the case otherwise!

Example 1.4 Matrix logarithm: \(f(a) = \log(a)\)
As the logarithm requires \(a >0\) the matrix \(\boldsymbol A\) needs to be positive definite for \(\log(\boldsymbol A)\) to be defined.
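The matrix functions above can be computed directly from the orthogonal eigendecomposition; a Python/numpy sketch (the helper funm_sym is ad hoc, and the test matrix is chosen positive definite so that the square root and logarithm are defined):

```python
import numpy as np

def funm_sym(A, f):
    """Apply a real function f to a symmetric matrix A via its eigendecomposition."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(f(lam)) @ U.T

rng = np.random.default_rng(12)
X = rng.normal(size=(3, 3))
A = X @ X.T + np.eye(3)               # a symmetric positive definite test matrix

sqrtA = funm_sym(A, np.sqrt)          # principal matrix square root
invA = funm_sym(A, lambda a: 1.0 / a) # matrix power with p = -1, i.e. the inverse
logA = funm_sym(A, np.log)            # matrix logarithm
expA = funm_sym(A, np.exp)            # matrix exponential

print(np.allclose(sqrtA @ sqrtA, A))            # True
print(np.allclose(invA, np.linalg.inv(A)))      # True
print(np.allclose(funm_sym(logA, np.exp), A))   # True: exp undoes log
```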

Identities for the matrix exponential and logarithm

The above give rise to useful identities:

  1. For any symmetric matrix \(\boldsymbol A\) we have \[ \det(\exp(\boldsymbol A)) = \exp(\text{Tr}(\boldsymbol A)) \] because \(\prod_i \exp(\lambda_i) = \exp( \sum_i \lambda_i)\) where \(\lambda_i\) are the eigenvalues of \(\boldsymbol A\).

  2. If we take the logarithm on both sides and replace \(\exp(\boldsymbol A)=\boldsymbol B\) we get another identity for a symmetric positive definite matrix \(\boldsymbol B\): \[ \log \det(\boldsymbol B) = \text{Tr}(\log(\boldsymbol B)) \] because \(\log( \prod_i \lambda_i) = \sum_i \log(\lambda_i)\) where \(\lambda_i\) are the eigenvalues of \(\boldsymbol B\).
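Both identities can be checked numerically (a Python/numpy sketch, with the matrix exponential and logarithm computed via the eigendecomposition as above):

```python
import numpy as np

rng = np.random.default_rng(13)
X = rng.normal(size=(3, 3))
A = (X + X.T) / 2                     # symmetric matrix for identity 1
B = X @ X.T + np.eye(3)               # symmetric positive definite matrix for identity 2

lamA, UA = np.linalg.eigh(A)
expA = UA @ np.diag(np.exp(lamA)) @ UA.T
print(np.isclose(np.linalg.det(expA), np.exp(np.trace(A))))    # True: det(exp(A)) = exp(Tr(A))

lamB, UB = np.linalg.eigh(B)
logB = UB @ np.diag(np.log(lamB)) @ UB.T
print(np.isclose(np.log(np.linalg.det(B)), np.trace(logB)))    # True: log det(B) = Tr(log(B))
```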


  1. Mardia, K. V., J. T. Kent, and J. M. Bibby. 1979. Multivariate Analysis. Academic Press.

  2. Schur, I. 1923. Über eine Klasse von Mittelbildungen mit Anwendungen auf die Determinantentheorie. Sitzungsber. Berl. Math. Ges. 22:9–29.