A Brief refresher on matrices

This is intended as a very short recap of some essentials you need to know about matrices. We will frequently make use of matrix calculations, and matrix notation helps to keep multivariate equations compact and easier to understand.

For more details please consult the lecture notes of earlier modules (e.g. linear algebra).

In this course we mostly work with real matrices, i.e. we assume all matrix elements are real numbers. However, one important matrix decomposition, the eigenvalue decomposition, can yield complex-valued matrices when applied to real matrices. Below we will point out when this is the case.

A.1 Matrix basics

A.1.1 Matrix notation

In matrix notation we distinguish between scalars, vectors, and matrices:

Scalar: \(x\), \(X\), lower or upper case, plain type.

Vector: \(\boldsymbol x\), lower case, bold type. In handwriting an arrow \(\vec{x}\) indicates a vector.

In component notation we write \(\boldsymbol x= (x_1, \ldots, x_d)^T = (x_i)^T\). By convention, a vector is a column vector, i.e. the elements are arranged in a column and the index (here \(i\)) refers to the row of the column. If you transpose a column vector it becomes a row vector \(\boldsymbol x^T = (x_1, \ldots, x_d) =(x_i)\) and the index now refers to the column.

Matrix: \(\boldsymbol X\), upper case, bold type. In handwriting an underscore \(\underline{X}\) indicates a matrix.

In component notation we write \(\boldsymbol X= (x_{ij})\). By convention, the first index (here \(i\)) of the scalar elements \(x_{ij}\) denotes the row and the second index (here \(j\)) the column of the matrix. Assuming that \(n\) is the number of rows and \(d\) is the number of columns you can also view the matrix \(\boldsymbol X= (\boldsymbol x_j) = (\boldsymbol z_i)^T\) as being composed of column vectors \(\boldsymbol x_j = (x_{1j}, \ldots, x_{nj})^T\) or of row vectors \(\boldsymbol z_i^T = (x_{i1}, \ldots, x_{id})\).

A (column) vector is a matrix of size \(d\times 1\). A row vector is a matrix of size \(1\times d\). A scalar is a matrix of size \(1 \times 1\).

A.1.2 Random matrix

A random matrix (vector) is a matrix (vector) whose elements are random variables.

Note that the standard notation used in univariate statistics to distinguish random variables and their realisations (i.e. upper versus lower case) does not work in multivariate statistics. Therefore, you need to determine from the context whether a quantity represents a random variable, or whether it is a constant.

A.1.3 Special matrices

\(\boldsymbol I_d\) is the identity matrix. It is a square matrix of size \(d \times d\) with the diagonal filled with 1 and off-diagonals filled with 0. \[\boldsymbol I_d = \begin{pmatrix} 1 & 0 & 0 & \dots & 0\\ 0 & 1 & 0 & \dots & 0\\ 0 & 0 & 1 & & 0\\ \vdots & \vdots & & \ddots & \\ 0 & 0 & 0 & & 1 \\ \end{pmatrix}\]

\(\boldsymbol 1\) is a matrix that contains only 1s. Most often it is used in the form of a column vector with \(d\) rows: \[\boldsymbol 1_d = \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \\ \end{pmatrix}\]

A diagonal matrix is a matrix where all off-diagonal elements are zero.

A triangular matrix is a square matrix whose elements either below or above the diagonal are all zero (upper vs. lower triangular matrix).

A.2 Simple matrix operations

A.2.1 Matrix addition and multiplication

Matrices behave much like common numbers. For example, there exist matrix addition \(\boldsymbol C= \boldsymbol A+ \boldsymbol B\) and matrix multiplication \(\boldsymbol C= \boldsymbol A\boldsymbol B\).

Matrix addition simply adds the corresponding elements of \(\boldsymbol A\) and \(\boldsymbol B\), i.e. \(c_{ij} = a_{ij} + b_{ij}\). For matrix addition \(\boldsymbol A\) and \(\boldsymbol B\) must have the same size, i.e. the same number of rows and columns.

The dot product between two vectors is \(\boldsymbol a\cdot \boldsymbol b= \boldsymbol a^T \boldsymbol b= \boldsymbol b^T \boldsymbol a= \sum_{i=1}^d a_{i} b_{i}\).

Matrix multiplication is defined as \(c_{ij} = \sum_{k=1}^m a_{ik} b_{kj}\) where \(m\) is the number of columns of \(\boldsymbol A\) and the number of rows in \(\boldsymbol B\). Thus, \(\boldsymbol C\) contains all possible dot products of the row vectors in \(\boldsymbol A\) with the column vectors in \(\boldsymbol B\). For matrix multiplication the number of columns in \(\boldsymbol A\) must match the number of rows in \(\boldsymbol B\). Note that matrix multiplication is in general not commutative, i.e. \(\boldsymbol A\boldsymbol B\neq \boldsymbol B\boldsymbol A\).
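
For illustration, here is a small R sketch (with arbitrarily chosen example matrices) showing element-wise addition, matrix multiplication with %*%, and the fact that the two products generally differ:

```r
# Two small example matrices (values chosen arbitrarily for illustration)
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns (1,2) and (3,4)
B <- matrix(c(0, 1, 1, 0), nrow = 2)

A + B                                  # element-wise addition
A %*% B                                # matrix multiplication
B %*% A                                # in general differs from A %*% B
crossprod(c(1, 2, 3), c(4, 5, 6))      # dot product a^T b (returned as a 1 x 1 matrix)
```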

A.2.2 Matrix transpose

The matrix transpose \(t(\boldsymbol A) = \boldsymbol A^T\) interchanges rows and columns. The transpose is a linear operator \((\boldsymbol A+ \boldsymbol B)^T = \boldsymbol A^T + \boldsymbol B^T\) and applied to a matrix product it reverses the ordering, i.e. \((\boldsymbol A\boldsymbol B)^T =\boldsymbol B^T \boldsymbol A^T\).

If \(\boldsymbol A= \boldsymbol A^T\) then \(\boldsymbol A\) is symmetric (and square).

By construction, for a rectangular \(\boldsymbol A\) the matrices \(\boldsymbol A^T \boldsymbol A\) and \(\boldsymbol A\boldsymbol A^T\) are symmetric with non-negative diagonal.

A.3 Matrix summaries

A.3.1 Matrix trace

The trace of a square matrix is the sum of its diagonal elements, \(\text{Tr}(\boldsymbol A) = \sum_i a_{ii}\).

A useful identity for the matrix trace is \[ \text{Tr}(\boldsymbol A\boldsymbol B) = \text{Tr}( \boldsymbol B\boldsymbol A) \] which for two vectors becomes \[ \boldsymbol a^T \boldsymbol b= \text{Tr}( \boldsymbol b\boldsymbol a^T) \,. \]

The squared Frobenius norm, i.e. the sum of the squares of all entries of a rectangular matrix \(\boldsymbol A=(a_{ij})\), can be written using the trace as follows:
\[ \begin{split} ||\boldsymbol A||_F^2 &= \sum_{i,j} a_{ij}^2 \\ &= \text{Tr}(\boldsymbol A^T \boldsymbol A) = \text{Tr}(\boldsymbol A\boldsymbol A^T) \,. \end{split} \]
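
These identities are easy to check numerically. The following R sketch (with randomly generated example matrices) verifies the cyclic property of the trace and the trace expression for the squared Frobenius norm:

```r
# Numerical check of the trace identities (example matrices drawn at random)
set.seed(1)
A <- matrix(rnorm(6), nrow = 2)   # 2 x 3
B <- matrix(rnorm(6), nrow = 3)   # 3 x 2

sum(diag(A %*% B))                # Tr(AB)
sum(diag(B %*% A))                # Tr(BA), same value

sum(A^2)                          # squared Frobenius norm of A
sum(diag(tcrossprod(A)))          # Tr(A A^T), same value
```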

A.3.2 Determinant of a matrix

If \(\boldsymbol A\) is a square matrix the determinant \(\det(\boldsymbol A)\) is a scalar measuring the volume spanned by the column vectors in \(\boldsymbol A\) with the sign determined by the orientation of the vectors.

If \(\det(\boldsymbol A) \neq 0\) the matrix \(\boldsymbol A\) is non-singular or non-degenerate. Conversely, if \(\det(\boldsymbol A) =0\) the matrix \(\boldsymbol A\) is singular or degenerate.

One way to compute the determinant of a matrix \(\boldsymbol A\) is the Laplace cofactor expansion, which proceeds recursively based on the determinants of the submatrices \(\boldsymbol A_{-i,-j}\) obtained by deleting row \(i\) and column \(j\) from \(\boldsymbol A\). Specifically, at each level we

  1. compute the cofactor expansion either
    1. along any row \(i\): \[\det(\boldsymbol A) = \sum_{j=1}^d a_{ij} (-1)^{i+j} \det(\boldsymbol A_{-i,-j}) \text{ , or}\]
    2. along any column \(j\): \[\det(\boldsymbol A) = \sum_{i=1}^d a_{ij} (-1)^{i+j} \det(\boldsymbol A_{-i,-j}) \,,\]
  2. and then repeat until the submatrices are scalars \(a\), for which \(\det(a)=a\).

The recursive nature of this algorithm leads to a complexity of order \(O(d!)\) so it is not practical except for very small \(d\). Therefore, in practice other more efficient algorithms for computing determinants are used but these still have algorithmic complexity in the order of \(O(d^3)\) so for large dimensions obtaining determinants is very expensive.

However, some specially structured matrices do allow for very fast calculation. In particular, it turns out that the determinant of a triangular matrix (which includes diagonal matrices) is simply the product of the diagonal elements.

For a two-dimensional matrix \(\boldsymbol A= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\\end{pmatrix}\) the determinant is \(\det(\boldsymbol A) = a_{11} a_{22} - a_{12} a_{21}\).
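
As a small illustrative check in R (the matrix values are made up), the built-in det() function agrees with the explicit \(2 \times 2\) formula:

```r
# det() in R, checked against the explicit 2 x 2 formula (values arbitrary)
A <- matrix(c(3, 1, 2, 4), nrow = 2)   # a_11 = 3, a_21 = 1, a_12 = 2, a_22 = 4
det(A)                                 # 3*4 - 2*1 = 10
A[1,1]*A[2,2] - A[1,2]*A[2,1]          # same value from the explicit formula
```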

For a block-structured square matrix \[ \boldsymbol A= \begin{pmatrix} \boldsymbol A_{11} & \boldsymbol A_{12} \\ \boldsymbol A_{21} & \boldsymbol A_{22} \\ \end{pmatrix} \, , \] where the matrices on the diagonal \(\boldsymbol A_{11}\) and \(\boldsymbol A_{22}\) are themselves square but \(\boldsymbol A_{12}\) and \(\boldsymbol A_{21}\) can have any shape, the determinant is \[ \det(\boldsymbol A) = \det(\boldsymbol A_{22}) \det(\boldsymbol C_1) = \det(\boldsymbol A_{11}) \det(\boldsymbol C_2) \] with the Schur complement of \(\boldsymbol A_{22}\) \[ \boldsymbol C_1 = \boldsymbol A_{11} - \boldsymbol A_{12} \boldsymbol A_{22}^{-1} \boldsymbol A_{21} \] and the Schur complement of \(\boldsymbol A_{11}\) \[ \boldsymbol C_2 = \boldsymbol A_{22} - \boldsymbol A_{21} \boldsymbol A_{11}^{-1} \boldsymbol A_{12} \,. \] Note that \(\boldsymbol C_1\) and \(\boldsymbol C_2\) are square matrices.

For a block-diagonal matrix \(\boldsymbol A\) with \(\boldsymbol A_{12} = 0\) and \(\boldsymbol A_{21} = 0\) the determinant is \(\det(\boldsymbol A) = \det(\boldsymbol A_{11}) \det(\boldsymbol A_{22})\).

Determinants have a multiplicative property, \[\det(\boldsymbol A\boldsymbol B) = \det(\boldsymbol B\boldsymbol A) = \det(\boldsymbol A) \det(\boldsymbol B) \,.\] For scalar \(a\) this becomes \(\det(a \boldsymbol B) = a^d \det(\boldsymbol B)\) where \(d\) is the dimension of \(\boldsymbol B\).

Another important identity is \[\det(\boldsymbol I_n + \boldsymbol A\boldsymbol B) = \det(\boldsymbol I_m + \boldsymbol B\boldsymbol A)\] where \(\boldsymbol A\) is an \(n \times m\) matrix and \(\boldsymbol B\) an \(m \times n\) matrix. This is called the Weinstein-Aronszajn determinant identity (it is also credited to Sylvester).
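
The identity can be checked numerically, for instance with the following R sketch using randomly generated rectangular matrices:

```r
# Numerical check of the Weinstein-Aronszajn identity (dimensions chosen arbitrarily)
set.seed(1)
n <- 4; m <- 2
A <- matrix(rnorm(n * m), nrow = n)   # n x m
B <- matrix(rnorm(m * n), nrow = m)   # m x n

det(diag(n) + A %*% B)                # det(I_n + A B)
det(diag(m) + B %*% A)                # det(I_m + B A), same value
```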

A.4 Matrix inverse

A.4.1 Inversion of square matrix

If \(\boldsymbol A\) is a square matrix then the inverse matrix \(\boldsymbol A^{-1}\) is a matrix such that \[\boldsymbol A^{-1} \boldsymbol A= \boldsymbol A\boldsymbol A^{-1}= \boldsymbol I\, .\] Only non-singular matrices with \(\det(\boldsymbol A) \neq 0\) are invertible.

As \(\det(\boldsymbol A^{-1} \boldsymbol A) = \det(\boldsymbol I) = 1\) the determinant of the inverse matrix equals the inverse determinant, \[\det(\boldsymbol A^{-1}) = \det(\boldsymbol A)^{-1} \,.\]

The transpose of the inverse is the inverse of the transpose as \[ \begin{split} (\boldsymbol A^{-1})^T &= (\boldsymbol A^{-1})^T \, \boldsymbol A^T (\boldsymbol A^{T})^{-1} \\ &= (\boldsymbol A\boldsymbol A^{-1})^T \, (\boldsymbol A^{T})^{-1} = (\boldsymbol A^{T})^{-1} \,. \\ \end{split} \]

The inverse of a matrix product \((\boldsymbol A\boldsymbol B)^{-1} = \boldsymbol B^{-1} \boldsymbol A^{-1}\) is the product of the individual matrix inverses in reverse order.

There are many different algorithms to compute the inverse of a matrix (which is essentially a problem of solving a system of equations). The computational complexity of matrix inversion is of the order \(O(d^3)\) where \(d\) is the dimension of \(\boldsymbol A\). Therefore matrix inversion is very costly in higher dimensions.

Example A.1 Inversion of a \(2 \times 2\) matrix:

The inverse of the matrix \(\boldsymbol A= \begin{pmatrix} a & b \\ c & d \end{pmatrix}\) is \(\boldsymbol A^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\).
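
In R the inverse is computed with solve(). The following sketch (with made-up values) checks the \(2 \times 2\) formula above and verifies that \(\boldsymbol A^{-1} \boldsymbol A= \boldsymbol I\):

```r
# Matrix inversion with solve(), checked against the 2 x 2 formula (values arbitrary)
A <- matrix(c(1, 3, 2, 4), nrow = 2)              # a = 1, b = 2, c = 3, d = 4
solve(A)                                          # numerical inverse
matrix(c(4, -3, -2, 1), nrow = 2) / (1*4 - 2*3)   # (1/(ad-bc)) (d, -b; -c, a)
solve(A) %*% A                                    # recovers the 2 x 2 identity (up to rounding)
```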

A.4.2 Inversion of structured matrices

However, for specially structured matrices inversion can be done efficiently:

  • The inverse of a diagonal matrix is another diagonal matrix obtained by inverting the diagonal elements.
  • More generally, the inverse of a block-diagonal matrix is obtained by individually inverting the blocks along the diagonal.

The Woodbury matrix identity simplifies the inversion of matrices that can be written as \(\boldsymbol A+ \boldsymbol U\boldsymbol B\boldsymbol V\) where \(\boldsymbol A\) and \(\boldsymbol B\) are both square and \(\boldsymbol U\) and \(\boldsymbol V\) are suitable rectangular matrices: \[ (\boldsymbol A+ \boldsymbol U\boldsymbol B\boldsymbol V)^{-1} = \boldsymbol A^{-1} - \boldsymbol A^{-1} \boldsymbol U(\boldsymbol B^{-1} + \boldsymbol V\boldsymbol A^{-1} \boldsymbol U)^{-1} \boldsymbol V\boldsymbol A^{-1} \] Typically, the inverse \(\boldsymbol A^{-1}\) is either already known or can be easily obtained and the dimension of \(\boldsymbol B\) is much lower than that of \(\boldsymbol A\).
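
The Woodbury identity can be verified numerically, for example with the following R sketch in which \(\boldsymbol A\) is simply the identity matrix and the remaining matrices are generated at random:

```r
# Numerical check of the Woodbury identity (sizes and values chosen arbitrarily)
set.seed(1)
d <- 5; k <- 2
A <- diag(d)                            # an easily inverted A (here the identity)
B <- diag(k)
U <- matrix(rnorm(d * k), nrow = d)
V <- t(U)

direct   <- solve(A + U %*% B %*% V)
woodbury <- solve(A) - solve(A) %*% U %*%
            solve(solve(B) + V %*% solve(A) %*% U) %*% V %*% solve(A)
max(abs(direct - woodbury))             # close to zero
```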

The class of matrices that can be most easily inverted are orthogonal matrices whose inverse is obtained simply by transposing the matrix.

A.5 Orthogonal matrices

A.5.1 Properties

An orthogonal matrix \(\boldsymbol Q\) is a square matrix with the property that \(\boldsymbol Q^T = \boldsymbol Q^{-1}\), i.e. the transpose is also the inverse. This implies that \(\boldsymbol Q\boldsymbol Q^T = \boldsymbol Q^T \boldsymbol Q= \boldsymbol I\).

The identity matrix \(\boldsymbol I\) is the simplest example of an orthogonal matrix.

An orthogonal matrix \(\boldsymbol Q\) can be interpreted geometrically as an operator performing rotation, reflection and/or permutation. Multiplication of \(\boldsymbol Q\) with a vector results in a new vector of the same length but, in general, with a different direction.

The product \(\boldsymbol Q_3 = \boldsymbol Q_1 \boldsymbol Q_2\) of two orthogonal matrices \(\boldsymbol Q_1\) and \(\boldsymbol Q_2\) yields another orthogonal matrix as \(\boldsymbol Q_3 \boldsymbol Q_3^T = \boldsymbol Q_1 \boldsymbol Q_2 (\boldsymbol Q_1 \boldsymbol Q_2)^T = \boldsymbol Q_1 \boldsymbol Q_2 \boldsymbol Q_2^T \boldsymbol Q_1^T = \boldsymbol I\).

The determinant \(\det(\boldsymbol Q)\) of an orthogonal matrix is either +1 or -1, because \(\boldsymbol Q\boldsymbol Q^T = \boldsymbol I\) and thus \(\det(\boldsymbol Q)\det(\boldsymbol Q^T) = \det(\boldsymbol Q)^2 = \det(\boldsymbol I) = 1\).

The set of all orthogonal matrices of dimension \(d\) together with multiplication form a group called the orthogonal group \(O(d)\). The subset of orthogonal matrices with \(\det(\boldsymbol Q)=1\) are called rotation matrices and form with multiplication the special orthogonal group \(SO(d)\). Orthogonal matrices with \(\det(\boldsymbol Q)=-1\) are rotation-reflection matrices.

A.5.2 Generating orthogonal matrices

In two dimensions \((d=2)\) all orthogonal matrices \(\boldsymbol R\) representing rotations with \(\det(\boldsymbol R)=1\) are given by \[ \boldsymbol R(\theta) = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix} \] and those representing rotation-reflections \(\boldsymbol G\) with \(\det(\boldsymbol G)=-1\) by \[ \boldsymbol G(\theta) = \begin{pmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{pmatrix}\,. \] Every orthogonal matrix of dimension \(d=2\) can be represented as the product of at most two rotation-reflection matrices because \[ \boldsymbol R(\theta) = \boldsymbol G(\theta)\, \boldsymbol G(0) = \begin{pmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\,. \] Thus, the matrix \(\boldsymbol G\) is a generator of two-dimensional orthogonal matrices. Note that \(\boldsymbol G(\theta)\) is symmetric, orthogonal and has determinant -1.

More generally, and applicable in arbitrary dimension, the role of generator is taken by the Householder reflection matrix \[ \boldsymbol Q_{HH}(\boldsymbol v) = \boldsymbol I- 2 \boldsymbol v\boldsymbol v^T \] where \(\boldsymbol v\) is a vector of unit length (with \(\boldsymbol v^T \boldsymbol v=1\)) orthogonal to the reflection hyperplane. Note that \(\boldsymbol Q_{HH}(\boldsymbol v) = \boldsymbol Q_{HH}(-\boldsymbol v)\). By construction the matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) is symmetric, orthogonal and has determinant -1.

It can be shown that any \(d\)-dimensional orthogonal matrix \(\boldsymbol Q\) can be represented as the product of at most \(d\) Householder reflection matrices. The two-dimensional generator \(\boldsymbol G(\theta)\) is recovered as the Householder matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) with \(\boldsymbol v= \begin{pmatrix} -\sin \frac{\theta}{2} \\ \cos \frac{\theta}{2} \end{pmatrix}\) or \(\boldsymbol v= \begin{pmatrix} \sin \frac{\theta}{2} \\ -\cos \frac{\theta}{2} \end{pmatrix}\).
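
The following R sketch constructs a Householder matrix from an arbitrarily chosen unit vector \(\boldsymbol v\) and confirms that it is symmetric, orthogonal and has determinant -1:

```r
# A Householder reflection built from a unit vector v (v chosen arbitrarily)
v <- c(1, 2, 2)
v <- v / sqrt(sum(v^2))               # normalise to unit length
Q <- diag(3) - 2 * outer(v, v)        # Q_HH(v) = I - 2 v v^T

max(abs(Q %*% t(Q) - diag(3)))        # orthogonality: close to zero
max(abs(Q - t(Q)))                    # symmetry: exactly zero
det(Q)                                # -1 (up to rounding)
```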

A.5.3 Permutation matrix

A special type of an orthogonal matrix is a permutation matrix \(\boldsymbol P\) created by permuting rows and/or columns of the identity matrix \(\boldsymbol I\). Thus, each row and column of \(\boldsymbol P\) contains exactly one entry of 1, but not necessarily on the diagonal.

If a permutation matrix \(\boldsymbol P\) is multiplied with a matrix \(\boldsymbol A\) it acts as an operator permuting the columns (\(\boldsymbol A\boldsymbol P\)) or the rows (\(\boldsymbol P\boldsymbol A\)). For a set of \(d\) elements there exist \(d!\) permutations. Thus, for dimension \(d\) there are \(d!\) possible permutation matrices (including the identity matrix).

The determinant of a permutation matrix is either +1 or -1. The product of two permutation matrices yields another permutation matrix.

Symmetric permutation matrices correspond to self-inverse permutations (i.e. the permutation matrix is its own inverse), and are also called permutation involutions. They can have determinant +1 or -1.

A transposition is a permutation where only two elements are exchanged. Thus, in a transposition matrix \(\boldsymbol T\) exactly two rows and/or columns are exchanged compared to the identity matrix \(\boldsymbol I\). Transpositions are self-inverse, and transposition matrices are symmetric. There are \(\frac{d (d-1)}{2}\) different transposition matrices. The determinant of a transposition matrix is \(\det(\boldsymbol T)= -1\).

Note that the transposition matrix is an instance of a Householder matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) with vector \(\boldsymbol v\) filled with zeros except for two elements that have value \(\frac{\sqrt{2}}{2}\) and \(-\frac{\sqrt{2}}{2}\).

Any permutation of \(d\) elements can be generated by a series of at most \(d-1\) transpositions. Correspondingly, any permutation matrix \(\boldsymbol P\) can be constructed by multiplying the identity matrix with at most \(d-1\) transposition matrices. If the number of transpositions is even then \(\det(\boldsymbol P) = 1\), otherwise, for an odd number, \(\det(\boldsymbol P) = -1\). This is called the sign or signature of the permutation.

The set of all permutations of \(d\) elements forms the symmetric group \(S_d\), and the subset of even permutations (with positive sign and \(\det(\boldsymbol P)=1\)) forms the alternating group \(A_d\).
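
As an illustration, the following R sketch builds a \(3 \times 3\) permutation matrix by reordering the rows of the identity matrix and shows how it permutes rows and columns:

```r
# A 3 x 3 permutation matrix obtained by reordering the rows of the identity matrix
P <- diag(3)[c(2, 3, 1), ]            # a cyclic permutation of the three rows
A <- matrix(1:9, nrow = 3)            # example matrix with entries 1, ..., 9

P %*% A                               # permutes the rows of A
A %*% P                               # permutes the columns of A
det(P)                                # +1 (an even permutation)
max(abs(P %*% t(P) - diag(3)))        # P is orthogonal
```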

A.6 Eigenvalues and eigenvectors

A.6.1 Definition

Assume a square matrix \(\boldsymbol A\) of size \(d \times d\). A vector \(\boldsymbol u\neq 0\) is called an eigenvector of the matrix \(\boldsymbol A\) and \(\lambda\) the corresponding eigenvalue if
\[\boldsymbol A\boldsymbol u= \boldsymbol u\lambda \, .\]

A.6.2 Finding eigenvalues and vectors

To find the eigenvalues and eigenvectors the eigenequation is rewritten as \[(\boldsymbol A-\boldsymbol I\lambda ) \boldsymbol u= 0 \,.\] For any solution \(\boldsymbol u\neq 0\) the corresponding eigenvalue \(\lambda\) must make the matrix \(\boldsymbol A-\boldsymbol I\lambda\) singular, i.e. the determinant must vanish: \[\det(\boldsymbol A-\boldsymbol I\lambda ) =0 \,.\] This is the characteristic equation of the matrix \(\boldsymbol A\), and its solution yields \(d\) not necessarily distinct and potentially complex eigenvalues \(\lambda_1, \ldots, \lambda_d\).

If there are complex eigenvalues, for a real matrix those eigenvalues come in conjugate pairs. Specifically, for a complex \(\lambda_1 = r e^{i \phi}\) there will also be a corresponding complex eigenvalue \(\lambda_2 = r e^{-i \phi}\).

Given the eigenvalues we then solve the eigenequation for the corresponding non-zero eigenvectors \(\boldsymbol u_1, \ldots, \boldsymbol u_d\). Note that eigenvectors of real matrices can have complex components. Also, an eigenvector is only defined by the eigenequation up to a scalar factor. By convention eigenvectors are therefore typically standardised to unit length, but this still leaves a sign ambiguity for real eigenvectors and implies that complex eigenvectors are defined only up to a factor with modulus 1.

A.6.3 Eigenequation in matrix notation

With the matrix \[\boldsymbol U= (\boldsymbol u_1, \ldots, \boldsymbol u_d)\] containing the standardised eigenvectors in the columns and the diagonal matrix \[\boldsymbol \Lambda= \begin{pmatrix} \lambda_{1} & \dots & 0\\ \vdots & \ddots & \vdots \\ 0 & \dots & \lambda_{d} \end{pmatrix}\] containing the eigenvalues (typically sorted in order of magnitude) the eigenvalue equation can be written as \[\boldsymbol A\boldsymbol U= \boldsymbol U\boldsymbol \Lambda\,.\]

Note that if the eigenvalues are not in order, we can always apply a permutation matrix \(\boldsymbol P\) to sort them, such that \(\boldsymbol U' = \boldsymbol U\boldsymbol P\) reorders the eigenvectors and \(\boldsymbol \Lambda' = \boldsymbol P^T \boldsymbol \Lambda\boldsymbol P\) the eigenvalues, with \[\boldsymbol A\boldsymbol U' = \boldsymbol A\boldsymbol U\boldsymbol P= \boldsymbol U\boldsymbol \Lambda\boldsymbol P= \boldsymbol U\boldsymbol P\boldsymbol P^T \boldsymbol \Lambda\boldsymbol P= \boldsymbol U' \boldsymbol \Lambda' \,.\]
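
In R the eigenvalues and eigenvectors are returned by eigen(). The sketch below (with a made-up \(2 \times 2\) matrix) checks the eigenequation in its matrix form:

```r
# Eigendecomposition with eigen() (example matrix chosen arbitrarily)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
e <- eigen(A)
U <- e$vectors                        # standardised eigenvectors in the columns
Lambda <- diag(e$values)              # eigenvalues on the diagonal, sorted by eigen()

max(abs(A %*% U - U %*% Lambda))      # the eigenequation A U = U Lambda holds
```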

A.6.4 Defective matrix

In most cases the eigenvectors \(\boldsymbol u_i\) will be linearly independent so that they form a basis to span a \(d\) dimensional space.

However, if this is not the case and the matrix \(\boldsymbol A\) does not have a complete basis of eigenvectors, then the matrix is called defective. In this case the matrix \(\boldsymbol U\) containing the eigenvectors is singular and \(\det(\boldsymbol U)=0\).

An example of a defective matrix is \(\begin{pmatrix} 1 &1 \\ 0 & 1 \\ \end{pmatrix}\). It has determinant 1, so it is invertible and its column vectors form a complete basis, but it has only one linearly independent eigenvector, \((1,0)^T\), so the eigenvector basis is incomplete.

A.6.5 Eigenvalues of a diagonal or triangular matrix

In the special case that \(\boldsymbol A\) is a diagonal or triangular matrix the eigenvalues are easily determined. This follows from the simple form of the determinant of such matrices as the product of the diagonal elements. Hence the characteristic equation becomes \(\prod_{i=1}^d (a_{ii} -\lambda) = 0\) with solutions \(\lambda_i=a_{ii}\), i.e. the eigenvalues are equal to the diagonal elements.

A.6.6 Eigenvalues and vectors of a symmetric matrix

If \(\boldsymbol A\) is symmetric, i.e. \(\boldsymbol A= \boldsymbol A^T\), then its eigenvalues and eigenvectors have special properties:

  1. all eigenvalues of \(\boldsymbol A\) are real,
  2. the eigenvectors are real and orthogonal, i.e. \(\boldsymbol u_i^T \boldsymbol u_j = 0\) for \(i \neq j\). Thus, the matrix \(\boldsymbol U\) containing the standardised eigenvectors is orthogonal.
  3. \(\boldsymbol A\) is never defective as \(\boldsymbol U\) forms a complete basis.

A.6.7 Eigenvalues of orthogonal matrices

The eigenvalues of an orthogonal matrix \(\boldsymbol Q\) are not necessarily real but they all have modulus 1 and lie on the unit circle in the complex plane. Thus, the eigenvalues of \(\boldsymbol Q\) all have the form \(\lambda = e^{i \phi} = \cos \phi + i \sin \phi\).

In any real matrix complex eigenvalues come in conjugate pairs. Hence if an orthogonal matrix \(\boldsymbol Q\) has the complex eigenvalue \(e^{i \phi}\) it also has the complex eigenvalue \(e^{-i \phi} =\cos \phi - i \sin \phi\). The product of these two conjugate eigenvalues is 1. Thus, an orthogonal matrix of odd dimension has at least one real eigenvalue (+1 or -1).

The eigenvalues of a Householder matrix \(\boldsymbol Q_{HH}(\boldsymbol v)\) are all real (recall that it is symmetric!). In fact, in dimension \(d\) its eigenvalues are -1 (once) and +1 (\(d-1\) times). Since a transposition matrix \(\boldsymbol T\) is a special Householder matrix it has the same eigenvalues.

A.6.8 Positive definite matrices

If all eigenvalues of a square matrix \(\boldsymbol A\) are real and \(\lambda_i \geq 0\) then \(\boldsymbol A\) is called positive semi-definite. If all eigenvalues are strictly positive \(\lambda_i > 0\) then \(\boldsymbol A\) is called positive definite.

Note that a matrix does not need to be symmetric to be positive definite, e.g. \(\begin{pmatrix} 2 & 3 \\ 1 & 4 \\ \end{pmatrix}\) has positive eigenvalues 5 and 1. It also has a complete set of eigenvectors and is diagonalisable.

A symmetric matrix \(\boldsymbol A\) is positive definite if the quadratic form \(\boldsymbol x^T \boldsymbol A\boldsymbol x> 0\) for any non-zero \(\boldsymbol x\), and it is positive semi-definite if \(\boldsymbol x^T \boldsymbol A\boldsymbol x\geq 0\). This also holds the other way around: a symmetric positive definite matrix (with positive eigenvalues) has a positive quadratic form, and a symmetric positive semi-definite matrix (with non-negative eigenvalues) has a non-negative quadratic form.

A symmetric positive definite matrix always has a positive diagonal (this can be seen by setting \(\boldsymbol x\) above to a unit vector with 1 at a single position and 0 at all other elements). However, just requiring a positive diagonal is too weak to ensure positive definiteness of a symmetric matrix; for example \(\begin{pmatrix} 1 &10 \\ 10 & 1 \\ \end{pmatrix}\) has a negative eigenvalue of -9. On the other hand, a symmetric matrix is indeed positive definite if it is strictly diagonally dominant, i.e. if all its diagonal elements are positive and each diagonal element is larger than the sum of the absolute values of the remaining elements in its row. However, diagonal dominance is too restrictive as a criterion to characterise all symmetric positive definite matrices, since there are many symmetric matrices that are positive definite but not diagonally dominant, such as \(\begin{pmatrix} 1 & 2 \\ 2 & 5 \\ \end{pmatrix}\).

Finally, the sum of a symmetric positive semi-definite matrix \(\boldsymbol A\) and a symmetric positive definite matrix \(\boldsymbol B\) is itself symmetric positive definite because the corresponding quadratic form \(\boldsymbol x^T ( \boldsymbol A+\boldsymbol B) \boldsymbol x= \boldsymbol x^T \boldsymbol A\boldsymbol x+ \boldsymbol x^T \boldsymbol B\boldsymbol x> 0\) for any non-zero \(\boldsymbol x\). Similarly, the sum of two symmetric positive (semi-)definite matrices is itself symmetric positive (semi-)definite.
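
As a small numerical illustration in R, positive definiteness of the example matrix from above can be checked via its eigenvalues or via the quadratic form:

```r
# Checking positive definiteness of a symmetric matrix via its eigenvalues
A <- matrix(c(1, 2, 2, 5), nrow = 2)  # the positive definite example from the text
eigen(A, symmetric = TRUE)$values     # both eigenvalues are positive

x <- c(1, -1)                         # an arbitrary non-zero vector
drop(t(x) %*% A %*% x)                # the quadratic form is positive (here 2)
```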

A.7 Matrix decompositions

A.7.1 Diagonalisation and eigenvalue decomposition

If \(\boldsymbol A\) is a square non-defective matrix then \(\boldsymbol U\) is invertible and we can rewrite the eigenvalue equation as \[\boldsymbol A= \boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1} \,.\] This is called the eigendecomposition, or spectral decomposition, of \(\boldsymbol A\) and equivalently \[\boldsymbol \Lambda= \boldsymbol U^{-1} \boldsymbol A\boldsymbol U\] is the diagonalisation of \(\boldsymbol A\). Thus any matrix \(\boldsymbol A\) that is not defective is diagonalisable via its eigenvalue decomposition.

A.7.2 Orthogonal eigenvalue decomposition

For symmetric \(\boldsymbol A\) this becomes \[\boldsymbol A= \boldsymbol U\boldsymbol \Lambda\boldsymbol U^T\] with real eigenvalues and orthogonal matrix \(\boldsymbol U\) and \[\boldsymbol \Lambda= \boldsymbol U^T \boldsymbol A\boldsymbol U\,.\] This special case is known as the orthogonal diagonalisation of \(\boldsymbol A\).

The orthogonal decomposition for symmetric \(\boldsymbol A\) is unique apart from the signs of the eigenvectors. In order to make it fully unique one needs to impose further restrictions (e.g. require a positive diagonal of \(\boldsymbol U\)). Note that this can be particularly important in computer applications where the sign can vary depending on the specific implementation of the underlying numerical algorithms.

A.7.3 Singular value decomposition

The singular value decomposition (SVD) is a generalisation of the orthogonal eigenvalue decomposition for symmetric matrices.

Any (!) rectangular matrix \(\boldsymbol A\) of size \(n\times d\) can be factored into the product \[\boldsymbol A= \boldsymbol U\boldsymbol D\boldsymbol V^T\] where \(\boldsymbol U\) is an \(n \times n\) orthogonal matrix, \(\boldsymbol V\) is a second \(d \times d\) orthogonal matrix and \(\boldsymbol D\) is a diagonal but rectangular matrix of size \(n\times d\) with \(m=\min(n,d)\) real diagonal elements \(d_1, \ldots, d_m\). The \(d_i\) are called singular values and appear along the diagonal of \(\boldsymbol D\) in decreasing order of magnitude.

The SVD is unique apart from the signs of the column vectors in \(\boldsymbol U\) and \(\boldsymbol V\) and the signs of the diagonal elements of \(\boldsymbol D\) (you can freely specify the signs of any two of the three matrices). By convention the signs are chosen such that the singular values in \(\boldsymbol D\) are all non-negative, which leaves an ambiguity in the column signs of \(\boldsymbol U\) and \(\boldsymbol V\). Alternatively, one may fix the column signs of \(\boldsymbol U\) and \(\boldsymbol V\), e.g. by requiring a positive diagonal, which then determines the signs of the singular values (thus allowing for negative singular values as well).

If \(\boldsymbol A\) is symmetric then the SVD and the orthogonal eigenvalue decomposition coincide (apart from different sign conventions for singular values, eigenvalues and eigenvectors).

Since \(\boldsymbol A^T \boldsymbol A= \boldsymbol V\boldsymbol D^T \boldsymbol D\boldsymbol V^T\) and \(\boldsymbol A\boldsymbol A^T = \boldsymbol U\boldsymbol D\boldsymbol D^T \boldsymbol U^T\) the squared singular values correspond to the eigenvalues of \(\boldsymbol A^T \boldsymbol A\) and \(\boldsymbol A\boldsymbol A^T\). It also follows that \(\boldsymbol A^T \boldsymbol A\) and \(\boldsymbol A\boldsymbol A^T\) are both positive semi-definite symmetric matrices, and that \(\boldsymbol V\) and \(\boldsymbol U\) contain the respective sets of eigenvectors.
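
In R the singular value decomposition is computed with svd(). The sketch below (using a random \(4 \times 3\) matrix) checks the factorisation and the link between the singular values and the eigenvalues of \(\boldsymbol A^T \boldsymbol A\); note that svd() returns the "thin" factors by default:

```r
# Singular value decomposition with svd() (random rectangular example)
set.seed(1)
A <- matrix(rnorm(12), nrow = 4)              # 4 x 3 matrix
s <- svd(A)                                   # svd() returns the thin factors s$u, s$d, s$v

max(abs(A - s$u %*% diag(s$d) %*% t(s$v)))    # reconstruction A = U D V^T, close to zero
s$d^2                                         # squared singular values ...
eigen(crossprod(A))$values                    # ... equal the eigenvalues of A^T A
```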

A.7.4 Polar decomposition

Any square matrix \(\boldsymbol A\) can be factored into the product \[ \boldsymbol A= \boldsymbol Q\boldsymbol B \] of an orthogonal matrix \(\boldsymbol Q\) and a symmetric positive semi-definite matrix \(\boldsymbol B\).

This follows from the SVD of \(\boldsymbol A\) given as \[ \begin{split} \boldsymbol A&= \boldsymbol U\boldsymbol D\boldsymbol V^T \\ &= ( \boldsymbol U\boldsymbol V^T ) ( \boldsymbol V\boldsymbol D\boldsymbol V^T ) \\ &= \boldsymbol Q\boldsymbol B\\ \end{split} \] with non-negative \(\boldsymbol D\). Note that this decomposition is unique for invertible \(\boldsymbol A\), as the sign ambiguities in the columns of \(\boldsymbol U\) and \(\boldsymbol V\) cancel out in \(\boldsymbol Q\) and \(\boldsymbol B\).

A.7.5 Cholesky decomposition

A symmetric positive definite matrix \(\boldsymbol A\) can be decomposed into a product of a triangular matrix \(\boldsymbol L\) with its transpose \[ \boldsymbol A= \boldsymbol L\boldsymbol L^T \,. \] Here, \(\boldsymbol L\) is a lower triangular matrix with positive diagonal elements.

This decomposition is unique and is called Cholesky factorisation. It is often used to check whether a symmetric matrix is positive definite as it is algorithmically less demanding than eigenvalue decomposition.

Note that some implementations of the Cholesky decomposition (e.g. R) use upper triangular matrices \(\boldsymbol K\) with positive diagonal so that \(\boldsymbol A= \boldsymbol K^T \boldsymbol K\) and \(\boldsymbol L= \boldsymbol K^T\).
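
The following R sketch (with a made-up positive definite matrix) illustrates this: chol() returns the upper triangular factor \(\boldsymbol K\), and its transpose gives the lower triangular factor \(\boldsymbol L\) used above:

```r
# Cholesky factorisation with chol(); R returns the upper triangular factor K
A <- matrix(c(4, 2, 2, 3), nrow = 2)    # symmetric positive definite example
K <- chol(A)                             # upper triangular with positive diagonal
L <- t(K)                                # lower triangular factor of the text

max(abs(A - L %*% t(L)))                 # A = L L^T, close to zero
```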

A.8 Matrix summaries based on eigenvalues and singular values

A.8.1 Trace and determinant computed from eigenvalues

The eigendecomposition \(\boldsymbol A=\boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1}\) allows us to establish a link between the trace and determinant of a matrix and its eigenvalues.

Specifically, \[ \begin{split} \text{Tr}(\boldsymbol A) & = \text{Tr}(\boldsymbol U\boldsymbol \Lambda\boldsymbol U^{-1} ) = \text{Tr}( \boldsymbol \Lambda\boldsymbol U^{-1} \boldsymbol U) \\ &= \text{Tr}( \boldsymbol \Lambda) = \sum_{i=1}^d \lambda_i \\ \end{split} \] thus the trace of a square matrix \(\boldsymbol A\) is equal to the sum of its eigenvalues. Likewise, \[ \begin{split} \det(\boldsymbol A) & = \det(\boldsymbol U) \det(\boldsymbol \Lambda) \det(\boldsymbol U^{-1} ) \\ &=\det( \boldsymbol \Lambda) = \prod_{i=1}^d \lambda_i \\ \end{split} \] therefore the determinant of \(\boldsymbol A\) is the product of the eigenvalues.

The relationship between the eigenvalues and the trace and the determinant is demonstrated here for diagonalisable (non-defective) matrices. However, it also holds in general for any square matrix, as can be shown using certain non-diagonal matrix decompositions (e.g. the Jordan decomposition).

As a result, if any of the eigenvalues is equal to zero then \(\det(\boldsymbol A) = 0\) and hence \(\boldsymbol A\) is singular and not invertible.

The trace and determinant of a real matrix are always real even though the individual eigenvalues may be complex.
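
Both relationships are easy to verify numerically, for instance with the following R sketch using a small triangular example matrix:

```r
# Trace and determinant recovered from the eigenvalues (example chosen arbitrarily)
A <- matrix(c(2, 0, 1, 3), nrow = 2)  # upper triangular, eigenvalues 2 and 3
ev <- eigen(A)$values

sum(diag(A)); sum(ev)                 # trace equals the sum of the eigenvalues
det(A); prod(ev)                      # determinant equals the product of the eigenvalues
```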

A.8.2 Rank and condition number

The rank of a matrix is the dimension of the space spanned by its column vectors, which equals the dimension of the space spanned by its row vectors. A rectangular matrix of dimension \(n \times d\) will have rank of at most \(m = \min(n, d)\), and if this maximum is achieved the matrix has full rank.

The condition number describes how well- or ill-conditioned a full rank matrix is. For example, for a square matrix a large condition number implies that the matrix is close to being singular and thus ill-conditioned. If the condition number is infinite then the matrix is not full rank.

The rank and condition of a matrix can both be determined from the \(m\) singular values \(d_1, \ldots, d_m\) of a matrix obtained by SVD:

  1. The rank is the number of non-zero singular values.
  2. The condition number is the ratio of the largest to the smallest singular value (taking absolute values if negative singular values are allowed).

If a square matrix \(\boldsymbol A\) is singular then the condition number is infinite, and it will not have full rank. On the other hand, a non-singular square matrix, such as a positive definite matrix, has full rank.
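
The following R sketch (with a made-up \(3 \times 2\) matrix) obtains the rank and condition number from the singular values returned by svd():

```r
# Rank and condition number from the singular values (example chosen arbitrarily)
A <- matrix(c(1, 2, 2, 4, 0, 1), nrow = 3)   # 3 x 2 matrix of rank 2
d <- svd(A)$d

sum(d > 1e-12)                               # numerical rank
max(d) / min(d)                              # condition number
```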

A.9 Functions of symmetric matrices

We focus on symmetric square matrices \(\boldsymbol A=\boldsymbol U\boldsymbol \Lambda\boldsymbol U^T\) which are always diagonalisable with real eigenvalues \(\boldsymbol \Lambda\) and orthogonal eigenvector matrix \(\boldsymbol U\).

A.9.1 Definition of a matrix function

Assume a real-valued function \(f(a)\) of a real number \(a\). Then the corresponding matrix function \(f(\boldsymbol A)\) is defined as \[ f(\boldsymbol A) = \boldsymbol Uf(\boldsymbol \Lambda) \boldsymbol U^T = \boldsymbol U\begin{pmatrix} f(\lambda_{1}) & \dots & 0\\ \vdots & \ddots & \vdots \\ 0 & \dots & f(\lambda_{d}) \end{pmatrix} \boldsymbol U^T \] where the function \(f(a)\) is applied to the eigenvalues of \(\boldsymbol A\). By construction \(f(\boldsymbol A)\) is real, symmetric and has real eigenvalues \(f(\lambda_i)\).

Examples:

Example A.2 Matrix power: \(f(a) = a^p\) (with \(p\) a real number)

Special cases of the matrix power include:

  • Matrix inversion: \(f(a) = a^{-1}\)
    Note that if the matrix \(\boldsymbol A\) is singular, i.e. has one or more eigenvalues \(\lambda_i=0\), then \(\boldsymbol A^{-1}\) is not defined and \(\boldsymbol A\) is not invertible.
    However, a so-called pseudoinverse can still be computed by inverting the non-zero eigenvalues and keeping the zero eigenvalues as zero.

  • Matrix square root: \(f(a) = a^{1/2}\)
    Since there are multiple solutions to the square root there are also multiple matrix square roots. The principal matrix square root is obtained by using the positive square roots of all the eigenvalues. Thus the principal matrix square root of a positive semi-definite matrix is also positive semi-definite and it is unique.

Example A.3 Matrix exponential: \(f(a) = \exp(a)\)
Note that because \(\exp(a) > 0\) for all real \(a\) the matrix \(\exp(\boldsymbol A)\) is positive definite. Thus, the matrix exponential can be used to generate positive definite matrices.

If \(\boldsymbol A\) and \(\boldsymbol B\) commute, i.e. if \(\boldsymbol A\boldsymbol B= \boldsymbol B\boldsymbol A\), then \(\exp(\boldsymbol A+\boldsymbol B) = \exp(\boldsymbol A) \exp(\boldsymbol B)\). However, this is not the case otherwise!

Example A.4 Matrix logarithm: \(f(a) = \log(a)\)
As the logarithm requires \(a >0\) the matrix \(\boldsymbol A\) needs to be positive definite for \(\log(\boldsymbol A)\) to be defined.
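
A minimal R sketch of this construction is given below; the helper function matfun() is just an illustrative name, and the example matrix is made up. It computes the principal matrix square root and checks that the matrix logarithm undoes the matrix exponential:

```r
# Matrix functions of a symmetric matrix via its orthogonal eigendecomposition
matfun <- function(A, f) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(f(e$values)) %*% t(e$vectors)
}

A <- matrix(c(2, 1, 1, 2), nrow = 2)        # symmetric positive definite example
S <- matfun(A, sqrt)                         # principal matrix square root
max(abs(S %*% S - A))                        # close to zero

max(abs(matfun(matfun(A, exp), log) - A))    # matrix log undoes matrix exp
```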

A.9.2 Identities for the matrix exponential and logarithm

The above give rise to useful identities:

  1. For any symmetric matrix \(\boldsymbol A\) we have \[ \det(\exp(\boldsymbol A)) = \exp(\text{Tr}(\boldsymbol A)) \] because \(\prod_i \exp(\lambda_i) = \exp( \sum_i \lambda_i)\) where \(\lambda_i\) are the eigenvalues of \(\boldsymbol A\).

  2. If we take the logarithm on both sides and replace \(\exp(\boldsymbol A)=\boldsymbol B\) we get another identity for a symmetric positive definite matrix \(\boldsymbol B\): \[ \log \det(\boldsymbol B) = \text{Tr}(\log(\boldsymbol B)) \] because \(\log( \prod_i \lambda_i) = \sum_i \log(\lambda_i)\) where \(\lambda_i\) are the eigenvalues of \(\boldsymbol B\).
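
Both identities can be checked numerically. The R sketch below (with a made-up symmetric matrix) forms \(\exp(\boldsymbol A)\) via the eigendecomposition and compares both sides:

```r
# Numerical check of det(exp(A)) = exp(Tr(A)) for a symmetric matrix (values arbitrary)
A <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
e <- eigen(A, symmetric = TRUE)
B <- e$vectors %*% diag(exp(e$values)) %*% t(e$vectors)   # B = exp(A), positive definite

det(B); exp(sum(diag(A)))                  # det(exp(A)) and exp(Tr(A)) agree
log(det(B)); sum(diag(A))                  # log det(B) equals Tr(log(B)) = Tr(A)
```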

A.10 Matrix calculus

A.10.1 First order vector derivatives

A.10.1.1 Gradient

The nabla operator (also known as del operator) is the row vector \[ \nabla = (\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_d}) = \frac{\partial}{\partial \boldsymbol x} \] containing the first order partial derivative operators.

The gradient of a scalar-valued function \(f(\boldsymbol x)\) with vector argument \(\boldsymbol x= (x_1, \ldots, x_d)^T\) is also a row vector (with \(d\) columns) and can be expressed using the nabla operator \[ \begin{split} \nabla f(\boldsymbol x) &= \left( \frac{\partial f(\boldsymbol x)}{\partial x_1}, \ldots, \frac{\partial f(\boldsymbol x)}{\partial x_d} \right) \\ & = \frac{\partial f(\boldsymbol x)}{\partial \boldsymbol x} = \text{grad} f(\boldsymbol x) \, .\\ \end{split} \] Note the various notations for the gradient.

Example A.5 \(f(\boldsymbol x)=\boldsymbol a^T \boldsymbol x+ b\). Then \(\nabla f(\boldsymbol x) = \frac{\partial f(\boldsymbol x)}{\partial \boldsymbol x} = \boldsymbol a^T\).

Example A.6 \(f(\boldsymbol x)=\boldsymbol x^T \boldsymbol x\). Then \(\nabla f(\boldsymbol x) = \frac{\partial f(\boldsymbol x)}{\partial \boldsymbol x} = 2 \boldsymbol x^T\).

Example A.7 \(f(\boldsymbol x)=\boldsymbol x^T \boldsymbol A\boldsymbol x\). Then \(\nabla f(\boldsymbol x) = \frac{\partial f(\boldsymbol x)}{\partial \boldsymbol x} = \boldsymbol x^T (\boldsymbol A+ \boldsymbol A^T)\).
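
These gradients can be checked against finite differences. The R sketch below uses made-up helper names (f, num_grad) and an arbitrary matrix \(\boldsymbol A\) to verify the gradient of the quadratic form in Example A.7:

```r
# Finite-difference check of the gradient of f(x) = x^T A x (example values arbitrary)
A <- matrix(c(1, 2, 0, 3), nrow = 2)
f <- function(x) drop(t(x) %*% A %*% x)

num_grad <- function(f, x, h = 1e-6)
  sapply(seq_along(x), function(i) {
    e <- replace(numeric(length(x)), i, h)   # unit perturbation in coordinate i
    (f(x + e) - f(x - e)) / (2 * h)          # central difference
  })

x <- c(1, -1)
num_grad(f, x)              # numerical gradient
drop(t(x) %*% (A + t(A)))   # analytical gradient x^T (A + A^T)
```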

A.10.1.2 Jacobian matrix

For a vector-valued function \[ \boldsymbol f(\boldsymbol x) = ( f_1(\boldsymbol x), \ldots, f_m(\boldsymbol x) )^T \] the computation of the gradient of each component yields the Jacobian matrix (with \(m\) rows and \(d\) columns) \[ \begin{split} D \boldsymbol f(\boldsymbol x) &= \left( {\begin{array}{c} \nabla f_1(\boldsymbol x) \\ \vdots \\ \nabla f_m(\boldsymbol x) \\ \end{array} } \right) \\ & = \left(\frac{\partial f_i(\boldsymbol x)}{\partial x_j}\right) \\ &= \frac{\partial \boldsymbol f(\boldsymbol x)}{\partial \boldsymbol x} \,. \\ \end{split} \] Again, note the various notations for the Jacobian matrix!

Example A.8 \(\boldsymbol f(\boldsymbol x)=\boldsymbol A\boldsymbol x+ \boldsymbol b\). Then \(D \boldsymbol f(\boldsymbol x) = \frac{\partial \boldsymbol f(\boldsymbol x)}{\partial \boldsymbol x} = \boldsymbol A\).

If \(m=d\) then the Jacobian matrix is a square matrix, which allows us to compute the Jacobian determinant \[\det D \boldsymbol f(\boldsymbol x) = \det\left(\frac{\partial \boldsymbol f(\boldsymbol x)}{\partial \boldsymbol x}\right) \,.\]

If \(\boldsymbol y= \boldsymbol f(\boldsymbol x)\) is an invertible function with \(\boldsymbol x= \boldsymbol f^{-1}(\boldsymbol y)\) then the Jacobian matrix is invertible and the inverted matrix is in fact the Jacobian of the inverse function!

This allows us to compute the Jacobian determinant of the back-transformation as the inverse of the Jacobian determinant of the original function: \[\det D \boldsymbol f^{-1}(\boldsymbol y) = ( \det D \boldsymbol f(\boldsymbol x) )^{-1}\] or in alternative notation \[\det D \boldsymbol x(\boldsymbol y) = \frac{1}{ \det D \boldsymbol y(\boldsymbol x) } \,.\]

A.10.2 Second order vector derivatives

The matrix of all second order partial derivatives of a scalar-valued function with vector-valued argument is called the Hessian matrix and is computed by double application of the nabla operator: \[ \begin{split} \nabla^T \nabla f(\boldsymbol x) &= \begin{pmatrix} \frac{\partial^2 f(\boldsymbol x)}{\partial x_1^2} & \frac{\partial^2 f(\boldsymbol x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(\boldsymbol x)}{\partial x_1 \partial x_d} \\ \frac{\partial^2 f(\boldsymbol x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(\boldsymbol x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(\boldsymbol x)}{\partial x_2 \partial x_d} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(\boldsymbol x)}{\partial x_d \partial x_1} & \frac{\partial^2 f(\boldsymbol x)}{\partial x_d \partial x_2} & \cdots & \frac{\partial^2 f(\boldsymbol x)}{\partial x_d^2} \end{pmatrix} \\ & = \left(\frac{\partial^2 f(\boldsymbol x)}{\partial x_i \partial x_j}\right) = {\left(\frac{\partial}{\partial \boldsymbol x}\right)}^T \frac{\partial f(\boldsymbol x)}{\partial \boldsymbol x} \,.\\ \end{split} \] By construction it is square and symmetric.

A.10.3 First order matrix derivatives

The derivative of a scalar-valued function \(f(\boldsymbol X)\) with regard to a matrix argument \(\boldsymbol X\) can also be defined and results in a matrix with transposed dimensions compared to \(\boldsymbol X\).

Two important specific examples are:

Example A.9 \(\frac{\partial \text{Tr}(\boldsymbol A\boldsymbol X)}{\partial \boldsymbol X} = \boldsymbol A\)

Example A.10 \(\frac{\partial \log \det(\boldsymbol X)}{\partial \boldsymbol X} = \frac{\partial \text{Tr}(\log \boldsymbol X)}{\partial \boldsymbol X} = \boldsymbol X^{-1}\)
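
As a final numerical illustration, the R sketch below checks Example A.10 by finite differences for a made-up symmetric positive definite matrix \(\boldsymbol X\):

```r
# Finite-difference check of d log det(X) / dX = X^{-1} (symmetric example X)
X <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)
h <- 1e-6
G <- matrix(0, 2, 2)
for (i in 1:2) for (j in 1:2) {
  E <- matrix(0, 2, 2); E[i, j] <- h          # perturb a single element
  G[i, j] <- (log(det(X + E)) - log(det(X - E))) / (2 * h)
}
G            # numerical derivative
solve(X)     # X^{-1}, agrees up to rounding
```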