1 Combinatorics
1.1 Some basic mathematical notation
Summation: \[ \sum_{i=1}^n x_i = x_1 + x_2 + \ldots + x_n \]
Multiplication: \[ \prod_{i=1}^n x_i = x_1 \times x_2 \times \ldots \times x_n \]
Indicator function (in Iverson bracket notation): \[ [A] = \begin{cases} 1 & \text{if $A$ is true}\\ 0 & \text{if $A$ is not true}\\ \end{cases} \]
Scalar: plain type, typically lower case (\(x\), \(\theta\)), sometimes upper case (\(K\)).
Vector: bold type, lower case (\(\boldsymbol x\), \(\boldsymbol \theta\)).
Matrix: bold type, upper case (\(\boldsymbol X\), \(\boldsymbol \Sigma\)).
1.2 Number of permutations
The number of possible orderings, or permutations, of \(n\) distinct items is the number of ways to put \(n\) items in \(n\) bins with exactly one item in each bin. It is given by the factorial \[ n! = \prod_{i=1}^n i = 1 \times 2 \times \ldots \times n \] where \(n\) is a positive integer. For \(n=0\) the factorial is defined as \[ 0! = 1 \] as there is exactly one permutation of zero objects.
The factorial can also be obtained using the gamma function \[ \Gamma(x) = \int_0^\infty t^{x-1} e^{-t} dt \] which can be viewed as continuous version of the factorial with \(\Gamma(x) = (x-1)!\) for any positive integer \(x\).
1.3 De Moivre-Sterling approximation of the factorial
The factorial is frequently approximated by the following formula derived by Abraham de Moivre (1667–1754) and James Stirling (1692-1770) \[ n! \approx \sqrt{2 \pi} n^{n+\frac{1}{2}} e^{-n} \] or equivalently on logarithmic scale \[ \log n! \approx \left(n+\frac{1}{2}\right) \log n -n + \frac{1}{2}\log \left( 2 \pi\right) \] The approximation is good for small \(n\) (but fails for \(n=0\)) and becomes more and more accurate with increasing \(n\). For large \(n\) the approximation can be simplified to \[ \log n! \approx n \log n -n \]
1.4 Multinomial and binomial coefficient
The number of possible permutation of \(n\) items of \(K\) distinct types, with \(n_1\) of type 1, \(n_2\) of type 2 and so on, equals the number of ways to put \(n\) items into \(K\) bins with \(n_1\) items in the first bin, \(n_2\) in the second and so on. It is given by the multinomial coefficient \[ \binom{n}{n_1, \ldots, n_K} = \frac {n!}{n_1! \times n_2! \times\ldots \times n_K! } \] with \(\sum_{k=1}^K n_k = n\) and \(K \leq n\). Note that it equals the number of permutation of all items divided by the number of permutations of the items in each bin (or of each type).
If all \(n_k=1\) and hence \(K=n\) the multinomial coefficient reduces to the factorial.
If there are only two bins / types (\(K=2\)) the multinomial coefficients becomes the binomial coefficient \[ \binom{n}{n_1} = \binom{n}{n_1, n-n_1} = \frac {n!}{n_1! (n - n_1)!} \] which counts the number of ways to choose \(n_1\) elements from a set of \(n\) elements.
For large \(n\) and \(n_k\) we can apply the De Moivre-Sterling approximation to the multinomial coefficient, yielding \[ \log\binom{n}{n_1, \ldots, n_K} = - n \sum_{k=1}^K \frac{n_k}{n} \log\left( \frac{n_k}{n} \right) \] Note this is \(n\) times the Shannon entropy of a categorical distribution with \(n_k/n\) as class probabilities.