1 Combinatorics
1.1 Some basic mathematical notation
Scalar quantity: plain font, typically lower case (\(x\), \(\theta\), n), sometimes upper case (\(K\), \(R^2\), distribution functions \(F\), \(P\), \(Q\)).
Sets: plain font, upper case (\(\Omega, \mathcal{F}\))
Vector quantity: bold font, lower case (\(\boldsymbol x\), \(\boldsymbol \theta\)).
Matrix quantity: bold font, upper case (\(\boldsymbol X\), \(\boldsymbol \Sigma\)).
Summation: \[ \sum_{i=1}^n x_i = x_1 + x_2 + \ldots + x_n \]
Product: \[ \begin{split} \prod_{i=1}^n x_i &= x_1 \times x_2 \times \ldots \times x_n\\ &= x_1 x_2 \ldots x_n \end{split} \] The multiplication sign \(\times\) between the factors is usually omitted unless it is needed for clarity.
Indicator function (in Iverson bracket notation): \[ [A] = \begin{cases} 1 & \text{if $A$ is true}\\ 0 & \text{if $A$ is not true}\\ \end{cases} \]
1.2 Number of permutations
A permutation or ordering is a specific arrangement of items in sequence or positions.
The number of permutations of \(n\) items, each of distinct type, is the factorial \[ n! = \prod_{i=1}^n i = 1 \times 2 \times \ldots \times n \] where \(n\) is a positive integer. For \(n=0\) the factorial is defined as \[ 0! = 1 \]
1.3 Multinomial and binomial coefficient
The number of permutations of \(n\) items, partitioned into \(K\leq n\) distinct types, with \(n_k\) items of type \(k\) and \(\sum_{k=1}^K n_k=n\), is given by the multinomial coefficient \[ \begin{split} W_K &= \binom{n}{n_1, \ldots, n_K} \\ &= \frac {n!}{n_1! \, n_2! \, \ldots \, n_K! } \end{split} \]
If all items are of distinct type (\(K=n\) and \(n_k=1\)) the multinomial coefficient reduces to the factorial.
If there are only two types (\(K=2\)) the multinomial coefficient becomes the binomial coefficient \[ \begin{split} W_2 &= \binom{n}{n_1, n_2} = \binom{n}{n_1, n- n_1}\\ &= \frac {n!}{n_1! \, (n - n_1)!}\\ & = \binom{n}{n_1} \end{split} \]
1.4 De Moivre-Sterling approximation
The factorial is frequently approximated by the following formula derived by Abraham de Moivre (1667–1754) and James Stirling (1692-1770) \[ n! \approx \sqrt{2 \pi} n^{n+\frac{1}{2}} e^{-n} \] or equivalently on logarithmic scale \[ \log n! \approx \left(n+\frac{1}{2}\right) \log n -n + \frac{1}{2}\log \left( 2 \pi\right) \] The approximation is good for small \(n\) (but fails for \(n=0\)) and becomes more and more accurate with increasing \(n\). For large \(n\) the approximation can be simplified to \[ \log n! \approx n \log n -n \]
The de Moivre-Sterling approximation applied to the multinomial coefficient yields \[ \log W_K \approx - n \sum_{k=1}^K \frac{n_k}{n} \log\left( \frac{n_k}{n} \right) \] Hence, for large \(n\) and large \(n_k\) the logarithm of the multinomial coefficient equals \(n\) times the Shannon-Gibbs entropy of a categorical distribution with class probabilities \(n_k/n\).