B Distributions used in Bayesian analysis

This appendix introduces a number of distributions essential for Bayesian analysis.

See in particular the Chapter “Bayesian learning in practise”.

B.1 Beta distribution

B.1.1 Standard parameterisation

The density of the beta distribution $\text{Beta}(\alpha, \beta)$ is \[ p(x | \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1} \] with $x \in [0,1]$, $\alpha>0$ and $\beta>0$. The density depends on the beta function $B(z_1, z_2) = \frac{ \Gamma(z_1) \Gamma(z_2)}{\Gamma(z_1 + z_2)}$ which in turn is defined via Euler’s gamma function $\Gamma(x)$. Note that $\Gamma(x) = (x-1)!$ for any positive integer $x$.

The mean of the beta distribution is \[ \text{E}(x) = \frac{\alpha}{\alpha+\beta} \] and its variance is \[ \text{Var}(x)=\frac{\alpha \beta}{(\alpha+\beta)^2 (\alpha+\beta+1)} \]
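The formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`beta_function`, `beta_pdf`, `midpoint_moments`) and the example values $\alpha=2$, $\beta=5$ are illustrative assumptions, not part of the notes. It evaluates the density via the log-gamma function and compares the closed-form mean and variance with a simple midpoint-rule integration.

```python
import math

def beta_function(a, b):
    """Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b), via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at a point x in the open interval (0, 1)."""
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)

def beta_mean(alpha, beta):
    """Closed-form mean alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

def beta_var(alpha, beta):
    """Closed-form variance alpha beta / ((alpha + beta)^2 (alpha + beta + 1))."""
    return alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

def midpoint_moments(alpha, beta, n=20_000):
    """Approximate total mass, mean and variance with the midpoint rule on (0, 1)."""
    h = 1.0 / n
    mass = m1 = m2 = 0.0
    for i in range(n):
        x = (i + 0.5) * h          # midpoint of the i-th subinterval
        w = beta_pdf(x, alpha, beta) * h
        mass += w
        m1 += x * w
        m2 += x * x * w
    return mass, m1, m2 - m1 ** 2

# Illustrative parameter values (an assumption for this example).
alpha, beta = 2.0, 5.0
mass, mean_num, var_num = midpoint_moments(alpha, beta)
print("total mass:", mass)
print("mean:", mean_num, "vs closed form", beta_mean(alpha, beta))
print("variance:", var_num, "vs closed form", beta_var(alpha, beta))
```

The midpoint rule is adequate here because the density is bounded for $\alpha, \beta \geq 1$; for $\alpha<1$ or $\beta<1$ the density diverges at the boundary and a more careful quadrature would be needed.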

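The beta distribution is very flexible and can assume a number of different shapes, depending on the values of $\alpha$ and $\beta$. For example, it is uniform for $\alpha=\beta=1$, unimodal for $\alpha>1$ and $\beta>1$, and U-shaped for $\alpha<1$ and $\beta<1$.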

B.1.2 Mean parameterisation

A useful reparameterisation $\text{Beta}(\mu, k)$ of the beta distribution is in terms of a mean parameter $\mu \in (0,1)$ and a concentration parameter $k > 0$. These are given by \[ k=\alpha+\beta \] and \[\mu = \frac{\alpha}{\alpha+\beta} \] The original parameters can be recovered by \[\alpha= \mu k\] and \[\beta=(1-\mu) k\]

The mean and variance of the beta distribution expressed in terms of $\mu$ and $k$ are \[ \text{E}(x) = \mu \] and \[ \text{Var}(x)=\frac{\mu (1-\mu)}{k+1} \] With increasing concentration parameter $k$ the variance decreases and thus the probability mass becomes more concentrated around the mean.
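The conversion between the two parameterisations is easy to get wrong by hand, so a small sketch may help. The helper names below (`to_mean_param`, `to_standard_param`, `beta_var_mean_param`) and the example values are hypothetical, chosen only for illustration. The loop at the end shows the variance shrinking as the concentration $k$ grows with the mean held fixed.

```python
def to_mean_param(alpha, beta):
    """Map (alpha, beta) to (mu, k) with k = alpha + beta and mu = alpha / k."""
    k = alpha + beta
    return alpha / k, k

def to_standard_param(mu, k):
    """Map (mu, k) back to (alpha, beta) = (mu k, (1 - mu) k)."""
    return mu * k, (1 - mu) * k

def beta_var_mean_param(mu, k):
    """Variance mu (1 - mu) / (k + 1) in the mean parameterisation."""
    return mu * (1 - mu) / (k + 1)

# Round trip for an illustrative Beta(2, 6) distribution.
mu, k = to_mean_param(2.0, 6.0)
alpha, beta = to_standard_param(mu, k)
print("mu, k:", mu, k)
print("alpha, beta:", alpha, beta)

# Variance for increasing concentration at fixed mean mu = 0.25.
for conc in (2, 8, 32, 128):
    print(conc, beta_var_mean_param(0.25, conc))
```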

B.2 Inverse gamma (inverse Wishart) distribution

B.2.1 Standard parameterisation

The inverse gamma (IG) distribution $\text{Inv-Gam}(\alpha, \beta)$ has density \[ p(x | \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} (1/x)^{\alpha+1} e^{-\beta/x} \] with two parameters $\alpha >0$ (shape parameter) and $\beta >0$ (scale parameter) and support $x >0$.

The mean of the inverse gamma distribution is \[\text{E}(x) = \frac{\beta}{\alpha-1}\] and the variance \[\text{Var}(x) = \frac{\beta^2}{(\alpha-1)^2 (\alpha-2)}\]

Thus, for the mean to exist we have the restriction $\alpha>1$ and for the variance to exist $\alpha>2$.

The IG distribution is closely linked with the gamma distribution. If $x \sim \text{Inv-Gam}(\alpha, \beta)$ is IG-distributed then the inverse of $x$ is gamma distributed: \[\frac{1}{x} \sim \text{Gam}(\alpha, \theta=\beta^{-1})\] where $\alpha$ is the shared shape parameter and $\theta$ the scale parameter of the gamma distribution.
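The link to the gamma distribution gives a simple way to simulate from the inverse gamma distribution: draw from $\text{Gam}(\alpha, \theta=\beta^{-1})$ and invert. The sketch below uses `random.gammavariate` from the Python standard library, whose two arguments are the shape and the scale. The function names, the seed, the sample size and the example values $\alpha=10$, $\beta=18$ are assumptions made for this illustration.

```python
import math
import random

def inv_gamma_pdf(x, alpha, beta):
    """Density of Inv-Gam(alpha, beta) at x > 0, evaluated on the log scale."""
    log_p = (alpha * math.log(beta) - math.lgamma(alpha)
             - (alpha + 1) * math.log(x) - beta / x)
    return math.exp(log_p)

def inv_gamma_mean(alpha, beta):
    """Mean beta / (alpha - 1); only defined for alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean requires alpha > 1")
    return beta / (alpha - 1)

def inv_gamma_var(alpha, beta):
    """Variance beta^2 / ((alpha - 1)^2 (alpha - 2)); only defined for alpha > 2."""
    if alpha <= 2:
        raise ValueError("variance requires alpha > 2")
    return beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))

def sample_inv_gamma(rng, alpha, beta):
    """Draw 1/y with y ~ Gam(shape=alpha, scale=1/beta)."""
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

# Illustrative values (assumptions): alpha = 10, beta = 18, fixed seed.
rng = random.Random(1)
alpha, beta = 10.0, 18.0
draws = [sample_inv_gamma(rng, alpha, beta) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print("mean:", sample_mean, "vs closed form", inv_gamma_mean(alpha, beta))
print("variance:", sample_var, "vs closed form", inv_gamma_var(alpha, beta))
```

The Monte Carlo estimates only agree with the closed-form values up to sampling error, and the variance estimate becomes unstable as $\alpha$ approaches 2 from above because of the heavy right tail.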

B.2.2 Wishart parameterisation

The inverse gamma distribution is frequently used with a different set of parameters $\psi = 2\beta$ (scale parameter) and $\nu = 2\alpha$ (shape parameter), or conversely $\alpha=\nu/2$ and $\beta=\psi/2$. In this form it is called the one-dimensional inverse Wishart distribution $W^{-1}_1(\psi, \nu)$ with mean and variance given by \[ \text{E}(x) = \frac{\psi}{\nu-2} = \mu \] for $\nu>2$ and \[ \text{Var}(x) =\frac{2 \psi^2}{(\nu-4) (\nu-2)^2} = \frac{2 \mu^2}{\nu-4} \] for $\nu >4$.

Instead of $\psi$ and $\nu$ we may also equivalently use $\mu$ and $\kappa=\nu-2$ as parameters for the inverse Wishart distribution, so that $W^{-1}_1(\psi=\kappa \mu, \nu=\kappa+2)$ has mean \[\text{E}(x) = \mu\] with $\kappa>0$ and the variance is \[\text{Var}(x) = \frac{2 \mu^2}{\kappa-2}\] with $\kappa>2$. This mean parameterisation is useful when employing the IG distribution as prior and posterior.

Finally, with $W^{-1}_1(\psi=\nu \tau^2, \nu)$, where $\tau^2 = \mu \frac{ \kappa}{\kappa+2} = \frac{\psi}{\nu}$ is a biased mean parameter, we get the scaled inverse chi-squared distribution $\tau^2 \text{Inv-$\chi^2_{\nu}$}$ with \[ \text{E}(x) = \tau^2 \frac{ \nu}{\nu-2} \] for $\nu>2$ and \[ \text{Var}(x) =\frac{2 \tau^4}{\nu-4} \frac{\nu^2}{(\nu-2)^2} \] for $\nu >4$.
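Since the same distribution now appears under four sets of parameters, a small consistency check can be useful. The sketch below converts an inverse gamma distribution into the inverse Wishart, mean and scaled inverse chi-squared parameterisations and evaluates the variance formula of each. All function names and the example values $\alpha=4$, $\beta=6$ are hypothetical and only serve as illustration.

```python
def ig_to_inv_wishart(alpha, beta):
    """(alpha, beta) -> (psi, nu) with psi = 2 beta and nu = 2 alpha."""
    return 2 * beta, 2 * alpha

def inv_wishart_to_mean_param(psi, nu):
    """(psi, nu) -> (mu, kappa) with mu = psi / (nu - 2) and kappa = nu - 2."""
    kappa = nu - 2
    return psi / kappa, kappa

def inv_wishart_to_inv_chisq(psi, nu):
    """(psi, nu) -> (tau2, nu) with tau2 = psi / nu."""
    return psi / nu, nu

def var_ig(alpha, beta):
    return beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))      # needs alpha > 2

def var_inv_wishart(psi, nu):
    return 2 * psi ** 2 / ((nu - 4) * (nu - 2) ** 2)         # needs nu > 4

def var_mean_param(mu, kappa):
    return 2 * mu ** 2 / (kappa - 2)                         # needs kappa > 2

def var_inv_chisq(tau2, nu):
    return 2 * tau2 ** 2 / (nu - 4) * nu ** 2 / (nu - 2) ** 2  # needs nu > 4

# Illustrative starting point (an assumption): Inv-Gam(alpha = 4, beta = 6).
alpha, beta = 4.0, 6.0
psi, nu = ig_to_inv_wishart(alpha, beta)
mu, kappa = inv_wishart_to_mean_param(psi, nu)
tau2, _ = inv_wishart_to_inv_chisq(psi, nu)

print("psi, nu:", psi, nu)
print("mu, kappa:", mu, kappa)
print("tau2:", tau2)
print("variances:", var_ig(alpha, beta), var_inv_wishart(psi, nu),
      var_mean_param(mu, kappa), var_inv_chisq(tau2, nu))
```

All four variance values should coincide, because they describe the same distribution.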

The inverse Wishart and Wishart distributions are linked. If $x \sim W^{-1}_1(\psi, \nu)$ is inverse-Wishart distributed then the inverse of $x$ is Wishart distributed with inverted scale parameter: \[\frac{1}{x} \sim W_1(s^2=\psi^{-1}, k=\nu)\] where $k$ is the shape parameter and $s^2$ the scale parameter of the Wishart distribution.

B.3 Location-scale $t$-distribution as compound distribution

Suppose that \[ x | s^2 \sim N(\mu,s^2) \] with corresponding density $p(x | s^2)$ and mean $\text{E}(x | s^2) = \mu$ and variance $\text{Var}(x|s^2) = s^2$.

Now let the variance $s^2$ be distributed as inverse gamma / inverse Wishart \[ s^2 \sim W^{-1}_1(\psi=\kappa \sigma^2, \nu=\kappa+2) = W^{-1}_1(\psi=\tau^2\nu, \nu) \] with corresponding density $p(s^2)$ and mean $\text{E}(s^2) = \sigma^2 = \tau^2 \nu/(\nu-2)$. Note that we use here both the mean parameterisation ($\sigma^2, \kappa$) and the inverse chi-squared parameterisation ($\tau^2, \nu$).

The joint density for $x$ and $s^2$ is $p(x, s^2) = p(x | s^2) p(s^2)$. We are interested in the marginal density for $x$: \[ p(x) = \int p(x, s^2) ds^2 = \int p(s^2) p(x | s^2) ds^2 \] This is a compound distribution of a normal with fixed mean $\mu$ and variance $s^2$ varying according to the inverse gamma distribution. Calculating the integral results in the location-scale $t$-distribution with parameters \[ x \sim \text{lst}\left(\mu, \sigma^2 \frac{\kappa}{\kappa+2}, \kappa+2\right) = \text{lst}\left(\mu, \tau^2, \nu\right) \] with mean \[ \text{E}(x) = \mu \] and variance \[ \text{Var}(x) = \sigma^2 =\tau^2 \frac{\nu}{\nu-2} \]
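As a sketch of how the integral can be evaluated, write the distribution of $s^2$ in its inverse gamma form with $\alpha=\nu/2$ and $\beta=\nu\tau^2/2$. As a function of $s^2$ the integrand is then proportional to an inverse gamma density with shape $\alpha+\frac{1}{2}$ and scale $\beta + \frac{(x-\mu)^2}{2}$, so the integral follows from the corresponding normalising constant: \[ p(x) = \int_0^\infty \frac{1}{\sqrt{2\pi s^2}} e^{-\frac{(x-\mu)^2}{2 s^2}} \, \frac{\beta^{\alpha}}{\Gamma(\alpha)} (s^2)^{-(\alpha+1)} e^{-\beta/s^2} \, ds^2 = \frac{\beta^{\alpha}}{\Gamma(\alpha) \sqrt{2\pi}} \, \frac{\Gamma\left(\alpha+\frac{1}{2}\right)}{\left(\beta + \frac{(x-\mu)^2}{2}\right)^{\alpha+\frac{1}{2}}} \] Substituting $\alpha=\nu/2$ and $\beta=\nu\tau^2/2$ and simplifying yields \[ p(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \sqrt{\pi \nu \tau^2}} \left(1 + \frac{(x-\mu)^2}{\nu \tau^2}\right)^{-\frac{\nu+1}{2}} \] which is the density of the location-scale $t$-distribution with location $\mu$, squared scale $\tau^2$ and $\nu$ degrees of freedom.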

From the law of total expectation and variance we can also directly verify that \[ \text{E}(x) = \text{E}( \text{E}(x| s^2) ) =\mu \] and \[ \text{Var}(x) = \text{E}(\text{Var}(x|s^2))+ \text{Var}(\text{E}(x|s^2)) = \text{E}(s^2) = \sigma^2 =\tau^2 \frac{\nu}{\nu-2} \]
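The compound construction can also be illustrated by simulation: first draw $s^2$ from the inverse gamma distribution, then draw $x$ from a normal with that variance. The following is a minimal sketch with the Python standard library. The parameter values ($\mu=1$, $\tau^2=1.5$, $\nu=8$), the seed, the sample size and the function name `sample_compound` are assumptions for this example, and the draw of $s^2$ relies on the inverse gamma form with $\alpha=\nu/2$ and $\beta=\nu\tau^2/2$.

```python
import math
import random

def sample_compound(rng, mu, tau2, nu):
    """One draw of x: s2 ~ Inv-Gam(nu/2, nu*tau2/2), then x | s2 ~ N(mu, s2)."""
    alpha = nu / 2.0
    beta = nu * tau2 / 2.0
    # Inverse of a gamma variate with shape alpha and scale 1/beta.
    s2 = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    # random.gauss expects the standard deviation, not the variance.
    return rng.gauss(mu, math.sqrt(s2))

# Illustrative values (assumptions for this sketch).
mu, tau2, nu = 1.0, 1.5, 8.0
sigma2 = tau2 * nu / (nu - 2)      # theoretical variance of x

rng = random.Random(42)
draws = [sample_compound(rng, mu, tau2, nu) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print("mean:", sample_mean, "vs", mu)
print("variance:", sample_var, "vs", sigma2)
```

The empirical mean and variance should be close to $\mu$ and $\sigma^2 = \tau^2 \nu/(\nu-2)$, up to Monte Carlo error. For small $\nu$ the agreement deteriorates, since the variance only exists for $\nu>2$ and its estimate is only stable for $\nu>4$.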
