Gibbs Sampling for the Multivariate Normal
This note is based on Chapter 7 of Hoff PD. A first course in Bayesian statistical methods. Springer Science & Business Media; 2009 Jun 2.
A $p$-dimensional data vector $\Y$ has a multivariate normal distribution if its sampling density is given by
\[p(\y\mid \btheta,\Sigma) = (2\pi)^{-p/2}\vert \Sigma\vert^{-1/2}\exp\{-(\y-\btheta)'\Sigma^{-1}(\y-\btheta)/2\}\]A convenient prior distribution for the multivariate mean $\btheta$ is a multivariate normal distribution, which can be parameterized as
\[\begin{align*} p(\btheta) &= MN(\bmu_0,\Lambda_0)\\ &\propto \exp\Big(-\frac 12\btheta'\Lambda_0^{-1}\btheta + \btheta'\Lambda_0^{-1}\bmu_0\Big)\\ &\triangleq\exp\Big(-\frac 12\btheta'\A_0\btheta + \btheta'\b_0\Big)\,. \end{align*}\]As a function of $\btheta$, the joint sampling density of the observed vectors $\y_1,\ldots,\y_n$ is
\[p(\y_1,\ldots,\y_n\mid \btheta,\Sigma) \propto \exp\Big(-\frac 12\btheta'\A_1\btheta+\btheta'\b_1\Big)\,,\]where $\A_1=n\Sigma^{-1}$ and $\b_1 = n\Sigma^{-1}\bar \y$. Thus, the posterior distribution of $\btheta$ is
\[\begin{align*} p(\btheta\mid \y_1,\ldots,\y_n,\Sigma) &\propto \exp\Big(-\frac 12\btheta'\A_0\btheta+\btheta'\b_0\Big)\cdot \exp\Big(-\frac 12\btheta'\A_1\btheta+\btheta'\b_1\Big)\\ &=\exp\Big(-\frac 12\btheta'\A_n\btheta+\btheta'\b_n\Big)\,, \end{align*}\]where
\[\begin{align*} \A_n &= \A_0 + \A_1 = \Lambda_0^{-1} + n\Sigma^{-1}\\ \b_n &= \b_0 + \b_1 = \Lambda_0^{-1}\bmu_0 + n\Sigma^{-1}\bar\y\,. \end{align*}\]Hence,
\[\btheta\mid \y_1,\ldots,\y_n,\Sigma\sim MN(\bmu_n,\Lambda_n)\,,\]where
\[\begin{align*} \Lambda_n &= \A_n^{-1} = (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}\\ \bmu_n &= \A_n^{-1}\b_n = (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}(\Lambda_0^{-1}\bmu_0+n\Sigma^{-1}\bar\y)\,. \end{align*}\]
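As a minimal illustration, the following NumPy sketch performs one draw from this full conditional; the function name `sample_theta` and its arguments are just placeholders standing in for $\bar\y$, $n$, $\Sigma$, $\bmu_0$, and $\Lambda_0$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_theta(ybar, n, Sigma, mu0, Lambda0):
    """One draw from theta | y_1, ..., y_n, Sigma ~ MN(mu_n, Lambda_n)."""
    Lambda0_inv = np.linalg.inv(Lambda0)
    Sigma_inv = np.linalg.inv(Sigma)
    # Lambda_n = (Lambda_0^{-1} + n Sigma^{-1})^{-1}
    Lambda_n = np.linalg.inv(Lambda0_inv + n * Sigma_inv)
    # mu_n = Lambda_n (Lambda_0^{-1} mu_0 + n Sigma^{-1} ybar)
    mu_n = Lambda_n @ (Lambda0_inv @ mu0 + n * Sigma_inv @ ybar)
    return rng.multivariate_normal(mu_n, Lambda_n)
```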
The sum of squares matrix of a collection of multivariate vectors $\z_1,\ldots,\z_n$ is given by
\[\sum_{i=1}^n\z_i\z_i' = \Z'\Z\,,\]where $\Z$ is the $n\times p$ matrix whose $i$-th row is $\z_i'$.
This suggests the following construction of a “random” covariance matrix: For a given positive integer $\nu_0$ and a $p\times p$ covariance matrix $\Phi_0$,
- sample $\z_1,\ldots,\z_{\nu_0}\sim MN(\0, \Phi_0)$ i.i.d.
- calculate $\Z'\Z=\sum_{i=1}^{\nu_0}\z_i\z_i'$
Repeat this procedure over and over again, generating matrices $\Z_1'\Z_1,\ldots,\Z_S'\Z_S$. The population distribution of these sum of squares matrices is called a Wishart distribution with parameters $(\nu_0,\Phi_0)$.
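This construction is easy to simulate. The sketch below, with $\nu_0$ and $\Phi_0$ chosen arbitrarily for illustration, forms $\Z'\Z$ from $\nu_0$ i.i.d. $MN(\0,\Phi_0)$ draws and checks that the Monte Carlo average of such matrices is close to the Wishart mean $\nu_0\Phi_0$.

```python
import numpy as np

rng = np.random.default_rng(1)

nu0 = 10                                   # "prior sample size" / degrees of freedom
Phi0 = np.array([[1.0, 0.3],
                 [0.3, 2.0]])              # arbitrary 2 x 2 covariance matrix

def wishart_by_construction(nu0, Phi0):
    """Form Z'Z from nu0 i.i.d. MN(0, Phi0) rows: one Wishart(nu0, Phi0) draw."""
    Z = rng.multivariate_normal(np.zeros(Phi0.shape[0]), Phi0, size=nu0)
    return Z.T @ Z

draws = np.array([wishart_by_construction(nu0, Phi0) for _ in range(5000)])
print(draws.mean(axis=0))                  # Monte Carlo mean of the Z'Z matrices
print(nu0 * Phi0)                          # theoretical Wishart mean nu0 * Phi0
```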
Consider the prior distribution $\Sigma\sim IW(\nu_0,\S_0^{-1})$. The full conditional distribution of $\Sigma$ is then
\[\Sigma\mid \y_1,\ldots,\y_n,\btheta\sim IW(\nu_0+n, [\S_0+\S_\btheta]^{-1})\,,\]where $\S_\btheta=\sum_{i=1}^n(\y_i-\btheta)(\y_i-\btheta)'$ is the residual sum of squares matrix about $\btheta$. Hopefully this result seems somewhat intuitive: we can think of $\nu_0+n$ as the “posterior sample size”, being the sum of the “prior sample size” $\nu_0$ and the data sample size $n$.
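Here is a minimal sketch of one draw from this full conditional using `scipy.stats.invwishart`; note that SciPy parameterizes the inverse-Wishart by the scale matrix itself, so the `scale` argument is $\S_0+\S_\btheta$ rather than its inverse. The function and argument names are placeholders.

```python
import numpy as np
from scipy.stats import invwishart

def sample_Sigma(Y, theta, nu0, S0, rng=None):
    """One draw from Sigma | y_1, ..., y_n, theta ~ IW(nu0 + n, [S0 + S_theta]^{-1}).

    Y is the n x p data matrix; S_theta is the residual sum of squares about theta.
    SciPy's invwishart expects the scale matrix itself, i.e. S0 + S_theta.
    """
    n = Y.shape[0]
    resid = Y - theta                       # subtracts theta from each row of Y
    S_theta = resid.T @ resid               # sum_i (y_i - theta)(y_i - theta)'
    return invwishart.rvs(df=nu0 + n, scale=S0 + S_theta, random_state=rng)
```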
We are now ready to perform Gibbs sampling. Given a starting value $\Sigma^{(0)}$, the sampler generates $\{\btheta^{(s+1)},\Sigma^{(s+1)}\}$ from $\{\btheta^{(s)},\Sigma^{(s)}\}$ in two steps (a code sketch follows the list):
- Sample $\btheta^{(s+1)}\sim MN(\bmu_n,\Lambda_n)$, where $\bmu_n$ and $\Lambda_n$ are computed from $\Sigma^{(s)}$
- Sample $\Sigma^{(s+1)}\sim IW(\nu_0+n,\S_n^{-1})$, where $\S_n=\S_0+\S_{\btheta^{(s+1)}}$ is computed from $\btheta^{(s+1)}$
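Putting the two full conditionals together, here is a self-contained sketch of the sampler in Python (NumPy/SciPy). The simulated data and the prior hyperparameters are placeholders chosen only to make the example runnable; a real analysis would substitute its own $\y_1,\ldots,\y_n$, $\bmu_0$, $\Lambda_0$, $\nu_0$, and $\S_0$.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(42)

# --- toy data and prior hyperparameters (placeholders for illustration) ---
n, p = 50, 2
Y = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.5], [0.5, 2.0]], size=n)
ybar = Y.mean(axis=0)

mu0, Lambda0 = np.zeros(p), np.eye(p)     # prior on theta: MN(mu0, Lambda0)
nu0, S0 = p + 2, np.eye(p)                # prior on Sigma: IW(nu0, S0^{-1})

S = 1000                                  # number of Gibbs iterations
theta_draws = np.empty((S, p))
Sigma_draws = np.empty((S, p, p))

Sigma = np.cov(Y.T)                       # starting value Sigma^{(0)}
Lambda0_inv = np.linalg.inv(Lambda0)

for s in range(S):
    # 1) theta | Y, Sigma ~ MN(mu_n, Lambda_n), with mu_n, Lambda_n built from Sigma^{(s)}
    Sigma_inv = np.linalg.inv(Sigma)
    Lambda_n = np.linalg.inv(Lambda0_inv + n * Sigma_inv)
    mu_n = Lambda_n @ (Lambda0_inv @ mu0 + n * Sigma_inv @ ybar)
    theta = rng.multivariate_normal(mu_n, Lambda_n)

    # 2) Sigma | Y, theta ~ IW(nu0 + n, [S0 + S_theta]^{-1}), with S_theta built from theta^{(s+1)}
    resid = Y - theta
    S_theta = resid.T @ resid
    Sigma = invwishart.rvs(df=nu0 + n, scale=S0 + S_theta, random_state=rng)

    theta_draws[s] = theta
    Sigma_draws[s] = Sigma

print(theta_draws.mean(axis=0))           # Monte Carlo approximation to E[theta | y_1, ..., y_n]
```

After discarding an initial stretch of iterations as burn-in, the retained pairs $\{\btheta^{(s)},\Sigma^{(s)}\}$ are dependent draws whose empirical distribution approximates the joint posterior $p(\btheta,\Sigma\mid\y_1,\ldots,\y_n)$.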