
Gibbs Sampling for the Multivariate Normal

Tags: Gibbs, Multivariate, Normal

This note is based on Chapter 7 of Hoff PD. A first course in Bayesian statistical methods. Springer Science & Business Media; 2009 Jun 2.

A $p$-dimensional data vector $\Y$ has a multivariate normal distribution if its sampling density is given by

\[p(\y\mid \btheta,\Sigma) = (2\pi)^{-p/2}\vert \Sigma\vert^{-1/2}\exp\{-(\y-\btheta)'\Sigma^{-1}(\y-\btheta)/2\}\,.\]
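
As a quick numerical check of this formula, a hand-coded log-density can be compared against scipy.stats.multivariate_normal. The sketch below is illustrative only; the names (theta, Sigma, y) simply mirror the notation above and the numbers are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_logpdf(y, theta, Sigma):
    """Hand-coded multivariate normal log-density, matching the formula above."""
    p = len(theta)
    diff = y - theta
    _, logdet = np.linalg.slogdet(Sigma)               # log |Sigma|
    quad = diff @ np.linalg.solve(Sigma, diff)         # (y - theta)' Sigma^{-1} (y - theta)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + quad)

theta = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
y = np.array([0.3, -1.2])

print(mvn_logpdf(y, theta, Sigma))
print(multivariate_normal(mean=theta, cov=Sigma).logpdf(y))   # should agree
```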

A convenient prior distribution for the multivariate mean $\btheta$ is a multivariate normal distribution, which can be parameterized as

\[\begin{align*} p(\btheta) &= MN(\bmu_0,\Lambda_0)\\ &\propto \exp\Big(-\frac 12\btheta'\Lambda_0^{-1}\btheta + \btheta'\Lambda_0^{-1}\bmu_0\Big)\\ &\triangleq\exp\Big(-\frac 12\btheta'\A_0\btheta + \btheta'\b_0\Big)\,. \end{align*}\]

The joint sampling density of the observed vectors $\y_1,\ldots,\y_n$, viewed as a function of $\btheta$, satisfies

\[p(\y_1,\ldots,\y_n\mid \btheta,\Sigma) \propto \exp\Big(-\frac 12\btheta'\A_1\btheta+\btheta'\b_1\Big)\,,\]

where $\A_1=n\Sigma^{-1},\b_1 = n\Sigma^{-1}\bar \y$. Thus, the posterior distribution of $\btheta$ is

\[\begin{align*} p(\btheta\mid \y_1,\ldots,\y_n,\Sigma) &\propto \exp\Big(-\frac 12\btheta'\A_0\btheta+\btheta'\b_0\Big)\cdot \exp\Big(-\frac 12\btheta'\A_1\btheta+\btheta'\b_1\Big)\\ &=\exp\Big(-\frac 12\btheta'\A_n\btheta+\btheta'\b_n\Big)\,, \end{align*}\]

where

\[\begin{align*} \A_n &= \A_0 + \A_1 = \Lambda_0^{-1} + n\Sigma^{-1}\\ \b_n &= \b_0 + \b_1 = \Lambda_0^{-1}\bmu_0 + n\Sigma^{-1}\bar\y\,. \end{align*}\]

Hence,

\[\btheta\mid \y_1,\ldots,\y_n,\Sigma\sim MN(\bmu_n,\Lambda_n)\,,\]

where

\[\begin{align*} \Lambda_n &= \A_n^{-1} = (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}\\ \bmu_n &= \A_n^{-1}\b_n = (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}(\Lambda_0^{-1}\bmu_0+n\Sigma^{-1}\bar\y)\,. \end{align*}\]
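
The full conditional of $\btheta$ therefore only requires forming $\A_n$ and $\b_n$ and drawing one multivariate normal vector. Below is a minimal numpy sketch of this update, assuming the data are stacked row-wise in an $n\times p$ array Y; the function and argument names (sample_theta_full_conditional, mu0, Lambda0) are mine, not from the book.

```python
import numpy as np

def sample_theta_full_conditional(Y, Sigma, mu0, Lambda0, rng):
    """One draw of theta from MN(mu_n, Lambda_n) given data Y (n x p) and Sigma."""
    n = Y.shape[0]
    ybar = Y.mean(axis=0)
    Lambda0_inv = np.linalg.inv(Lambda0)
    Sigma_inv = np.linalg.inv(Sigma)
    A_n = Lambda0_inv + n * Sigma_inv                  # posterior precision Lambda_n^{-1}
    b_n = Lambda0_inv @ mu0 + n * Sigma_inv @ ybar
    Lambda_n = np.linalg.inv(A_n)
    mu_n = Lambda_n @ b_n                              # posterior mean
    return rng.multivariate_normal(mu_n, Lambda_n)
```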

The sum of squares matrix of a collection of multivariate vectors $\z_1,\ldots,\z_n$ is given by

\[\sum_{i=1}^n\z_i\z_i' = \Z'\Z\,,\]

where $\Z$ is the $n\times p$ matrix whose $i$-th row is $\z_i'$.
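
A quick numpy check of this identity (the matrix Z here is just an arbitrary example):

```python
import numpy as np

Z = np.random.default_rng(1).normal(size=(5, 3))   # n = 5 rows z_i' in R^3
lhs = sum(np.outer(z, z) for z in Z)               # sum of z_i z_i'
print(np.allclose(lhs, Z.T @ Z))                   # True
```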

This suggests the following construction of a “random” covariance matrix: For a given positive integer $\nu_0$ and a $p\times p$ covariance matrix $\Phi_0$,

  • sample $\z_1,\ldots,\z_{\nu_0}\sim MN(\0, \Phi_0)$ i.i.d.
  • calculate $\Z'\Z=\sum_{i=1}^{\nu_0}\z_i\z_i'$

Repeat this procedure over and over again, generating matrices $\Z_1'\Z_1,\ldots,\Z_S'\Z_S$. The population distribution of these sum of squares matrices is called a Wishart distribution with parameters $(\nu_0,\Phi_0)$.
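
Taking this construction literally gives a simple Monte Carlo approximation to the Wishart distribution. In the sketch below the values of nu0, Phi0, and the number of replicates S are arbitrary choices of mine; the empirical mean of the simulated matrices should be close to the theoretical mean $\nu_0\Phi_0$.

```python
import numpy as np

rng = np.random.default_rng(2)
nu0 = 10
Phi0 = np.array([[1.0, 0.3],
                 [0.3, 2.0]])
S = 5000

draws = []
for _ in range(S):
    Z = rng.multivariate_normal(np.zeros(2), Phi0, size=nu0)  # nu0 x p matrix Z
    draws.append(Z.T @ Z)                                     # one sum of squares matrix

print(np.mean(draws, axis=0))   # Monte Carlo estimate of E[Z'Z]
print(nu0 * Phi0)               # theoretical mean of a Wishart(nu0, Phi0)
```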

Under the prior distribution $\Sigma\sim IW(\nu_0,\S_0^{-1})$, the full conditional distribution of $\Sigma$ is

\[\Sigma\mid \y_1,\ldots,\y_n,\btheta\sim IW(\nu_0+n, [\S_0+\S_\btheta]^{-1})\,,\]

where $\S_\btheta = \sum_{i=1}^n(\y_i-\btheta)(\y_i-\btheta)'$ is the residual sum of squares matrix for the vectors $\y_1,\ldots,\y_n$ conditional on $\btheta$.

Hopefully this result seems somewhat intuitive: we can think of $\nu_0+n$ as the “posterior sample size”, being the sum of the “prior sample size” $\nu_0$ and the data sample size $n$.
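
One way to draw from this full conditional in code is scipy.stats.invwishart. As I read the two conventions, $\Sigma\sim IW(\nu_0,\S_0^{-1})$ in the parameterization used here corresponds to invwishart(df=nu0, scale=S0), so the full conditional is drawn with degrees of freedom $\nu_0+n$ and scale matrix $\S_0+\S_\btheta$; treat that mapping as an assumption. The helper name below is my own.

```python
import numpy as np
from scipy.stats import invwishart

def sample_Sigma_full_conditional(Y, theta, nu0, S0, rng=None):
    """One draw of Sigma from IW(nu0 + n, [S0 + S_theta]^{-1}) given data Y (n x p) and theta."""
    n = Y.shape[0]
    resid = Y - theta                    # rows are y_i - theta
    S_theta = resid.T @ resid            # residual sum of squares matrix
    # Assumed mapping: scipy's invwishart(df, scale) has mean scale / (df - p - 1),
    # which matches the inverse-Wishart parameterization used in this note.
    return invwishart.rvs(df=nu0 + n, scale=S0 + S_theta, random_state=rng)
```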

We are now ready to perform Gibbs sampling. Given a current state $\{\btheta^{(s)},\Sigma^{(s)}\}$, a new state $\{\btheta^{(s+1)},\Sigma^{(s+1)}\}$ is generated as follows (a minimal sketch of the full sampler is given after the list):

  • Sample $\btheta^{(s+1)}\sim MN(\bmu_n,\Lambda_n)$, where $\bmu_n$ and $\Lambda_n$ are computed from $\y_1,\ldots,\y_n$ and $\Sigma^{(s)}$
  • Sample $\Sigma^{(s+1)}\sim IW(\nu_0+n,\S_n^{-1})$, where $\S_n=\S_0+\S_{\btheta^{(s+1)}}$ is computed from $\y_1,\ldots,\y_n$ and $\btheta^{(s+1)}$
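
Putting the two full conditional draws together gives the sampler. The sketch below assumes the helper functions defined earlier (sample_theta_full_conditional and sample_Sigma_full_conditional) and uses simulated data with an arbitrary diffuse prior of my choosing; it is an illustration, not the book's own code.

```python
import numpy as np

def gibbs_mvn(Y, mu0, Lambda0, nu0, S0, n_iter=1000, seed=0):
    """Gibbs sampler for (theta, Sigma) under the semiconjugate prior above."""
    rng = np.random.default_rng(seed)
    Sigma = np.cov(Y, rowvar=False)            # starting value for Sigma
    theta_draws, Sigma_draws = [], []
    for _ in range(n_iter):
        theta = sample_theta_full_conditional(Y, Sigma, mu0, Lambda0, rng)
        Sigma = sample_Sigma_full_conditional(Y, theta, nu0, S0, rng)
        theta_draws.append(theta)
        Sigma_draws.append(Sigma)
    return np.array(theta_draws), np.array(Sigma_draws)

# Illustrative run on simulated data with a vague prior.
rng = np.random.default_rng(42)
Y = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.4], [0.4, 2.0]], size=200)
p = Y.shape[1]
theta_s, Sigma_s = gibbs_mvn(Y, mu0=np.zeros(p), Lambda0=100 * np.eye(p),
                             nu0=p + 2, S0=np.eye(p), n_iter=2000)
print(theta_s.mean(axis=0))    # posterior mean of theta, close to the true mean
```

Each iteration updates $\btheta$ given the current $\Sigma$ and then $\Sigma$ given the new $\btheta$, so after a burn-in period the pairs $\{\btheta^{(s)},\Sigma^{(s)}\}$ are approximate draws from the joint posterior.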
