The Gibbs Sampler
Posted on (Update: ) 0 Comments
Gibbs sampler is an iterative algorithm that constructs a dependent sequence of parameter values whose distribution converges to the target joint posterior distribution.
A semiconjugate prior distribution
In the case where $\tau_0^2$ is not proportional to $\sigma^2$, the marginal density of $1/\sigma^2$ is not a gamma distribution.
Discrete approximations
A discrete approximation to the posterior distribution makes use of these facts by constructing a posterior distribution over a grid of parameter values, based on relative posterior probabilities.
Construct a twodimensional grid of value of ${\theta, \tilde \sigma^2}$
An example
mu0 = 1.9; t20 = 0.95^2; s20 = 0.01; nu0 = 1
y = c(1.64, 1.70, 1.72, 1.74, 1.82, 1.82, 1.82, 1.90, 2.08)
G = 100; H = 100
mean.grid = seq(1.505, 2.00, length = G)
prec.grid = seq(1.75,175, length = H)
post.grid = matrix(nrow = G, ncol = H)
for (g in 1:G){
for (h in 1:H){
post.grid[g, h] =
dnorm(mean.grid[g], mu0, sqrt(t20)) *
dgamma(prec.grid[h], nu0/2, s20*nu20/2) *
prod(dnorm(y, mean.grid[g], 1/sqrt(prec.grid[h])))
}
}
post.grid = post.grid/sum(post.grid)
Sampling from the conditional distributions
if we know $\theta$, we can easily sample directly from $p(\sigma^2\mid \theta,y_1,\ldots,y_n)$. And we can easily sample from $p(\theta\mid\sigma^2,y_1,\ldots,y_n)$.

suppose we sample from the marginal posterior distribution $p(\sigma^2\mid y_1,\ldots, y_n)$ and obtain $\sigma^{2(1)}$.

then we could sample $\theta^{(1)}\sim p(\theta\mid \sigma^{2(1)},y_1,\ldots,y_n)$, and {$\theta^{(1)},\sigma^{2(1)}$} would be a sample from the joint distribution of {$\theta,\sigma^2$}.

$\theta^{(1)}$ could be considered a sample from the marginal distribution $p(\theta\mid y_1,\ldots,y_n)$. For this $\theta$value, sample $\sigma^{(2)}\sim p(\sigma^2\mid \theta^{(1)},y_1,\ldots,y_n)$

{$\theta^{(1)},\sigma^{2(2)}$} is also a sample from the joint distribution of {$\theta,\sigma^2$}. This in turn means that $\sigma^{2(2)}$ is a sample from the marginal distribution $p(\sigma^2\mid y_1,\ldots,y_n)$, which then could be used to generate a new sample $\theta^{(2)}$.
Gibbs sampling
the full conditional distributions \(p(\theta\mid \sigma^2,y_1,\ldots,y_n)\) and \(p(\sigma^2\mid \theta,y_1,\ldots,y_n)\)
given a current state of the parameters \(\phi^{(s)}=\{\theta^{(s)}, \tilde \sigma^{2(s)}\}\)
generate a new state as follows
 sample $\theta^{(s+1)}\sim p(\theta\mid \tilde\sigma^{2(s)},y_1,\ldots,y_n)$
 sample $\tilde\sigma^{2(s+1)}\sim p(\tilde\sigma^2\mid \theta^{(s+1)},y_1,\ldots,y_n)$
 let $\phi^{(s+1)}={\theta^{s+1},\tilde\sigma^{2(s+1)}}$
General properties of the Gibbs sampler
Markov property, Markov chain.
\[\Pr(\phi^{(s)}\in A)\rightarrow \int_A p(\phi)d\phi\;as\;s\rightarrow \infty\]the sampling distribution of $\phi^{(s)}$ approaches the target distribution as $s \rightarrow \infty $, no matter what the starting value $\phi^{(0)}$ is (although some starting values will get you to the target sooner than others).
More importantly, for most functions $g$ of interest,
\[\frac{1}{S}\sum\limits_{s=1}^Sg(\phi^{(s)})\rightarrow E(g(\phi))=\int g(\phi)p(\phi)d\phi\;as\;S\rightarrow \infty\]the necessary ingredients of a Bayesian data analysis are
 Model specification: $p(y\mid \phi)$
 Prior specification: $p(\phi)$
 Posterior summary: $p(\phi\mid y)$
Remarks:
Monte Carlo and MCMC sampling algorithms
 are not models
 do not generate “more information” than is in $y$ and $p(\phi)$
 simply “ways of looking at” $p(\phi\mid y)$
for example,
\[\frac{1}{S}\sum\phi^{(s)}\approx \int\phi p(\phi\mid y)d\phi\] \[\frac{1}{S}1(\phi^{(s)}\le c)\approx Pr(\phi\le c\mid y)=\int_{\infty}^cp(\phi\mid y)d\phi\]Distinguishing parameter estimation from posterior approximation
Introduction to MCMC diagnostics
difference between MC and MCMC
 Independent MC samples automatically create a sequence that is representative of $p(\phi)$: The probability that $\phi\in A$ for any set $A$ is $\int_A p(\phi) d\phi$. This is true for every $s \in {1, . . . , S}$ and conditionally or unconditionally on the other values in the sequence.
 NOT true for MCMC, instead we have
Two types of Gibbs sampler
Suppose we can decompose the random variable into $d$ components, $\mathbf x=(x_1,\ldots, x_d)$. In the Gibbs sampler, one randomly or symmetrically chooses a coordinate, say $x_1$, then updates it with new sample $x_1’$ draw from the conditional distribution $\pi(\cdot\mid \mathbf x_{[1]})$.
Randomscan Gibbs Sampler
Let $\mathbf x^{(t)}=(x_1^{(t)},\ldots, x_d^{(t)})$ for iteration $t$. Then, at iteration $t+1$, conduct the following steps:
 Randomly select a coordinate $i$ from ${1, \ldots, d}$ according to a given probability vector $(\alpha_1, \ldots, \alpha_d)$
 Draw $x_{i}^{(t+1)}$ from the conditional distribution $\pi(\cdot\mid x_{[i]}^{(t)})$ and leave the remaining components unchanged
SystematicScan Gibbs Sampler
Let $\mathbf x^{(t)}=(x_1^{(t)},\ldots, x_d^{(t)})$ for iteration $t$. Then, at iteration $t+1$,
 Draw $x_i^{i+1}$ from the conditional distribution
In the both samplers, at every update step, $\pi$ is still invariant. Suppose $\mathbf x^{(t)}\sim \pi$, then $\mathbf x_{[i]}^{(t)}$ follows its marginal distribution under $\pi$. Thus,
\[\pi(x_i^{(t+1)}\mid \mathbf x_{[i]}^{(t)})\times \pi(\mathbf x_{[i]}^{(t)}) = \pi(\mathbf x_{[i]}^{(t)}, x_i^{(t+1)})\]which means that after one conditional update, the new configuration still follows distribution $\pi$.