WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Reversible jump Markov chain Monte Carlo

Posted on
Tags: RJMCMC, Model Selection

The note is for Green, P.J. (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”. Biometrika. 82 (4): 711–732.

Abstract

MCMC methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. Not available for application to Bayesian model determination, where the dimensionality of the parameter vector is typically not fixed.

A new framework for the construction of reversible Markov chain samplers that jumps between parameter subspaces of differing dimensionality, which is flexible and entirely constructive.

Illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.

A discrete choice between a set of models, a parameter vector with an interpretation depending on the models in question, and data, influenced by the model and parameter values

Model

Suppose that we have a countable collection of candidate models $\{\cM_k,k\in\cK\}$. Model $\cM_k$ has a vector $\theta^{(k)}$ of unknown parameters, assumed to lie in $\IR^{n_k}$, where the dimension $n_k$ may vary from model to model. We observe data $y$. There is a natural hierarchical structure expressed by modelling the joint distribution of $(k,\theta^{(k)},y)$ as

\[p(k,\theta^{(k)},y) = p(k)p(\theta^{(k)}\mid k)p(y\mid k, \theta^{(k)})\,,\]

that is, the product of model probability, prior and likelihood. We can abbreviate the pair $(k,\theta^{(k)})$ by $x$. For given $k$, $x$ lies in $\cC_k = \{k\}\times \IR^{n_k}$.

Consider a concrete example, consider a change-point problem in which there is an unknown number of change-points in a piecewise constant regression function on the interval $[0,L]$. For $k\in \cK=\{0,1,2,\ldots\}$, model $\cM_k$ says that there are exactly $k$ change-points. To parametrise the resulting step function, we need to specify the position of each chang-point, and the value of the function on each of the $(k+1)$ subintervals into which $[0,L]$ is divided. Thus $\theta^{(k)}$ is a vector of length $n_k=2k+1$\,.

Bayesian inference about $k$ and $\theta^{(k)}$ will be based on the joint posterior $p(k,\theta^{(k)}\mid y)$, which is the target of the Markov chain Monte Carlo computations described below. We often factorise this as

\[p(k,\theta^{(k)}\mid y)=p(k\mid y)p(\theta^{(k)}\mid k,y)\]

Consider the Bayes factor for one model relative to another,

\[\frac{p(y\mid k_1)}{p(y\mid k_0)} = \frac{p(k_1\mid y)}{p(k_0\mid y)}\div \frac{p(k_1)}{p(k_0)}\]

Some work on MCMC computation with application to aspects of Bayesian model determination:

  • jump-diffusion samplers
  • embedding method in which the ${\cC_k}$ are mapped onto subsets of a single parameter space.

Let $\pi(dx)$ be the target distribution of interest. In MCMC, construct a Markov transition kernel $P(x,dx’)$ that is aperiodic and irreducible, and satisfies detailed balance,

\[\int_A\int_B\pi(dx)P(x,dx') = \int_B\int_A\pi(dx')P(x',dx)\]

for all appropriate $A,B$.

In a typical application with multiple parameter subspaces $\{\cC_k\}$ of different dimensionality, it will be necessary to devise different types of move between the subspaces.

When the current state is $x$, we propose a move of type $m$, that would take the state to $dx’$, with probability $q_m(x,dx’)$. For the moment, this is an arbitrary sub-probability measure on $m$ and $x’$. Thus $\sum_mq_m(x,\cC)\le$, and with probability $1-\sum_mq_m(x,\cC)$, no change to the present state is proposed.

The transition kernel can be written as

\[P(x,B) = \sum_m\int_B q_m(x,dx')\alpha_m(x,x') + s(x)I(x\in B)\]

for Borel sets $B$ in $\cC$, and the probability of not moving from $x$, either through a proposed move being rejected, or because no move is attempted is

\[s(x)=\sum_m\int_\cC q_m(x,dx')\{1-\alpha_m(x,x')\}+1-\sum_mq_m(x,\cC)\]

The detailed balance relation requires the equilibrium probability of moving from $A$ to $B$ to equal that from $B$ to $A$, for all Borel sets $A,B$ in $\cC$. It is sufficient that

\[\int_A\pi(dx)\int_Bq_m(x,dx')\alpha_m(x,x') = \int_B\pi(dx')\int_A q_m(x',dx)\alpha_m(x',x)\]

By some discussion, we can take

\[\alpha_m(x,x') = \min\Big\{1,\frac{\pi(dx')q_m(x',dx)}{\pi(dx)q_m(x,dx')}\Big\}\]

Example: Coal mining disasters

For data points ${y_i,i=1,2,\ldots,n}\in [0,L]$ from a Poisson process with rate given by the function $x(t)$, the log-likelihood is

\[\sum_{i=1}^n\log\{x(y_i)\} - \int_0^Lx(t)dt\]

Assuming that the rate function $x(\cdot)$ on $[0,L]$ is a step function. Suppose that there are $k$ steps, at positions $0 < s_1 < s_2 < \ldots < s_k < L$, and that the step function takes the value $h_j$, which we call its height, on the subinterval $[s_j,s_{j+1})$ for $j=0,1,\ldots,k$, writing $s_0=0, s_{k+1}=L$ for convenience. The prior model is specified by supposing that $k$ is drawn from the Poisson distribution

\[p(k) = e^{-\lambda}\frac{\lambda^k}{k!}\]

the heights $h_0,h_1,\ldots,h_k$ are independently drawn from $\Gamma(\alpha,\beta)$.

If $x$ is a step function on $[0,L]$, some possible transitions are:

  • a change to the height of a randomly chosen step
  • a change to the position of a randomly chosen step
  • “birth” of a new step at a randomly chose location in $[0,L]$
  • “death” of a randomly chosen step

the last two involve changing the dimension of $x$, the countable set of moves is denoted as ${H,P,0,1,2,\ldots}$

At each transition, an independent random choice is made between attempting each of the at most four available move types $(H,P,k,k-1)$. These have probabilities $\eta_k$ for $H$, $\pi_k$ for $P$, $b_k$ for $k$, and $d_k$ for $k-1$.


Published in categories Note