Reversible jump Markov chain Monte Carlo
This note covers Green, P.J. (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”. Biometrika. 82 (4): 711–732.
Abstract
MCMC methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. They are therefore unavailable for Bayesian model determination, where the dimensionality of the parameter vector typically varies from model to model.
The paper proposes a new framework for constructing reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality; the framework is flexible and entirely constructive.
The methodology is illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.
The setting involves a discrete choice among a set of models, a parameter vector whose interpretation depends on the model in question, and data influenced by both the model and the parameter values.
Model
Suppose that we have a countable collection of candidate models {Mk,k∈K}. Model Mk has a vector θ(k) of unknown parameters, assumed to lie in IRnk, where the dimension nk may vary from model to model. We observe data y. There is a natural hierarchical structure expressed by modelling the joint distribution of (k,θ(k),y) as
p(k,θ(k),y)=p(k)p(θ(k)∣k)p(y∣k,θ(k)),
that is, the product of model probability, prior and likelihood. We can abbreviate the pair (k,θ(k)) by x. For given k, x lies in Ck={k}×IRnk.
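The hierarchical factorisation p(k)p(θ(k)∣k)p(y∣k,θ(k)) is generative: sample a model, then its parameters, then the data. A minimal sketch, with caller-supplied samplers standing in for the prior and likelihood (the function name and signature are illustrative, not from the paper):

```python
import random

def sample_joint(p_k, sample_theta, sample_y):
    """Draw (k, theta, y) from the hierarchy p(k) p(theta|k) p(y|k,theta).
    p_k maps model index to prior probability; the other two arguments
    are caller-supplied samplers (an illustrative API)."""
    k = random.choices(list(p_k), weights=list(p_k.values()))[0]
    theta = sample_theta(k)
    y = sample_y(k, theta)
    return k, theta, y

# Toy hierarchy: k in {0, 1}, theta is a k-vector, y is its sum plus noise.
random.seed(1)
k, theta, y = sample_joint(
    {0: 0.5, 1: 0.5},
    lambda k: [random.gauss(0, 1) for _ in range(k)],
    lambda k, theta: sum(theta) + random.gauss(0, 0.1),
)
assert k in (0, 1) and len(theta) == k
```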
As a concrete example, consider a change-point problem in which there is an unknown number of change-points in a piecewise constant regression function on the interval [0,L]. For k∈K={0,1,2,…}, model Mk says that there are exactly k change-points. To parametrise the resulting step function, we need to specify the position of each change-point and the value of the function on each of the (k+1) subintervals into which [0,L] is divided. Thus θ(k) is a vector of length nk=2k+1.
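The bookkeeping behind nk = 2k+1 can be made concrete: k change-point positions plus k+1 heights packed into one vector. A small sketch, assuming this flat layout (the helper names are illustrative):

```python
import numpy as np

def pack_theta(positions, heights):
    """Pack k change-point positions and k+1 heights into a single
    parameter vector theta of length 2k + 1 (illustrative layout)."""
    positions = np.asarray(positions, dtype=float)
    heights = np.asarray(heights, dtype=float)
    assert len(heights) == len(positions) + 1
    return np.concatenate([positions, heights])

def unpack_theta(theta):
    """Recover (positions, heights) from theta; k = (len(theta) - 1) // 2."""
    k = (len(theta) - 1) // 2
    return theta[:k], theta[k:]

# A model with k = 2 change-points on [0, L]:
theta = pack_theta([0.3, 0.7], [1.0, 2.5, 0.8])
assert len(theta) == 2 * 2 + 1
```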
Bayesian inference about k and θ(k) will be based on the joint posterior p(k,θ(k)∣y), which is the target of the Markov chain Monte Carlo computations described below. We often factorise this as
p(k,θ(k)∣y)=p(k∣y)p(θ(k)∣k,y)
Consider the Bayes factor for one model relative to another,
\frac{p(y\mid k_1)}{p(y\mid k_0)} = \frac{p(k_1\mid y)}{p(k_0\mid y)} \Big/ \frac{p(k_1)}{p(k_0)}
Previous work on MCMC computation applied to aspects of Bayesian model determination includes:
- jump-diffusion samplers
- embedding method in which the Ck are mapped onto subsets of a single parameter space.
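The Bayes-factor identity above (posterior odds divided by prior odds) is simple arithmetic and can be checked directly; a minimal sketch with an illustrative function name:

```python
def bayes_factor(post_k1, post_k0, prior_k1, prior_k0):
    """Bayes factor B_10 = (posterior odds for k1 over k0)
    divided by (prior odds for k1 over k0)."""
    return (post_k1 / post_k0) / (prior_k1 / prior_k0)

# If the posterior odds are 3:1 but the prior odds were 1:1,
# the data favour model k1 by a factor of 3.
assert bayes_factor(0.75, 0.25, 0.5, 0.5) == 3.0
# A prior already tilted 3:1 toward k1 explains 3:1 posterior odds
# without any help from the data.
assert bayes_factor(0.75, 0.25, 0.75, 0.25) == 1.0
```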
Let π(dx) be the target distribution of interest. In MCMC, construct a Markov transition kernel P(x,dx′) that is aperiodic and irreducible, and satisfies detailed balance,
∫A∫Bπ(dx)P(x,dx′)=∫B∫Aπ(dx′)P(x′,dx)
for all appropriate A,B.
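On a finite state space the detailed balance condition reduces to π(x)P(x,x′) = π(x′)P(x′,x) for every pair of states, which is easy to verify numerically. A toy sketch, using a Metropolis kernel with a uniform proposal as a stand-in for P(x, dx′) (this specific kernel is an assumption for illustration, not the paper's construction):

```python
import numpy as np

def metropolis_kernel(pi):
    """Transition matrix of a Metropolis chain on a finite state space
    with a symmetric (uniform) proposal; a toy stand-in for P(x, dx')."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x != y:
                P[x, y] = (1.0 / (n - 1)) * min(1.0, pi[y] / pi[x])
        P[x, x] = 1.0 - P[x].sum()  # rejection mass stays at x
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
P = metropolis_kernel(pi)
# Detailed balance: pi[x] P[x, y] == pi[y] P[y, x] for all x, y.
flow = pi[:, None] * P
assert np.allclose(flow, flow.T)
```

Because min(π(x), π(y)) is symmetric in x and y, the equilibrium "flow" matrix π(x)P(x,y) is symmetric, which is exactly detailed balance.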
In a typical application with multiple parameter subspaces {Ck} of different dimensionality, it will be necessary to devise different types of move between the subspaces.
When the current state is x, we propose a move of type m, that would take the state to dx′, with probability qm(x,dx′). For the moment, this is an arbitrary sub-probability measure on m and x′. Thus ∑mqm(x,C)≤1, and with probability 1−∑mqm(x,C), no change to the present state is proposed.
The transition kernel can be written as
P(x,B)=∑m∫Bqm(x,dx′)αm(x,x′)+s(x)I(x∈B)
for Borel sets B in C. The probability of not moving from x, either through a proposed move being rejected or because no move is attempted, is
s(x)=∑m∫Cqm(x,dx′){1−αm(x,x′)}+1−∑mqm(x,C)
The detailed balance relation requires the equilibrium probability of moving from A to B to equal that from B to A, for all Borel sets A,B in C. It is sufficient that
∫Aπ(dx)∫Bqm(x,dx′)αm(x,x′)=∫Bπ(dx′)∫Aqm(x′,dx)αm(x′,x)
After some argument, we can take
\alpha_m(x,x') = \min\left\{1,\ \frac{\pi(dx')\,q_m(x',dx)}{\pi(dx)\,q_m(x,dx')}\right\}
where the ratio is understood as a Radon–Nikodym derivative.
Example: Coal mining disasters
For data points {y_i,i=1,2,\ldots,n}\in [0,L] from a Poisson process with rate given by the function x(t), the log-likelihood is
\sum_{i=1}^n\log\{x(y_i)\} - \int_0^L x(t)\,dt
Assume that the rate function x(\cdot) on [0,L] is a step function. Suppose that there are k steps, at positions 0 < s_1 < s_2 < \ldots < s_k < L, and that the step function takes the value h_j, which we call its height, on the subinterval [s_j,s_{j+1}) for j=0,1,\ldots,k, writing s_0=0, s_{k+1}=L for convenience. The prior model is specified by supposing that k is drawn from the Poisson distribution
p(k) = e^{-\lambda}\frac{\lambda^k}{k!}
and the heights h_0,h_1,\ldots,h_k are independently drawn from \Gamma(\alpha,\beta).
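For a step function, both terms of the Poisson-process log-likelihood are elementary: the integral is a sum of height-times-width terms, and each data point contributes the log of the height of its subinterval. A minimal sketch under that assumption (function name and argument layout are illustrative):

```python
import math

def step_log_likelihood(data, s, h, L):
    """Poisson-process log-likelihood sum_i log x(y_i) - int_0^L x(t) dt
    for a step function with change-points s (sorted, inside [0, L])
    and k+1 heights h."""
    knots = [0.0] + list(s) + [L]
    # Integral of a step function: sum of height * subinterval length.
    integral = sum(h[j] * (knots[j + 1] - knots[j]) for j in range(len(h)))
    loglik = -integral
    for y in data:
        # Index j of the subinterval [s_j, s_{j+1}) containing y.
        j = sum(1 for sj in s if sj <= y)
        loglik += math.log(h[j])
    return loglik

# Constant rate h0 = 2 on [0, 1] with one observed point: log 2 - 2.
val = step_log_likelihood([0.5], [], [2.0], 1.0)
assert abs(val - (math.log(2.0) - 2.0)) < 1e-12
```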
If x is a step function on [0,L], some possible transitions are:
- a change to the height of a randomly chosen step
- a change to the position of a randomly chosen step
- “birth” of a new step at a randomly chosen location in [0,L]
- “death” of a randomly chosen step
The last two moves involve changing the dimension of x. The countable set of move types is denoted {H,P,0,1,2,\ldots}.
At each transition, an independent random choice is made among the at most four move types available from a state with k steps: H, P, k (birth) and k-1 (death). These are attempted with probabilities \eta_k for H, \pi_k for P, b_k for the birth move k, and d_k for the death move k-1.
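The random choice among the four move types can be sketched as a single draw against the cumulative probabilities; the function name and argument names below are illustrative, and note that for k = 0 the position and death moves are unavailable, so their probabilities are zero:

```python
import random

def choose_move(k, eta, pi_pos, b, d):
    """Pick one of the four move types with probabilities
    eta_k for H, pi_k for P, b_k for birth, d_k for death
    (illustrative signature; probabilities must sum to 1)."""
    assert abs(eta + pi_pos + b + d - 1.0) < 1e-12
    u = random.random()
    if u < eta:
        return "H"        # change a randomly chosen height
    if u < eta + pi_pos:
        return "P"        # move a randomly chosen change-point
    if u < eta + pi_pos + b:
        return "birth"    # add a step: dimension k -> k+1
    return "death"        # delete a step: dimension k -> k-1

# With k = 0 there is no position to move and no step to delete,
# so only H and birth have positive probability.
random.seed(0)
move = choose_move(0, eta=0.5, pi_pos=0.0, b=0.5, d=0.0)
assert move in ("H", "birth")
```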