Reversible jump Markov chain Monte Carlo
This note is on Green, P.J. (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”. Biometrika 82 (4): 711–732.
Abstract
MCMC methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some fixed standard underlying measure. They are therefore not available for Bayesian model determination, where the dimensionality of the parameter vector is typically not fixed.
The paper proposes a new framework for the construction of reversible Markov chain samplers that jump between parameter subspaces of differing dimensionality; the framework is flexible and entirely constructive.
Illustrated with applications to multiple change-point analysis in one and two dimensions, and to a Bayesian comparison of binomial experiments.
The setting involves a discrete choice between a set of models, a parameter vector whose interpretation depends on the model in question, and data influenced by the model and parameter values.
Model
Suppose that we have a countable collection of candidate models {Mk, k∈K}. Model Mk has a vector θ(k) of unknown parameters, assumed to lie in $\mathbb{R}^{n_k}$, where the dimension $n_k$ may vary from model to model. We observe data y. There is a natural hierarchical structure expressed by modelling the joint distribution of (k,θ(k),y) as
$$p(k, \theta^{(k)}, y) = p(k)\, p(\theta^{(k)} \mid k)\, p(y \mid k, \theta^{(k)}),$$
that is, the product of model probability, prior and likelihood. We can abbreviate the pair (k,θ(k)) by x. For given k, x lies in $\mathcal{C}_k = \{k\} \times \mathbb{R}^{n_k}$.
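As a minimal sketch of this factorisation (the three callables are hypothetical placeholders, not anything from the paper), the joint log density is just the sum of the three terms:

```python
import numpy as np

def log_joint(k, theta, y, log_model_prior, log_param_prior, log_likelihood):
    """Log of p(k, theta, y) = p(k) p(theta | k) p(y | k, theta).

    The three callables are user-supplied placeholders for whatever model
    family is in use; theta is a vector whose length n_k may depend on k.
    """
    return (log_model_prior(k)
            + log_param_prior(theta, k)
            + log_likelihood(y, k, theta))
```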
As a concrete example, consider a change-point problem in which there is an unknown number of change-points in a piecewise constant regression function on the interval [0,L]. For k∈K={0,1,2,…}, model Mk says that there are exactly k change-points. To parametrise the resulting step function, we need to specify the position of each change-point and the value of the function on each of the (k+1) subintervals into which [0,L] is divided. Thus θ(k) is a vector of length $n_k = 2k + 1$.
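As a minimal sketch of this parameterisation (names are illustrative, not the paper's), the state for model Mk can be stored as k sorted change-point positions plus k+1 heights, giving a vector of length 2k+1:

```python
import numpy as np

def pack_state(positions, heights):
    """Concatenate k sorted change-point positions and k+1 heights
    into a single parameter vector theta of length 2k + 1."""
    assert len(heights) == len(positions) + 1
    return np.concatenate([np.sort(positions), heights])

def unpack_state(theta):
    """Split a length-(2k+1) vector back into positions and heights."""
    k = (len(theta) - 1) // 2
    return theta[:k], theta[k:]

# e.g. k = 2 change-points on [0, L]: theta has length 5
theta = pack_state(positions=[1.2, 3.4], heights=[0.5, 2.0, 1.1])
```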
Bayesian inference about k and θ(k) will be based on the joint posterior p(k,θ(k)∣y), which is the target of the Markov chain Monte Carlo computations described below. We often factorise this as
$$p(k, \theta^{(k)} \mid y) = p(k \mid y)\, p(\theta^{(k)} \mid k, y).$$
Consider the Bayes factor for one model relative to another,
$$\frac{p(y \mid k_1)}{p(y \mid k_0)} = \frac{p(k_1 \mid y)}{p(k_0 \mid y)} \bigg/ \frac{p(k_1)}{p(k_0)}.$$
Some work on MCMC computation with application to aspects of Bayesian model determination (a small sketch of estimating this Bayes factor from sampler output follows the list below):
- jump-diffusion samplers
- embedding method in which the Ck are mapped onto subsets of a single parameter space.
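Once a sampler that moves across the Ck is available, the Bayes factor above can be estimated by reading posterior model probabilities off the fraction of iterations spent in each model and dividing by the prior odds. A minimal sketch (names are illustrative):

```python
import numpy as np

def bayes_factor(k_samples, k1, k0, prior_prob):
    """Estimate p(y|k1) / p(y|k0) from the model indicators visited by a sampler.

    k_samples  : sequence of model indices k recorded at each iteration
    prior_prob : callable returning the prior model probability p(k)
    The posterior odds are estimated by visit frequencies and then divided
    by the prior odds, following the identity displayed above.
    """
    k_samples = np.asarray(k_samples)
    post_odds = np.mean(k_samples == k1) / np.mean(k_samples == k0)
    prior_odds = prior_prob(k1) / prior_prob(k0)
    return post_odds / prior_odds

# usage: bf = bayes_factor(k_samples, k1=2, k0=1, prior_prob=model_prior_pmf)
```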
Let π(dx) be the target distribution of interest. In MCMC, construct a Markov transition kernel P(x,dx′) that is aperiodic and irreducible, and satisfies detailed balance,
$$\int_A \int_B \pi(dx)\, P(x, dx') = \int_B \int_A \pi(dx')\, P(x', dx)$$
for all appropriate A, B.
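As a quick illustration of what detailed balance means in practice (not from the paper), here is a sketch that builds an ordinary Metropolis kernel on a four-point state space and checks the balance identity numerically:

```python
import numpy as np

# Target distribution on a 4-point state space (illustrative numbers).
pi = np.array([0.1, 0.2, 0.3, 0.4])
n = len(pi)

# Symmetric random-walk proposal on {0,1,2,3}: move left/right with prob 1/2;
# at the ends, the leftover proposal mass stays put.
Q = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            Q[i, j] = 0.5
    Q[i, i] = 1.0 - Q[i].sum()

# Metropolis kernel: accept i -> j with min(1, pi_j / pi_i); rejected mass
# stays at the current state (goes on the diagonal).
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j and Q[i, j] > 0:
            P[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()

# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j.
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
```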
In a typical application with multiple parameter subspaces {Ck} of different dimensionality, it will be necessary to devise different types of move between the subspaces.
When the current state is x, we propose a move of type m that would take the state to dx′, with probability qm(x,dx′). For the moment, this is an arbitrary sub-probability measure, jointly in m and x′. Thus ∑m qm(x,C) ≤ 1, and with probability 1 − ∑m qm(x,C), no change to the present state is proposed.
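A minimal sketch (labels and probabilities are illustrative) of drawing a move type when the per-type proposal probabilities sum to less than one, the leftover probability corresponding to proposing no change:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_move(move_probs):
    """Pick a move type given probabilities that may sum to less than 1.

    move_probs : dict mapping move label -> probability of proposing that move
    Returns the chosen label, or None with probability
    1 - sum(move_probs.values()), in which case the state is left unchanged.
    """
    labels = list(move_probs)
    probs = np.array([move_probs[m] for m in labels])
    assert probs.sum() <= 1.0 + 1e-12
    u = rng.uniform()
    for label, c in zip(labels, np.cumsum(probs)):
        if u < c:
            return label
    return None  # no move proposed this iteration

# e.g. move = choose_move({"height": 0.3, "position": 0.3, "birth": 0.2, "death": 0.2})
```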
The transition kernel can be written as
$$P(x, B) = \sum_m \int_B q_m(x, dx')\, \alpha_m(x, x') + s(x)\, I(x \in B)$$
for Borel sets B in C, and the probability of not moving from x, either through a proposed move being rejected or because no move is attempted, is
$$s(x) = \sum_m \int_C q_m(x, dx')\,\{1 - \alpha_m(x, x')\} + 1 - \sum_m q_m(x, C).$$
The detailed balance relation requires the equilibrium probability of moving from A to B to equal that from B to A, for all Borel sets A, B in C. It is sufficient that
$$\int_A \pi(dx) \int_B q_m(x, dx')\, \alpha_m(x, x') = \int_B \pi(dx') \int_A q_m(x', dx)\, \alpha_m(x', x).$$
Arguing via a symmetric measure that dominates both sides (see the paper for details), we can take
$$\alpha_m(x, x') = \min\left\{1, \frac{\pi(dx')\, q_m(x', dx)}{\pi(dx)\, q_m(x, dx')}\right\}.$$
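In the fixed-dimension case this is the familiar Metropolis–Hastings ratio. The sketch below (assuming the measures have densities, an assumption the paper does not need) evaluates the acceptance decision in log space; for a dimension-changing move the ratio also picks up the Jacobian of the deterministic dimension-matching map described in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def accept(log_target_current, log_target_proposed,
           log_q_forward, log_q_reverse):
    """Metropolis-Hastings-Green acceptance decision, computed in log space.

    log_target_*  : log pi at the current and proposed states
    log_q_forward : log proposal density of current -> proposed
    log_q_reverse : log proposal density of proposed -> current
    (For a move between dimensions, the reverse/forward proposal ratio must
    also include the Jacobian of the dimension-matching transformation.)
    """
    log_alpha = (log_target_proposed + log_q_reverse
                 - log_target_current - log_q_forward)
    return np.log(rng.uniform()) < min(0.0, log_alpha)
```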
Example: Coal mining disasters
For data points yi, i = 1, 2, …, n, lying in [0,L] from a Poisson process with rate given by the function x(t), the log-likelihood is
$$\sum_{i=1}^{n} \log\{x(y_i)\} - \int_0^L x(t)\, dt.$$
Assume that the rate function x(⋅) on [0,L] is a step function. Suppose that there are k steps, at positions 0<s1<s2<…<sk<L, and that the step function takes the value hj, which we call its height, on the subinterval [sj,sj+1) for j=0,1,…,k, writing s0=0, sk+1=L for convenience.
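Because x(⋅) is a step function, the integral term reduces to a sum of height times subinterval length. A minimal sketch of the log-likelihood under this parameterisation (variable names are illustrative):

```python
import numpy as np

def log_likelihood(y, s, h, L):
    """Poisson-process log-likelihood for a step-function rate.

    y : event times in [0, L]
    s : k sorted step positions (change-points) in (0, L)
    h : k+1 heights, h[j] applying on [s_j, s_{j+1}) with s_0 = 0, s_{k+1} = L
    """
    edges = np.concatenate([[0.0], np.sort(s), [L]])
    h = np.asarray(h, dtype=float)
    # sum_i log x(y_i): look up which subinterval each event falls in
    idx = np.searchsorted(edges, y, side="right") - 1
    idx = np.clip(idx, 0, len(h) - 1)
    term1 = np.sum(np.log(h[idx]))
    # integral_0^L x(t) dt = sum_j h_j (s_{j+1} - s_j)
    term2 = np.sum(h * np.diff(edges))
    return term1 - term2
```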
$$p(k) = \frac{e^{-\lambda} \lambda^k}{k!};$$
the heights h0,h1,…,hk are independently drawn from Γ(α,β).
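A sketch of the corresponding log prior and unnormalised log posterior, reusing the log_likelihood sketch above. For illustration only, the step positions are taken i.i.d. uniform on [0,L] (the paper's position prior is instead based on order statistics of uniform points), and Γ(α,β) is read in the shape–rate parameterisation:

```python
import numpy as np
from scipy.stats import gamma, poisson

def log_prior(k, s, h, lam, alpha, beta, L):
    """Log prior for (k, s, h) under the hierarchy in the text.

    k ~ Poisson(lam); the k+1 heights h are i.i.d. Gamma(alpha, beta) in
    shape-rate form; the k positions s are, as an assumption made only for
    this sketch, i.i.d. Uniform(0, L).
    """
    lp = poisson.logpmf(k, lam)
    lp += np.sum(gamma.logpdf(h, a=alpha, scale=1.0 / beta))
    lp += -k * np.log(L)  # uniform-position assumption, not the paper's prior
    return lp

def log_posterior(k, s, h, y, lam, alpha, beta, L):
    """Unnormalised log posterior = log prior + log likelihood (sketch above)."""
    return log_prior(k, s, h, lam, alpha, beta, L) + log_likelihood(y, s, h, L)
```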
If x is a step function on [0,L], some possible transitions are:
- a change to the height of a randomly chosen step
- a change to the position of a randomly chosen step
- “birth” of a new step at a randomly chosen location in [0,L]
- “death” of a randomly chosen step
The last two involve changing the dimension of x. The countable set of move types is denoted H,P,0,1,2,…
At each transition, an independent random choice is made between attempting each of the at most four available move types (H,P,k,k−1). These have probabilities ηk for H, πk for P, bk for k, and dk for k−1.
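To make this concrete, a minimal sketch of the dimension-preserving height move H: one height is multiplied by a random factor symmetric on the log scale and the change is accepted with the Metropolis–Hastings probability. The proposal choice is illustrative; the birth and death moves additionally require the paper's dimension-matching construction and are omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)

def height_move(s, h, y, L, log_posterior_fn):
    """Move type H: multiply one randomly chosen height by a random factor
    symmetric on the log scale, then accept or reject with the
    Metropolis-Hastings probability. The proposal is illustrative only."""
    h = np.asarray(h, dtype=float)
    k = len(s)
    j = rng.integers(len(h))                  # one of the k+1 heights
    h_new = h.copy()
    h_new[j] = h[j] * np.exp(rng.uniform(-0.5, 0.5))
    # multiplicative proposal => Hastings correction factor h_new[j] / h[j]
    log_alpha = (log_posterior_fn(k, s, h_new, y, L)
                 - log_posterior_fn(k, s, h, y, L)
                 + np.log(h_new[j] / h[j]))
    accepted = np.log(rng.uniform()) < min(0.0, log_alpha)
    return (h_new if accepted else h), accepted
```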