Metropolis Algorithm

Posted on Jul 21, 2017

Monte Carlo plays a key role in evaluating integrals and simulating stochastic systems, and the most critical step of Monte Carlo algorithm is sampling from an appropriate probability distribution $\pi (\mathbf x)$. There are two ways to solve this problem, one is to do importance sampling, another is to produce statistically dependent samples based on the idea of Markov chain Monte Carlo sampling.

Introduction

For importance sampling, you can read several previous posts, Importance Sampling and Adaptive Importance Sampling; and for MCMC, you can know how we sample a post-distribution by reading the jupyter notebook, which illustrates the sampling process via an example.

In this post, I will write down some notes about Metropolis Algorithm which is the cornerstone of MCMC.

Background

Let $\pi(\mathbf x) = Z^{-1}exp(-h(\mathbf x))$ be the target distribution, the normalizing constant $Z$ is often unknown to us. Theoritically,

\[Z = \int exp(-h(\mathbf x))d\mathbf x\]

but evaluating this integral is no easier than the original problem of simulating from $\pi$. Motivated by computational problems in statistical physics, Metropolis introduced the fundamental idea of evolving a Markov process to achieve the sampling of $\pi$, which is known as “Metropolis Algorithm”.

Hastings’ Generalization

The Metropolis algorithm prescribes the transition rule for a Markov chain. It uses a “symmetric” proposal function $T(\mathbf x,\mathbf y)$ to suggest a possible move and then employs an acceptance-rejection rule to “thin it down.” Hastings extended the algorithm to the case when $T$ is not necessarily symmetric. In Hastings’ generalization, the only serious restriction on the proposal function is that $T(\mathbf x,\mathbf y) >0$ if only if $T(\mathbf y,\mathbf x) >0$.

Steps of Metropolis-Hastings Algorithm:

Given current state $\mathbf x^{(t)}$

Draw $\mathbf y$ from the proposal distribution $T(\mathbf x^{(t)}, \mathbf y)$
Draw $U\sim Uniform[0,1]$ and update $\mathbf x^{(t+1)} = \mathbf y \;if \; U\le r(\mathbf x^{(t)},\mathbf y); \text{otherwise} \; \mathbf x^{(t+1)}=\mathbf x^{(t)}$

where

\[r(\mathbf x, \mathbf y) = min\{1, \frac{\pi(\mathbf y)T(\mathbf y, \mathbf x)}{\pi(\mathbf x)T(\mathbf x, \mathbf y)}\}\]

Proof

The actual transition function of the Markov chain is

\[A(\mathbf x, \mathbf y) = T(\mathbf x, \mathbf y)\cdot min\{1, \frac{\pi (\mathbf y)T(\mathbf y,\mathbf x)}{\pi(\mathbf x)T(\mathbf x,\mathbf y)} \}\]

Hence,

\[\pi(\mathbf x)A(\mathbf x,\mathbf y) = min\{\pi(\mathbf x)T(\mathbf x, \mathbf y), \pi(\mathbf y)T(\mathbf y, \mathbf x)\}\]

which is a symmetric function in $\mathbf x$ and $\mathbf y$. Thus, the detailed balance condition is satisfied.

\[\pi(\mathbf x)A(\mathbf x,\mathbf y)=\pi(\mathbf y)A(\mathbf y, \mathbf x)\]

Then

\[\int \pi(\mathbf x)A(\mathbf x, \mathbf y)d\mathbf x=\int \pi(\mathbf y)A(\mathbf y, \mathbf x)d\mathbf x=\pi(\mathbf y)\int A(\mathbf y,\mathbf x)d\mathbf x=\pi \mathbf (y)\]

thus, the detailed balance ensures invariance.

By the standard Markov chain theory, if the chain is irreducible, aperiodic, and possesses an invariant distribution, then the chain will become stationary at its variant distribution $\pi$. Therefore, if we run this chain long enough, the samples $\mathbf x_{n_0+1},\mathbf x_{n_0+2},\ldots, \ldots$ produced by the chain can be regarded as approximately following the target distribution $\pi$.

References

Liu, Jun S. Monte Carlo strategies in scientific computing. Springer Science & Business Media, 2008.

Published in categories Note

← previous next →

See all posts →

WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.