# Sequential Monte Carlo samplers

## Introduction

### MCMC’s drawbacks

1. it is difficult to assess when the Markov chain has reached its stationary regime, and the chain can easily become trapped in local modes.
2. it cannot be used in a sequential Bayesian estimation context.

## Sequential importance sampling

### Importance sampling

$\pi_n(x) = \gamma_n(x)/Z_n\qquad (1)$

where

1. $\gamma_n(x)$ is known pointwise;
2. the normalizing constant $Z_n$ is unknown.

$E_{\pi_n}(\phi) = Z_n^{-1}\int_E \phi(x)w_n(x)\eta_n(x)dx\qquad (2)$

where $\eta_n(x)$ is a positive density with respect to dx, referred to as the importance distribution.

$Z_n=\int_E w_n(x)\eta_n(x)dx\qquad (3)$

$w_n(x)=\gamma_n(x)/\eta_n(x)\qquad (4)$

We can sample $N$ particles $\{X_n^{(i)}\}$ from $\eta_n$ and substitute the Monte Carlo approximation

$\eta_n^N(dx)=\frac 1N \sum\limits_{i=1}^N\delta_{X_n^{(i)}}(dx) \qquad (5)$

where $\delta$ is the Dirac measure:

$\delta_x(A) = 1_A(x) \qquad (6)$

then we can substitute the Monte Carlo approximation into equations (2) and (3), and we will obtain an approximation for $E_{\pi_n}(\phi)$ and $Z_n$.
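To make equations (2)–(5) concrete, here is a minimal sketch (a hypothetical example, not from the paper): the target is a standard normal known only up to its normalizing constant $Z_n=\sqrt{2\pi}$, and the importance distribution is a wider Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target known pointwise up to Z_n: gamma_n(x) = exp(-x^2/2), so Z_n = sqrt(2*pi).
def gamma_n(x):
    return np.exp(-0.5 * x**2)

# Importance distribution eta_n = N(0, 2^2), with a known, tractable density.
sigma = 2.0
N = 100_000
X = rng.normal(0.0, sigma, size=N)                          # particles X_n^{(i)} ~ eta_n
eta = np.exp(-0.5 * (X / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

w = gamma_n(X) / eta               # importance weights w_n(x), eq. (4)
Z_hat = w.mean()                   # Monte Carlo estimate of Z_n, eq. (3) with eq. (5)
phi_hat = (w * X**2).mean() / Z_hat  # estimate of E_pi[x^2], eq. (2) with eq. (5)

print(Z_hat)    # close to sqrt(2*pi) ~ 2.5066
print(phi_hat)  # close to 1, the variance of a standard normal
```

Note that $Z_n$ is estimated as a by-product of the same weights used for $E_{\pi_n}(\phi)$; no extra sampling is needed.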

### Sequential importance sampling

at time $n=1$, start with $\pi_1$ and simply let $\eta_1=\pi_1$

at time $n-1$, we have $N$ particles $\{X_{n-1}^{(i)}\}$ distributed according to $\eta_{n-1}$. Move these particles using a Markov kernel $K_n$, with associated density denoted $K_n(x,x')$. The particles $\{X_n^{(i)}\}$ obtained this way are marginally distributed according to

$\eta_n(x')=\int_E \eta_{n-1}(x)K_n(x,x')dx$
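As a quick numerical check of this marginal identity (a hypothetical Gaussian example, not from the paper): if $\eta_{n-1} = N(0,1)$ and $K_n(x,\cdot) = N(x,\sigma_K^2)$, the convolution gives $\eta_n = N(0, 1+\sigma_K^2)$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
sigma_K = 0.5

X_prev = rng.normal(0.0, 1.0, size=N)               # X_{n-1}^{(i)} ~ eta_{n-1} = N(0, 1)
X_new = X_prev + rng.normal(0.0, sigma_K, size=N)   # one step of K_n(x, .) = N(x, sigma_K^2)

# The marginal eta_n is the convolution N(0,1) * N(0, sigma_K^2) = N(0, 1 + sigma_K^2).
print(X_new.var())   # close to 1.25
```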

### Algorithm settings

#### sequence of distributions $\{\pi_n\}$

• $\pi_n(x) = p(x\mid y_1,\ldots,y_n)$ (sequential Bayesian estimation)
• $\pi_n(x)\propto \pi(x)^{\phi_n}\mu_1(x)^{1-\phi_n}$ (geometric tempering from an easy $\mu_1$ to $\pi$, with an increasing schedule $\phi_n \in [0,1]$)
• $\pi_n(x)\propto \pi(x)^{\phi_n}$ (annealing)
• $\pi_n(x)\propto \pi(x)I_{E_n}(x)$ (a decreasing sequence of constraint sets $E_n$)
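To illustrate the tempered construction (a hypothetical choice of $\mu_1$ and $\pi$, not from the paper): with $\mu_1 = N(0, 10^2)$ and $\pi = N(0,1)$, each intermediate $\gamma_n$ interpolates geometrically between the two log-densities.

```python
import numpy as np

# Geometric tempering between mu_1 = N(0, 10^2) (broad, easy to sample)
# and pi = N(0, 1) (the target), both unnormalised.
phis = np.linspace(0.0, 1.0, 11)   # increasing schedule phi_1 = 0, ..., phi_p = 1

def log_gamma(x, phi):
    log_pi = -0.5 * x**2             # unnormalised target log-density
    log_mu1 = -0.5 * (x / 10.0)**2   # unnormalised initial log-density
    return phi * log_pi + (1.0 - phi) * log_mu1

# Each tempered density is again Gaussian, with precision phi + (1 - phi)/100,
# so the sequence shrinks smoothly from variance 100 down to variance 1.
```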

#### sequence of transition kernels $\{K_n\}$

• independent proposals (e.g. Gaussian)
• local random-walk moves
• Markov chain Monte Carlo moves
• approximate Gibbs moves

## Sequential Monte Carlo samplers

### Methodology and algorithm

1. the importance weight can be computed exactly at time $n=1$;
2. for $n>1$, it is impossible to compute $\eta_n(x_n)$ pointwise, as it requires an integration with respect to $x_{1:n-1}$.

propose artificial backward Markov kernels $L_{n-1}: E\times \mathcal{E}\rightarrow [0,1]$ with density $L_{n-1}(x_n, x_{n-1})$, and define the extended target

$\tilde \pi_n (x_{1:n}) = \tilde \gamma_n(x_{1:n})/Z_n$

where

$\tilde \gamma_n(x_{1:n})=\gamma_n(x_n)\prod\limits_{k=1}^{n-1}L_k(x_{k+1},x_k)$

at time $n-1$, assume that a set of weighted particles $\{W_{n-1}^{(i)}, X_{1:n-1}^{(i)}\}$ $(i=1,\ldots,N)$ approximating $\tilde \pi_{n-1}$ is available

$\tilde \pi_{n-1}^N(dx_{1:n-1})=\sum\limits_{i=1}^N W_{n-1}^{(i)}\delta_{X_{1:n-1}^{(i)}}(dx_{1:n-1})$

$W_{n-1}^{(i)}=w_{n-1}(X_{1:n-1}^{(i)})\Big/\sum\limits_{j=1}^N w_{n-1}(X_{1:n-1}^{(j)})$

at time $n$, extend the path of each particle with a Markov kernel $K_n(x_{n-1},x_n)$; the weights then satisfy

$w_n(x_{1:n})=\tilde \gamma_n(x_{1:n})/\eta_n(x_{1:n}) = w_{n-1}(x_{1:n-1})\tilde w_n(x_{n-1},x_n)$

where the so-called incremental weight $\tilde w_n(x_{n-1},x_n)$ is equal to

$\tilde w_n(x_{n-1},x_n)=\frac{\gamma_n(x_n)L_{n-1}(x_n,x_{n-1})}{\gamma_{n-1}(x_{n-1})K_n(x_{n-1},x_n)}$

### Algorithm

notation:

$\eta_n(x)$: the importance distribution

$\pi_n$: a target density

#### step 1: initialization

set $n = 1$

for $i=1,\ldots, N$ draw $X_1^{(i)}\sim \eta_1$

$w_1(X_1^{(i)})=\gamma_1(X_1^{(i)})/\eta_1(X_1^{(i)})$

and normalize these weights to obtain ${W_1^{(i)}}$

Iterate steps 2 and 3

#### step 2: resampling

if the effective sample size (ESS) falls below a threshold $T$, resample the particles and set $W_n^{(i)}=1/N$
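A common choice (an assumption here, not specified in this note) is the inverse-sum-of-squares ESS with threshold $T = N/2$:

```python
import numpy as np

def ess(W):
    # Effective sample size of normalised weights W; lies in [1, N].
    # Uniform weights give N; a single non-zero weight gives 1.
    return 1.0 / np.sum(W**2)

W_uniform = np.full(100, 1.0 / 100)      # no weight degeneracy
W_degenerate = np.zeros(100)
W_degenerate[0] = 1.0                    # total degeneracy
print(ess(W_uniform), ess(W_degenerate)) # 100.0 and 1.0
```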

#### step 3: sampling

set $n = n+1$; if $n = p+1$, stop

for $i = 1, \ldots, N$ draw $X_n^{(i)} \sim K_n(X_{n-1}^{(i)}, \cdot)$

$\tilde w_n(x_{n-1}, x_n) = \frac{\gamma_n(x_n)L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})K_n(x_{n-1}, x_n)}$

normalize the weights

$W_n^{(i)} = \frac{W_{n-1}^{(i)}\tilde w_n(X_{n-1:n}^{(i)})}{\sum\limits_{j=1}^NW_{n-1}^{(j)}\tilde w_n(X_{n-1:n}^{(j)})}$
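The three steps above can be sketched end to end. This is a minimal NumPy sketch under stated assumptions (none of which are fixed by the note itself): geometric tempering from $\mu_1 = N(0, 2^2)$ to $\pi = N(0,1)$, a random-walk Metropolis kernel $K_n$ that leaves $\pi_n$ invariant, and $L_{n-1}$ chosen as the time reversal of $K_n$, in which case the incremental weight reduces to $\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 4000, 21
phis = np.linspace(0.0, 1.0, p)

def log_gamma(x, phi):
    # Geometric tempering from mu_1 = N(0, 2^2) to pi = N(0, 1), both unnormalised.
    return phi * (-0.5 * x**2) + (1.0 - phi) * (-0.5 * (x / 2.0)**2)

# step 1: initialization -- eta_1 = pi_1 = mu_1, so the initial weights are uniform.
X = rng.normal(0.0, 2.0, size=N)
logW = np.full(N, -np.log(N))
log_Z = 0.0                                # accumulates log(Z_p / Z_1)

for n in range(1, p):
    # With a pi_n-invariant K_n and L_{n-1} its time reversal, the incremental
    # weight is gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1}), computed before the move.
    log_w_inc = log_gamma(X, phis[n]) - log_gamma(X, phis[n - 1])
    log_Z += np.logaddexp.reduce(logW + log_w_inc)   # log of sum_i W_{n-1}^i * w_inc^i
    logW += log_w_inc
    logW -= np.logaddexp.reduce(logW)                # normalize the weights

    # step 2: resampling -- multinomial resampling when ESS < N/2.
    if 1.0 / np.exp(np.logaddexp.reduce(2 * logW)) < N / 2:
        idx = rng.choice(N, size=N, p=np.exp(logW))
        X, logW = X[idx], np.full(N, -np.log(N))

    # step 3: sampling -- move each particle with a random-walk Metropolis
    # kernel targeting pi_n.
    prop = X + rng.normal(0.0, 1.0, size=N)
    accept = np.log(rng.uniform(size=N)) < log_gamma(prop, phis[n]) - log_gamma(X, phis[n])
    X = np.where(accept, prop, X)

# True value: Z_1 = 2*sqrt(2*pi) and Z_p = sqrt(2*pi), so log(Z_p / Z_1) = -log 2.
print(log_Z, -np.log(2.0))
```

With this choice of $L_{n-1}$ the incremental weight does not depend on the new position, so the weights can be computed before the MCMC move; the running `log_Z` implements the normalizing-constant ratio estimate described in the notes below.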

### Notes on algorithm

#### Estimates of target distributions and normalizing constants

$\pi_n^N(dx) = \sum\limits_{i=1}^N W_n^{(i)}\delta_{X_n^{(i)}}(dx)$

$\frac{Z_n}{Z_{n-1}}=\frac{\int \gamma_n(x_n)dx_n}{\int \gamma_{n-1}(x_{n-1})dx_{n-1}}$

$\widehat{\frac{Z_n}{Z_{n-1}}}=\sum\limits_{i=1}^N W_{n-1}^{(i)}\tilde w_n(X_{n-1:n}^{(i)})$

#### Mixture of Markov kernels

it is to SMC sampling what the Metropolis–Hastings algorithm is to MCMC sampling

## From MCMC to SMC

### Methodology

1. build a sequence of distributions $\{\pi_n\}$, $n=1,\ldots, p$
2. build a sequence of MCMC transition kernels $\{K_n\}$
3. based on $\{\pi_n\}$ and $\{K_n\}$, build a sequence of artificial backward Markov kernels $\{L_n\}$
4. use the SMC algorithm to approximate $\{\pi_n\}$ and to estimate $\{Z_n\}$

## Reference

Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. "Sequential Monte Carlo samplers." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.3 (2006): 411-436.
