# Sequential Monte Carlo samplers

## Introduction

### MCMC’s drawbacks

- it is difficult to assess when the Markov chain has reached its stationary regime, and the chain can easily become trapped in local modes;
- it cannot be used in a sequential Bayesian estimation context.

## Sequential importance sampling

### Importance sampling

Consider target distributions of the form

\[\pi_n(x) = \gamma_n(x)/Z_n\qquad (1)\]

where

- $\gamma_n(x)$ is known pointwise;
- the normalizing constant $Z_n$ is unknown.

Importance sampling is based on the identity

\[\pi_n(x) = \frac{w_n(x)\eta_n(x)}{Z_n}\qquad (2)\]

where $\eta_n(x)$ is a positive density with respect to $dx$, referred to as the importance distribution, and

\[Z_n=\int_E w_n(x)\eta_n(x)\,dx\qquad (3)\]

\[w_n(x)=\gamma_n(x)/\eta_n(x)\qquad (4)\]

We can sample $N$ particles $\{X_n^{(i)}\}$ from $\eta_n$ and substitute the Monte Carlo approximation

\[\eta_n^N(dx)=\frac 1N \sum\limits_{i=1}^N\delta_{X_n^{(i)}}(dx) \qquad (5)\]

where $\delta_x$ is the Dirac measure

\[\delta_x(A) = 1_A(x) \qquad (6)\]

Substituting the Monte Carlo approximation (5) into equations (2) and (3) yields approximations of $E_{\pi_n}(\phi)$ and $Z_n$.
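As a concrete illustration, here is a minimal sketch of this importance sampling estimator; the target, importance distribution, and test function below are arbitrary choices for the example, not anything prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: unnormalized target gamma(x) = exp(-x^2/2),
# so pi = N(0,1) and Z = sqrt(2*pi); importance distribution eta = N(0, 2^2).
def gamma(x):
    return np.exp(-0.5 * x ** 2)

sigma = 2.0
def eta_pdf(x):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

N = 100_000
X = rng.normal(0.0, sigma, size=N)    # X^(i) ~ eta
w = gamma(X) / eta_pdf(X)             # importance weights w = gamma/eta

Z_hat = w.mean()                      # Monte Carlo estimate of Z
phi = X ** 2                          # test function phi(x) = x^2
E_hat = np.sum(w * phi) / np.sum(w)   # self-normalized estimate of E_pi(phi)

print(Z_hat, np.sqrt(2.0 * np.pi))   # Z_hat close to sqrt(2*pi) ≈ 2.507
print(E_hat)                          # close to 1, the variance of N(0,1)
```

Note that $E_{\pi_n}(\phi)$ must be estimated with the self-normalized ratio because $Z_n$ is unknown; only the ratio of two weighted sums is available.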

### Sequential importance sampling

At time $n=1$, start with $\pi_1$; simply let $\eta_1=\pi_1$.

At time $n-1$, we have $N$ particles $\{X_{n-1}^{(i)}\}$ distributed according to $\eta_{n-1}$. Move these particles using a Markov kernel $K_n$, with associated density denoted $K_n(x,x')$. The particles $\{X_n^{(i)}\}$ obtained this way are marginally distributed according to

\[\eta_n(x')=\int_E \eta_{n-1}(x)K_n(x,x')dx\]

### Algorithm settings

#### sequence of distributions $\{\pi_n\}$

- $\pi_n(x) = p(x\mid y_1,\ldots,y_n)$: sequential Bayesian estimation;
- $\pi_n(x)\propto \pi (x)^{\phi_n}\mu_1(x)^{1-\phi_n}$: a geometric bridge from an easy-to-sample $\mu_1$ to the target $\pi$, with $0\le\phi_1<\cdots<\phi_p=1$;
- $\pi_n(x)\propto \pi (x)^{\phi_n}$: a tempering/annealing-type sequence;
- $\pi_n(x)\propto \pi (x)I_{E_n}(x)$: restriction of $\pi$ to a sequence of sets $E_n$.
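The bridging choice $\pi_n(x)\propto\pi(x)^{\phi_n}\mu_1(x)^{1-\phi_n}$ is easiest to work with in log space. A minimal sketch, where $\mu_1$, the bimodal target $\pi$, and the linear schedule $\{\phi_n\}$ are all chosen purely for illustration:

```python
import numpy as np

# Hypothetical geometric bridge between an easy initial density mu_1 and a
# bimodal target pi: log gamma_n(x) = phi_n*log pi(x) + (1 - phi_n)*log mu_1(x).
def log_mu1(x):
    return -0.5 * (x / 3.0) ** 2          # mu_1 = N(0, 3^2), easy to sample

def log_pi(x):
    # unnormalized mixture of N(-2,1) and N(2,1): two well-separated modes
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def log_gamma(x, phi):
    return phi * log_pi(x) + (1.0 - phi) * log_mu1(x)

# a simple linear tempering schedule phi_1 = 0 < ... < phi_p = 1
phis = np.linspace(0.0, 1.0, 11)

# phi = 0 recovers mu_1 and phi = 1 recovers pi
print(log_gamma(1.3, 0.0), log_mu1(1.3))
print(log_gamma(1.3, 1.0), log_pi(1.3))
```

Early distributions in the sequence are close to the diffuse $\mu_1$, so particles can cross between the modes of $\pi$ before the bridge tightens.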

#### sequence of transition kernels $\{K_n\}$

- independent proposals (e.g. Gaussian);
- local random-walk moves;
- Markov chain Monte Carlo moves;
- approximate Gibbs moves.
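A random-walk Metropolis–Hastings move is a common concrete choice for $K_n$. A self-contained sketch, with the target density picked only for illustration (inside an SMC sampler $\gamma_n$ would be the current distribution in the sequence):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative unnormalized target gamma_n (standard normal shape).
def log_gamma_n(x):
    return -0.5 * x ** 2

# One random-walk Metropolis-Hastings move applied to every particle at once.
# The proposal is symmetric, so the acceptance ratio is gamma_n(x')/gamma_n(x).
def K_n(x, step=1.0):
    prop = x + step * rng.normal(size=x.shape)
    accept = np.log(rng.uniform(size=x.shape)) < log_gamma_n(prop) - log_gamma_n(x)
    return np.where(accept, prop, x)

# K_n leaves gamma_n invariant: particles already distributed according to
# N(0,1) remain N(0,1)-distributed after repeated moves.
X = rng.normal(size=50_000)
for _ in range(5):
    X = K_n(X)
print(X.mean(), X.var())   # both stay close to 0 and 1
```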

## Sequential Monte Carlo samplers

### Methodology and algorithm

- the importance weight can be computed exactly at time 1;
- for $n>1$, it is impossible to compute $\eta_n(x_n)$ pointwise, as this requires an integration with respect to $x_{1:n-1}$.

To bypass this, introduce artificial backward Markov kernels $L_{n-1}: E\times \mathcal{E}\rightarrow [0,1]$ with density $L_{n-1}(x_n, x_{n-1})$, and define the extended target

\[\tilde \pi_n (x_{1:n}) = \tilde \gamma_n(x_{1:n})/Z_n\]

where

\[\tilde \gamma_n(x_{1:n})=\gamma_n(x_n)\prod\limits_{k=1}^{n-1}L_k(x_{k+1},x_k)\]

At time $n-1$, assume that a set of weighted particles $\{W_{n-1}^{(i)}, X_{1:n-1}^{(i)}\}$ $(i=1,\ldots,N)$ approximating $\tilde \pi_{n-1}$ is available:

\[\tilde \pi_{n-1}^N(dx_{1:n-1})=\sum\limits_{i=1}^NW_{n-1}^{(i)}\delta_{X_{1:n-1}^{(i)}}(dx_{1:n-1})\] \[W_{n-1}^{(i)}=w_{n-1}(X_{1:n-1}^{(i)})/\sum_{j=1}^N w_{n-1}(X_{1:n-1}^{(j)})\]

At time $n$, extend the path of each particle with a Markov kernel $K_n(x_{n-1},x_n)$. The weight of the extended path factorizes as

\[w_n(x_{1:n})=\tilde \gamma_n(x_{1:n})/\eta_n(x_{1:n}) = w_{n-1}(x_{1:n-1})\tilde w_n(x_{n-1},x_n)\]

where the so-called incremental weight $\tilde w_n(x_{n-1},x_n)$ is equal to

\[\tilde w_n(x_{n-1},x_n)=\frac{\gamma_n(x_n)L_{n-1}(x_n,x_{n-1})}{\gamma_{n-1}(x_{n-1})K_n(x_{n-1},x_n)}\]

### Algorithm

Notation:

$\eta_n(x)$: the importance distribution

$\pi_n$: a target density

#### step 1: initialization

set $n = 1$

for $i=1,\ldots, N$ draw $X_1^{(i)}\sim \eta_1$

\[w_1(X_1^{(i)})=\gamma_1(X_1^{(i)})/\eta_1(X_1^{(i)})\]

and normalize these weights to obtain $\{W_1^{(i)}\}$.

Iterate steps 2 and 3.

#### step 2: resampling

If the effective sample size (ESS) falls below a threshold $T$, resample the particles and set $W_n^{(i)}=1/N$.
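The resampling step can be sketched as follows; multinomial resampling and the threshold $T = N/2$ are common but illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Effective sample size of normalized weights: ESS = 1 / sum_i (W^(i))^2,
# ranging from 1 (fully degenerate) to N (uniform weights).
def ess(W):
    return 1.0 / np.sum(W ** 2)

# Multinomial resampling: draw N particle indices with probabilities W,
# then reset all weights to 1/N. (One common scheme among several.)
def resample(X, W, rng):
    idx = rng.choice(len(X), size=len(X), p=W)
    return X[idx], np.full(len(X), 1.0 / len(X))

N = 1000
X = rng.normal(size=N)
w = np.exp(2.0 * rng.normal(size=N))   # deliberately uneven raw weights
W = w / w.sum()

T = N / 2                              # a common threshold choice
if ess(W) < T:
    X, W = resample(X, W, rng)

print(ess(W))   # after resampling the weights are uniform, so ESS ≈ N
```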

#### step 3: sampling

Set $n = n+1$; if $n = p+1$, stop.

for $i = 1, \ldots, N$ draw $X_n^{(i)} \sim K_n(X_{n-1}^{(i)}, \cdot)$

\[\tilde w_n(x_{n-1}, x_n) = \frac{\gamma_n(x_n)L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})K_n(x_{n-1}, x_n)}\]

and normalize the weights

\[W_n^{(i)} = \frac{W_{n-1}^{(i)}\tilde w_n(X_{n-1:n}^{(i)})}{\sum\limits_{j=1}^NW_{n-1}^{(j)}\tilde w_n(X_{n-1:n}^{(j)})}\]

### Notes on algorithm

#### Estimates of target distributions and normalizing constants

\[\pi_n^N(dx) = \sum\limits_{i=1}^NW_n^{(i)}\delta_{X_n^{(i)}}(dx)\] \[\frac{Z_n}{Z_{n-1}}=\frac{\int \gamma_n(x_n)dx_n}{\int \gamma_{n-1}(x_{n-1})dx_{n-1}}\] \[\widehat{\frac{Z_n}{Z_{n-1}}}=\sum\limits_{i=1}^NW_{n-1}^{(i)}\tilde w_n(X_{n-1:n}^{(i)})\]

#### Mixture of Markov kernels

Using a mixture of moves is to SMC sampling what the MH algorithm is to MCMC sampling.
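Putting the steps together, here is a hedged end-to-end sketch of an SMC sampler on a tempered sequence. It assumes MCMC forward kernels and the standard backward-kernel choice for which the incremental weight reduces to $\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$, evaluated at the previous particle position; all densities and tuning constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Tempered sequence gamma_n(x) = pi(x)^phi_n * mu_1(x)^(1-phi_n), bridging
# from a normalized mu_1 = N(0, 3^2) (so Z_1 = 1) to an unnormalized
# target pi with Z_p = sqrt(2*pi).
def log_mu1(x):
    return -0.5 * (x / 3.0) ** 2 - np.log(3.0 * np.sqrt(2.0 * np.pi))

def log_pi(x):
    return -0.5 * (x - 1.0) ** 2      # unnormalized N(1,1) shape

def log_gamma(x, phi):
    return phi * log_pi(x) + (1.0 - phi) * log_mu1(x)

def mh_move(x, phi, step=1.0):        # one pi_n-invariant random-walk MH sweep
    prop = x + step * rng.normal(size=x.shape)
    accept = np.log(rng.uniform(size=x.shape)) < log_gamma(prop, phi) - log_gamma(x, phi)
    return np.where(accept, prop, x)

N = 5000
phis = np.linspace(0.0, 1.0, 21)      # tempering schedule
X = rng.normal(0.0, 3.0, size=N)      # step 1: X_1^(i) ~ eta_1 = pi_1 = mu_1
W = np.full(N, 1.0 / N)               # gamma_1 = mu_1, so weights start uniform
log_Z = 0.0                           # running estimate of log(Z_p / Z_1)

for phi_prev, phi in zip(phis[:-1], phis[1:]):
    # incremental weights under the MCMC backward-kernel choice: they depend
    # only on the particle position before the move
    log_w_inc = log_gamma(X, phi) - log_gamma(X, phi_prev)
    log_Z += np.log(np.sum(W * np.exp(log_w_inc)))   # estimate of Z_n/Z_{n-1}
    W = W * np.exp(log_w_inc)
    W /= W.sum()
    if 1.0 / np.sum(W ** 2) < N / 2:  # step 2: resample if ESS < N/2
        X = X[rng.choice(N, size=N, p=W)]
        W = np.full(N, 1.0 / N)
    X = mh_move(X, phi)               # step 3: move particles through K_n

print(np.exp(log_Z), np.sqrt(2.0 * np.pi))   # both close to 2.507
print(np.sum(W * X))                         # weighted mean, close to 1
```

The normalizing-constant estimate is exactly the product of the per-step estimates $\widehat{Z_n/Z_{n-1}}=\sum_i W_{n-1}^{(i)}\tilde w_n$, accumulated in log space for numerical stability.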

### Algorithm settings

#### optimal backward kernels

The backward kernels minimizing the variance of the importance weights are

\[L_{n-1}^{\text{opt}}(x_n,x_{n-1})=\frac{\eta_{n-1}(x_{n-1})K_n(x_{n-1},x_n)}{\eta_n(x_n)}\]

but they typically cannot be computed in closed form.

#### suboptimal backward kernels

In practice the optimal kernel is approximated. In particular, when $K_n$ is an MCMC kernel with invariant distribution $\pi_n$, a standard choice is

\[L_{n-1}(x_n,x_{n-1})=\frac{\pi_n(x_{n-1})K_n(x_{n-1},x_n)}{\pi_n(x_n)}\]

for which the incremental weight simplifies to $\tilde w_n(x_{n-1},x_n)=\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$.

## From MCMC to SMC

### Methodology

- build a sequence of distributions $\{\pi_n\}$, $n=1,\ldots, p$;
- build a sequence of MCMC transition kernels $\{K_n\}$;
- based on $\{\pi_n\}$ and $\{K_n\}$, build a sequence of artificial backward Markov kernels $\{L_n\}$;
- use the SMC algorithm to approximate $\{\pi_n\}$ and to estimate $\{Z_n\}$.