Sequential Monte Carlo samplers
Introduction
MCMC’s drawbacks
- It is difficult to assess when the Markov chain has reached its stationary regime, and the chain can easily become trapped in local modes.
- It cannot be used in a sequential Bayesian estimation context.
Sequential importance sampling
Importance sampling
\[\pi_n(x) = \gamma_n(x)/Z_n\qquad (1)\]where
- $\gamma_n(x)$ is known pointwise;
- the normalizing constant $Z_n$ is unknown
Importance sampling relies, for a test function $\phi$, on the identity \[E_{\pi_n}(\phi)=Z_n^{-1}\int_E\phi(x)w_n(x)\eta_n(x)dx\qquad (2)\]where $\eta_n(x)$ is a positive density with respect to $dx$, referred to as the importance distribution, and
\[Z_n=\int_E w_n(x)\eta_n(x)dx\qquad (3)\] \[w_n(x)=\gamma_n(x)/\eta_n(x)\qquad (4)\]We can sample $N$ particles $\{X_n^{(i)}\}$ from $\eta_n$ and form the Monte Carlo approximation
\[\eta_n^N(dx)=\frac 1N \sum\limits_{i=1}^N\delta_{X_n^{(i)}}(dx) \qquad (5)\]where $\delta_x$ is the Dirac measure at $x$,
\[\delta_x(A) = 1_A(x) \qquad (6)\]Substituting this Monte Carlo approximation into equations (2) and (3) yields approximations of $E_{\pi_n}(\phi)$ and $Z_n$.
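As a toy illustration (the standard-normal target and the wider Gaussian importance distribution below are assumptions for the example, not part of the method), the following Python sketch estimates both $E_{\pi_n}(\phi)$ and $Z_n$ as in equations (2) and (3):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target gamma(x) = exp(-x^2/2), so the true Z = sqrt(2*pi) ~ 2.5066.
gamma = lambda x: np.exp(-0.5 * x**2)

# Importance distribution eta = N(0, 2^2): known density, easy to sample from.
eta_pdf = lambda x: np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2.0 * np.pi))

N = 100_000
X = rng.normal(0.0, 2.0, size=N)        # particles X^(i) ~ eta
w = gamma(X) / eta_pdf(X)               # unnormalized weights w(X^(i)) = gamma/eta  (eq. 4)

Z_hat = w.mean()                        # Monte Carlo estimate of Z            (eq. 3)
phi = lambda x: x**2                    # test function phi
E_hat = np.sum(w * phi(X)) / np.sum(w)  # self-normalized estimate of E_pi(phi) (eq. 2)

print(Z_hat, E_hat)                     # ~2.507 and ~1.0 for this toy example
```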
Sequential importance sampling
At time $n=1$, start with $\pi_1$; in the simplest case, let $\eta_1=\pi_1$.
At time $n-1$, we have $N$ particles $\{X_{n-1}^{(i)}\}$ distributed according to $\eta_{n-1}$. Move these particles using a Markov kernel $K_n$, with associated density denoted $K_n(x,x')$. The particles $\{X_n^{(i)}\}$ obtained this way are marginally distributed according to
\[\eta_n(x')=\int_E \eta_{n-1}(x)K_n(x,x')dx\]
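For intuition, here is a minimal sketch, assuming a Gaussian $\eta_{n-1}$ and a Gaussian random-walk kernel $K_n$ (both purely illustrative), showing that particles moved by $K_n$ are marginally distributed according to this convolution:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 10_000
X_prev = rng.normal(0.0, 1.0, size=N)   # particles X_{n-1}^(i) ~ eta_{n-1} = N(0, 1)

# Gaussian random-walk kernel K_n(x, .) = N(x, 0.5^2): move each particle once.
X_new = X_prev + rng.normal(0.0, 0.5, size=N)

# Marginally, eta_n = integral of eta_{n-1}(x) K_n(x, x') dx = N(0, 1 + 0.25);
# the empirical variance of the moved particles matches that convolution.
print(X_new.var())   # ~1.25
```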
Algorithm settings
sequence of distributions $\{\pi_n\}$, for example:
- $\pi_n(x) = p(x\mid y_1,\ldots,y_n)$
- $\pi_n(x)\propto \pi (x)^{\phi_n}\mu_1(x)^{1-\phi_n}$ (a tempered sequence of this form is sketched in code after this list)
- $\pi_n(x)\propto \pi (x)^{\phi_n}$
- $\pi_n(x)\propto \pi (x)I_{E_n}(x)$
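As a concrete example of such a tempered sequence, a sketch in log space (the Gaussian $\mu_1$ and $\pi$ and the linear schedule are illustrative assumptions) could be:

```python
import numpy as np

# Illustrative choices: mu_1 = N(0, 5^2) (easy to sample), pi = N(3, 0.5^2) (target),
# both known only up to a normalizing constant.
log_mu1 = lambda x: -0.5 * (x / 5.0)**2
log_pi  = lambda x: -0.5 * ((x - 3.0) / 0.5)**2

# Tempering schedule 0 = phi_1 < phi_2 < ... < phi_p = 1.
phis = np.linspace(0.0, 1.0, 11)

def log_gamma(x, n):
    """Unnormalized log-density of pi_n, proportional to pi^phi_n * mu_1^(1 - phi_n)."""
    return phis[n] * log_pi(x) + (1.0 - phis[n]) * log_mu1(x)

print(log_gamma(0.0, 5))   # halfway along the bridge: 0.5*log_pi(0) + 0.5*log_mu1(0) = -9.0
```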
sequence of transition kernels $\{K_n\}$, for example:
- independent proposals (Gaussian)
- local random-walk moves
- Markov chain Monte Carlo moves (a random-walk Metropolis-Hastings example is sketched after this list)
- approximate Gibbs moves
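For instance, a random-walk Metropolis-Hastings move that leaves $\pi_n$ invariant can serve as $K_n$; a minimal, self-contained sketch (step size and target chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def rw_mh_move(x, log_gamma_n, step=0.5):
    """One random-walk Metropolis-Hastings move leaving pi_n (prop. to exp(log_gamma_n)) invariant."""
    x_prop = x + rng.normal(0.0, step, size=x.shape)      # local random-walk proposal
    log_accept = log_gamma_n(x_prop) - log_gamma_n(x)     # symmetric proposal: ratio of targets only
    accept = np.log(rng.uniform(size=x.shape)) < log_accept
    return np.where(accept, x_prop, x)

# Example: one MH move for 1000 particles targeting an unnormalized N(3, 0.5^2) density.
X = rng.normal(0.0, 1.0, size=1000)
X = rw_mh_move(X, lambda x: -0.5 * ((x - 3.0) / 0.5)**2)
```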
Sequential Monte Carlo samplers
Methodology and algorithm
- the importance weight can be computed exactly at time 1
- for $n>1$, it is impossible to compute $\eta_n(x_n)$ pointwise, as it requires an integration with respect to $x_{1:n-1}$
Introduce artificial backward Markov kernels $L_{n-1}: E\times \mathcal{E}\rightarrow [0,1]$ with density $L_{n-1}(x_n, x_{n-1})$, and define the extended target
\[\tilde \pi_n (x_{1:n}) = \tilde \gamma_n(x_{1:n})/Z_n\]where
\[\tilde \gamma_n(x_{1:n})=\gamma_n(x_n)\prod\limits_{k=1}^{n-1}L_k(x_{k+1},x_k)\]at time $n-1$, assume that a set of weighted particles $\{W_{n-1}^{(i)}, X_{1:n-1}^{(i)}\}(i=1,\ldots,N)$ approximating $\tilde \pi_{n-1}$ is available
\[\tilde \pi_{n-1}^N(dx_{1:n-1})=\sum\limits_{i=1}^NW_{n-1}^{(i)}\delta_{X_{1:n-1}^{(i)}}(dx_{1:n-1})\] \[W_{n-1}^{(i)}=w_{n-1}(X_{1:n-1}^{(i)})/\sum_{j=1}^N w_{n-1}(X_{1:n-1}^{(j)})\]At time $n$, extend the path of each particle with a Markov kernel $K_n(x_{n-1},x_n)$; the associated unnormalized importance weight factorizes as
\[w_n(x_{1:n})=\tilde \gamma_n(x_{1:n})/\eta_n(x_{1:n}) = w_{n-1}(x_{1:n-1})\tilde w_n(x_{n-1},x_n)\]where the so-called incremental weight $\tilde w_n(x_{n-1},x_n)$ is equal to
\[\tilde w_n(x_{n-1},x_n)=\frac{\gamma_n(x_n)L_{n-1}(x_n,x_{n-1})}{\gamma_{n-1}(x_{n-1})K_n(x_{n-1},x_n)}\]
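A direct transcription of this incremental weight, written in log space for numerical stability (the kernel and density functions passed in are placeholders, not fixed names from the paper):

```python
def log_incremental_weight(x_prev, x_new, log_gamma_n, log_gamma_prev, log_L_prev, log_K_n):
    """log w~_n(x_{n-1}, x_n) = log gamma_n(x_n) + log L_{n-1}(x_n, x_{n-1})
                                - log gamma_{n-1}(x_{n-1}) - log K_n(x_{n-1}, x_n)."""
    return (log_gamma_n(x_new) + log_L_prev(x_new, x_prev)
            - log_gamma_prev(x_prev) - log_K_n(x_prev, x_new))
```

When $K_n$ is a $\pi_n$-invariant MCMC kernel and $L_{n-1}$ is chosen as its time reversal, this ratio simplifies to $\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$, the choice used in the end-to-end sketch at the bottom of this post.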
Algorithm
notation:
$\eta_n(x)$: the importance distribution
$\pi_n$: a target density
step 1: initialization
set $n = 1$
for $i=1,\ldots, N$ draw $X_1^{(i)}\sim \eta_1$
\[w_1(X_1^{(i)})=\gamma_1(X_1^{(i)})/\eta_1(X_1^{(i)})\]and normalize these weights to obtain $\{W_1^{(i)}\}$
Iterate steps 2 and 3.
step 2: resampling
if the effective sample size (ESS) of the weights falls below a threshold $T$ (e.g. $N/2$), resample the particles and set $W_n^{(i)}=1/N$
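Here the effective sample size can be taken as $\mathrm{ESS} = 1/\sum_i (W_n^{(i)})^2$; a minimal sketch of this step, using simple multinomial resampling for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def ess(W):
    """Effective sample size of normalized weights W."""
    return 1.0 / np.sum(W**2)

def resample_if_needed(X, W, threshold):
    """Multinomial resampling: duplicate/drop particles, then reset weights to 1/N."""
    N = len(W)
    if ess(W) < threshold:
        idx = rng.choice(N, size=N, p=W)   # draw indices with probability W^(i)
        return X[idx], np.full(N, 1.0 / N)
    return X, W
```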
step 3: sampling
set $n = n+1$; if $n = p+1$, stop;
for $i = 1, \ldots, N$ draw $X_n^{(i)} \sim K_n(X_{n-1}^{(i)}, \cdot)$
\[\tilde w_n(x_{n-1}, x_n) = \frac{\gamma_n(x_n)L_{n-1}(x_n, x_{n-1})}{\gamma_{n-1}(x_{n-1})K_n(x_{n-1}, x_n)}\]normalize the weights
\[W_n^{(i)} = \frac{W_{n-1}^{(i)}\tilde w_n(X_{n-1:n}^{(i)})}{\sum\limits_{j=1}^NW_{n-1}^{(j)}\tilde w_n(X_{n-1:n}^{(j)})}\]
Notes on algorithm
Estimates of target distributions and normalizing constants
\[\pi_n^N(dx) = \sum\limits_{i=1}^NW_n^{(i)}\delta_{X_n^{(i)}}(dx)\] \[\frac{Z_n}{Z_{n-1}}=\frac{\int \gamma_n(x_n)dx_n}{\int \gamma_{n-1}(x_{n-1})dx_{n-1}}\] \[\widehat{\frac{Z_n}{Z_{n-1}}}=\sum\limits_{i=1}^NW_{n-1}^{(i)}\tilde w_n(X_{n-1:n}^{(i)})\]
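The last estimator is conveniently computed in log space; a small helper (the argument names `W_prev` and `log_w_inc` are mine, standing for the normalized weights $W_{n-1}^{(i)}$ and the incremental log-weights $\log\tilde w_n$):

```python
import numpy as np
from scipy.special import logsumexp

def log_Z_ratio_estimate(W_prev, log_w_inc):
    """log of the estimate Z_n/Z_{n-1} = sum_i W_{n-1}^(i) * w~_n(X_{n-1:n}^(i)),
    computed stably in log space via log-sum-exp."""
    return logsumexp(np.log(W_prev) + log_w_inc)
```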
Mixture of Markov kernels
It is to SMC sampling what the MH algorithm is to MCMC sampling.
Algorithm settings
optimal backward kernels
suboptimal backward kernels
From MCMC to SMC
Methodology
- build a sequence of distributions $\{\pi_n\}$, $n=1,\ldots, p$
- build a sequence of MCMC transition kernels $\{K_n\}$ such that $K_n$ is $\pi_n$-invariant
- based on $\{\pi_n\}$ and $\{K_n\}$, build a sequence of artificial backward Markov kernels $\{L_n\}$
- use the SMC sampler to approximate $\{\pi_n\}$ and to estimate $\{Z_n\}$ (a compact end-to-end sketch follows this list)
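Putting the recipe together, here is a compact end-to-end sketch for the geometric-bridge example above. The targets, the linear tempering schedule, the random-walk step size, and the $N/2$ resampling threshold are all illustrative assumptions; $K_n$ is a $\pi_n$-invariant random-walk MH kernel and $L_{n-1}$ its time reversal, so the incremental weight reduces to $\gamma_n(x_{n-1})/\gamma_{n-1}(x_{n-1})$.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(4)

# --- Sequence of distributions: geometric bridge mu_1 -> pi (illustrative choices). ---
log_mu1 = lambda x: -0.5 * (x / 5.0)**2               # easy initial density N(0, 5^2)
log_pi  = lambda x: -0.5 * ((x - 3.0) / 0.5)**2       # unnormalized target N(3, 0.5^2)
phis = np.linspace(0.0, 1.0, 50)                      # tempering schedule phi_1=0, ..., phi_p=1
log_gamma = lambda x, n: phis[n] * log_pi(x) + (1.0 - phis[n]) * log_mu1(x)

N = 2000
X = rng.normal(0.0, 5.0, size=N)                      # X_1^(i) ~ eta_1 = mu_1 = pi_1
logW = np.full(N, -np.log(N))                         # w_1 = gamma_1/eta_1 is constant here
log_Z_ratio = 0.0                                     # running estimate of log(Z_p / Z_1)

for n in range(1, len(phis)):
    # Incremental weights with the time-reversal backward kernel:
    # w~_n(x_{n-1}, x_n) = gamma_n(x_{n-1}) / gamma_{n-1}(x_{n-1}).
    log_w_inc = log_gamma(X, n) - log_gamma(X, n - 1)
    log_Z_ratio += logsumexp(logW + log_w_inc)        # accumulate the Z_n / Z_{n-1} estimates
    logW = logW + log_w_inc
    logW -= logsumexp(logW)                           # normalize the weights

    # Resample when the effective sample size drops below N/2.
    W = np.exp(logW)
    if 1.0 / np.sum(W**2) < N / 2:
        X = X[rng.choice(N, size=N, p=W)]
        logW = np.full(N, -np.log(N))

    # Move particles with a pi_n-invariant random-walk MH kernel K_n.
    X_prop = X + rng.normal(0.0, 0.5, size=N)
    accept = np.log(rng.uniform(size=N)) < log_gamma(X_prop, n) - log_gamma(X, n)
    X = np.where(accept, X_prop, X)

# Weighted particles {W^(i), X^(i)} now approximate pi; log_Z_ratio estimates log(Z_p / Z_1).
print(np.sum(np.exp(logW) * X), np.exp(log_Z_ratio))  # mean ~3.0; ratio ~0.1 (= 0.5/5) here
```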