WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Multiple Object Tracking with Rao-Blackwellized marginal particle filtering

Posted on
Tags: Multiple Object Tracking, Particle Filter, Rao-Blackwell

This note is for Smal, I., Meijering, E., Draegestein, K., Galjart, N., Grigoriev, I., Akhmanova, A., … Niessen, W. (2008). Multiple object tracking in molecular bioimaging by Rao-Blackwellized marginal particle filtering. Medical Image Analysis, 12(6), 764–777.


  • probabilistic tracking algorithms, based on Bayesian estimation, have recently been shown to offer several improvements over classical approaches, by better integration of spatial and temporal information, and the possibility to more effectively incorporate prior knowledge about object dynamics and image formation.


the majority of approaches that have been proposed so far for tracking small objects in bioimaging applications consist of two stages.

  1. objects are detected separately in each frame of the image sequence
  2. attempt to solve the interframe correspondence problem in linking detected objects between frames

since the two stages are usually completely separated, without the possibility of feedback from linking to detection and vice versa, the tracking performance of such approaches is often suboptimal and extremely sensitive to failures in either stage

take the successful particle filtering framework as a starting point and propose five fundamental changes that make the algorithm more flexible, more robust, and more accurate.

Probabilistic tracking framework

Particle filtering approach

  • unobserved state $\x_t$ of an object
  • noisy measurements $\z_{1:t}\triangleq \{\z_1,\ldots,\z_t\}$
  • the evolution of the hidden state is assumed to be known and modeled as a Markov process with initial distribution $p(\x_0)$ and transition prior $p(\x_t\mid \x_{t-1})$.

we have

\[p(\x_{0:t}\mid \z_{1:t})\propto p(\z_t\mid \x_t) p(\x_t\mid \x_{t-1})p(\x_{0:t-1}\mid \z_{1:t-1})\]


\[p(\x_t\mid \z_{1:t}) \propto p(\z_t\mid \x_t)\int p(\x_t\mid\x_{t-1})p(\x_{t-1}\mid \z_{1:t-1})d\x_{t-1}\,.\]

the exact solution can be obtained only for a restrictive set of cases, such as linear Gaussian modeling, in which case Kalman filtering can be applied.

SMC, in particular particle filtering, can be used as an efficient numerical approximation.

the basic idea is to represent the required posterior $p(\x_{0:t}\mid \z_{1:t})$ as a set of $N_s$ random samples (particles) and associated weights $\{\x_{0:t}^{(i)},w_t^{(i)}\}_{i=1}^{N_s}$:

\[p(\x_{0:t}\mid \z_{1:t}) \approx \sum_{i=1}^{N_s}w_t^{(i)}\delta(\x_{0:t}-\x_{0:t}^{(i)})\,.\]

In order to calculate the weights recursively, the importance density is factorized as

\[q(\x_{0:t}\mid \z_{1:t})=q(\x_t\mid \x_{0:t-1},\z_{1:t})q(\x_{0:t-1}\mid \z_{1:t-1})\,.\]

The particle representation of the posterior at time $t$ is obtained by augmenting the set of existing particles $\x_{0:t-1}^{(i)}$, with the new state $\x_t^{(i)}\sim q(\x_t\mid \x_{0:t-1}^{(i)},\z_{1:t})$, allowing the weight $w_t^{(i)}$ to be updated as

\[\tilde w_t^{(i)} = \frac{p(\x_{0:t}^{(i)}\mid \z_{1:t})}{q(\x_{0:t}^{(i)}\mid \z_{1:t})} = \frac{p(\z_t\mid \x_t^{(i)})p(\x_t^{(i)}\mid \x_{t-1}^{(i)})}{q(\x_t^{(i)}\mid \x_{0:t-1}^{(i)},\z_{1:t})}w_{t-1}^{(i)}\]

I think the first equality omits the normalizing constant of $p(\x_{0:t}^{(i)}\mid \z_{1:t})$, although it makes no difference for the relative weights

and normalized to

\[w_t^{(i)} = \frac{\tilde w_t^{(i)}}{\sum_{i=1}^{N_s}\tilde w_t^{(i)}}\,.\]
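The SIS recursion above can be sketched in a toy 1-D setting (not the paper's model): with the transition prior as importance density, the weight update reduces to $\tilde w_t^{(i)} = p(\z_t\mid\x_t^{(i)})w_{t-1}^{(i)}$ followed by normalization. All noise levels here are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setup: random-walk prior, Gaussian likelihood, and the
# transition prior as importance density (so the proposal cancels).
N_s = 500
particles = rng.normal(0.0, 1.0, N_s)      # x_0^{(i)}
weights = np.full(N_s, 1.0 / N_s)          # w_0^{(i)}

def sis_step(particles, weights, z, q_std=0.5, obs_std=0.3):
    # propagate: x_t ~ q(x_t | x_{t-1}, z_t) = p(x_t | x_{t-1})
    particles = particles + rng.normal(0.0, q_std, particles.shape)
    # weight: w~_t = p(z_t | x_t) * w_{t-1}  (prior proposal cancels)
    likelihood = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    weights = weights * likelihood
    weights = weights / weights.sum()      # normalize
    return particles, weights

particles, weights = sis_step(particles, weights, z=0.2)
estimate = np.sum(weights * particles)     # weighted posterior-mean estimate
```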

Multiple object tracking

tracking of multiple objects within the PF framework can be done by extending each state vector $\x_t^{(i)}$ to include jointly the states of all objects at time $t$. This approach is robust, but it drastically increases the dimensionality of the state space, leading to an exponential explosion of computational demands.

the alternative is to use independent PFs, one per object, which is cheap but prone to errors, especially during object interactions.

proposal: for objects that are far from other objects and do not interact at time $t$, the PFs are run independently, while for objects that come close to each other and do interact, use joint sampling and updating of the weights, in combination with a reclustering procedure.

the multimodal posterior distribution is represented as a mixture of individual non-parametric distributions,

\[p(\x_t\mid \z_{1:t}) = \sum_{m=1}^M \pi_{m,t}p_m(\x_t\mid \z_{1:t})\,,\]

where $M$ is the number of objects, and $\pi_{m,t}$ are normalized object weights.

The particle representation of the filtering distribution consists of $N=MN_s$ particles, and is given by $\{\{\x_{m,t}^{(i)},w_{m,t}^{(i)}\}_{i=1}^{N_s}\}_{m=1}^M$, and

\[\pi_{m,t} = \frac{\pi_{m,t-1}\sum_{i=1}^{N_s}\tilde w_{m,t}^{(i)}}{\sum_{n=1}^M\sum_{i=1}^{N_s}\pi_{n,t-1}\tilde w_{n,t}^{(i)}}\]
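The object-weight update above is a simple re-weighting of each object's mixture weight by its total particle evidence, then a joint renormalization. A minimal sketch with placeholder numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# M objects, N_s particles each; w_tilde holds the unnormalized
# particle weights w~_{m,t}^{(i)} (illustrative random values here).
M, N_s = 3, 100
pi_prev = np.array([0.5, 0.3, 0.2])            # pi_{m,t-1}
w_tilde = rng.uniform(0.0, 1.0, (M, N_s))      # w~_{m,t}^{(i)}

evidence = w_tilde.sum(axis=1)                 # sum_i w~_{m,t}^{(i)}
pi_new = pi_prev * evidence                    # numerator of the update
pi_new = pi_new / pi_new.sum()                 # normalize over all objects
```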

Dynamic models

Bayesian tracking requires the specification of the transition prior, $p(\x_t\mid\x_{t-1})$, which models the dynamics of the objects to be tracked. This prior is application dependent and should be defined based on prior knowledge about the object motion patterns.

To deal with different motion patterns, consider two transition models, define the state vector as

\[\x_t = (x_t,\dot x_t,y_t,\dot y_t,\sigma_{\max,t},\sigma_{\min,t},I_t)'\,,\]


  • the object shape feature vector: $(\sigma_{\max, t},\sigma_{\min,t})'\triangleq \s_t$
  • the position vector: $(x_t,y_t)'\triangleq \r_t$
  • velocity: $(\dot x_t,\dot y_t)'\triangleq \v_t$
  • intensity: $I_t$.

Define $\y_t=(x_t,\dot x_t,y_t,\dot y_t)'$, and assume that the changes in the position, intensity, and shape parameters are independent; then the state evolution model factorizes as

\[p(\x_t\mid \x_{t-1}) = p_y(\y_t\mid \y_{t-1})p_s(\s_t\mid\s_{t-1})p_I(I_t\mid I_{t-1})\,.\]

For the combined position/velocity factor $p_y(\y_t\mid \y_{t-1})$, two models are defined, denoted $p_y(\y_t\mid\y_{t-1},k)$ with $k\in\{1,2\}$.

The first model is suitable for tracking objects that exhibit motion patterns similar to a random walk; the evolution of the state sequence is given by

\[\r_t = \r_{t-1} + T\xi_t\,,\]

where $T$ is the temporal sampling interval (that is, the time between any two successive time frames) and $\xi_t$ is the process noise. The transition prior in this case is given by

\[p_y(\y_t\mid \y_{t-1},k=1) = N(\r_t\mid \r_{t-1}, T^2q_{1,1}\I)\,.\]

The second model describes nearly constant velocity motion with small accelerations,

\[\y_t = \F\y_{t-1} + \eta_t\,.\]

The transition prior in this case is given by

\[p_y(\y_t\mid \y_{t-1},k=2) = N(\y_t\mid \F\y_{t-1},q_{1,2}\Sigma)\,.\]

The transition prior for the changes in object shape is defined using a Gaussian model,

\[p_s(\s_t\mid \s_{t-1}) = N(\s_t\mid \s_{t-1},Tq_2\I)\]

As for object intensity, in order to model the process of photobleaching, use a first-order Gauss-Markov process, with the transition prior

\[p_I(I_t\mid I_{t-1}) = N(I_t\mid (1-\alpha)I_{t-1},q_3T)\]
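Sampling one particle from the factorized transition prior $p_y\, p_s\, p_I$ (with the random-walk mode $k=1$ for position) can be sketched as follows. The noise levels `q1`, `q2`, `q3`, the interval `T`, and the photobleaching rate `alpha` are placeholder values, not the paper's calibrated settings.

```python
import numpy as np

rng = np.random.default_rng(2)

# placeholder parameters (illustrative, not from the paper)
T, q1, q2, q3, alpha = 1.0, 0.5, 0.05, 0.1, 0.02

def sample_transition(r, s, I):
    # position: random walk with covariance T^2 q1 I  (mode k=1)
    r_new = r + rng.normal(0.0, T * np.sqrt(q1), 2)
    # shape: Gaussian drift with covariance T q2 I
    s_new = s + rng.normal(0.0, np.sqrt(T * q2), 2)
    # intensity: first-order Gauss-Markov photobleaching model
    I_new = (1 - alpha) * I + rng.normal(0.0, np.sqrt(q3 * T))
    return r_new, s_new, I_new

r, s, I = np.array([10.0, 5.0]), np.array([2.0, 1.0]), 100.0
r, s, I = sample_transition(r, s, I)
```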

Observation model

the intensity profiles of the objects in the images can be modeled using the point-spread function (PSF) of the microscope.

the intensity profiles of elongated objects are modeled by utilizing the velocity components from $\x_t$ as parameters in the Gaussian. In this case, for an object of intensity $I_t$ at position $\r_t$, the intensity contribution to pixel $(i,j)$ is approximated as

\[h_t(i,j;\x_t) = a_t(i, j;\r_t,\v_t,\s_t)I_t + b_t(i, j)\,,\]

where $b_t(i,j)$ is the background intensity and

\[a_t(i,j;\r_t,\v_t,\s_t) = \exp\left(-\frac 12\m_t'\R'\Sigma_t^{-1}\R\m_t\right)\]

Track management

$h$-dome transformation from (gray-scale) mathematical morphology: extract bright structures without requiring any size or shape criteria.

Based on the $h$-dome transformation, define the probability for the real-valued spatial position $\r_t$ of objects at time $t$ by the following transformation:

\[\tilde p(\r_t\mid \z_t) = (H_iH_nH_rD_h(G_\sigma*z_t))(\r_t)\]

which can be used as the sampling function.

Generate MC samples in those regions where the probability of object existence is high. (Is this similar to applying some operation, such as thresholding, to the original image?) Having these samples in every frame, apply the mean-shift algorithm, which clusters the particles into $M_n$ classes.

Next, compare the number of particles in each class to a threshold, and if the number of particles in any class is greater than $N_b$, initialize a new object for that class, where $N_b$ is the expected number of particles in the vicinity of the object if the MC sampling were done uniformly over the image.

Track termination is based on the analysis of the unnormalized weights, using the likelihood ratio test. Specifically, check whether the average of the unnormalized likelihood values of all particles corresponding to a given object dropped below the level $\pi_d$ of the likelihood, which corresponds to having no measurements from the object but only from the background.
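The termination rule can be sketched as a simple threshold on the average unnormalized likelihood of a track's particles. The value of `pi_d` here is an illustrative placeholder, not the paper's calibrated background level.

```python
import numpy as np

# A track is dropped when the average unnormalized likelihood of its
# particles falls below the background level pi_d (placeholder value).
def should_terminate(unnorm_likelihoods, pi_d=0.05):
    return float(np.mean(unnorm_likelihoods)) < pi_d

dying = should_terminate(np.array([0.01, 0.02, 0.03]))   # average 0.02 < pi_d
alive = should_terminate(np.array([1.0, 2.0]))           # average 1.5 >= pi_d
```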

In the case of object interactions, employ a reclustering procedure, and can use K-means clustering.

Incorporating multiple dynamics

In order to deal with different motion patterns, propose to use jump Markov systems (JMS), whose state-space description allows the system parameters to change over time according to a Markov chain. The interacting multiple model (IMM) filter is an example of a JMS in the case of linear Gaussian models.

Assume the transition prior $p(\x_t\mid \x_{t-1},k_t)$ switches between $K$ types of motion patterns, depending on the value of the parameter $k_t$.

\[p(\x_t,k_t\mid \z_{1:t})=p(k_t\mid \z_{1:t})p(\x_{t}\mid k_t,\z_{1:t})\,.\]

The two factors are updated recursively in a so-called mixing stage and a mode-conditioned filtering stage. The mixing stage gives the predicted density $p(\x_{t-1},k_t\mid\z_{1:t-1})$ for the modal state $k_t$ as

\[p(\x_{t-1},k_t\mid \z_{1:t-1}) = p(\x_{t-1}\mid k_t,\z_{1:t-1})p(k_t\mid \z_{1:t-1})\,,\]


\[p(k_t\mid\z_{1:t-1}) = \sum_{k_{t-1}=1}^Kp(k_t\mid k_{t-1})p(k_{t-1}\mid\z_{1:t-1})\]
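The mode-probability prediction above is just the previous mode posterior pushed through the Markov transition matrix. A minimal sketch with an illustrative two-mode transition matrix:

```python
import numpy as np

# p(k_t | k_{t-1}): rows index k_{t-1}, columns index k_t
# (illustrative values, not the paper's settings)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mode_post = np.array([0.6, 0.4])   # p(k_{t-1} | z_{1:t-1})

# p(k_t | z_{1:t-1}) = sum_{k_{t-1}} p(k_t | k_{t-1}) p(k_{t-1} | z_{1:t-1})
mode_pred = mode_post @ P          # -> [0.62, 0.38]
```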


Applying marginalization concepts

Filtering distribution marginalization

because of the sequential nature of the algorithm, the variance of the importance weights can only increase (stochastically) over time, leading to most paths having vanishingly small probability.

one way to reduce this degeneracy effect is to apply marginal particle filtering (MPF), where the filtering is performed directly on the marginal distribution $p(\x_t\mid\z_{1:t})$ instead of on the joint state. The importance weights are now defined on the marginal space:

\[w_t^{(i)} \propto \frac{p(\x_t^{(i)}\mid \z_{1:t})}{q(\x_t^{(i)}\mid \z_{1:t})}=\frac{p(\z_t\mid\x_t^{(i)})\sum_{j=1}^{N_s}w_{t-1}^{(j)}p(\x_t^{(i)}\mid \x_{t-1}^{(j)})}{\sum_{j=1}^{N_s}w_{t-1}^{(j)}q(\x_t^{(i)}\mid\x_{t-1}^{(j)},\z_t)}\]
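A toy 1-D sketch of this marginal weight update follows. Both the numerator and the denominator mix over the previous particle set, which is what makes the update $O(N_s^2)$; here the proposal equals the prior mixture, so the two mixtures cancel, but they are kept explicit to show the general form. All model parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

N_s = 200
prev = rng.normal(0.0, 1.0, N_s)                 # x_{t-1}^{(j)}
w_prev = np.full(N_s, 1.0 / N_s)                 # w_{t-1}^{(j)}
q_std, obs_std, z = 0.5, 0.3, 0.1                # placeholder parameters

# draw x_t^{(i)} from the mixture sum_j w_{t-1}^{(j)} p(x_t | x_{t-1}^{(j)})
anc = rng.choice(N_s, size=N_s, p=w_prev)
new = prev[anc] + rng.normal(0.0, q_std, N_s)

def trans_pdf(x_new, x_old):
    # Gaussian transition density p(x_t | x_{t-1})
    return np.exp(-0.5 * ((x_new - x_old) / q_std) ** 2) / (q_std * np.sqrt(2 * np.pi))

# (N_s, N_s) kernel evaluations, mixed with the previous weights
mix = trans_pdf(new[:, None], prev[None, :]) @ w_prev
lik = np.exp(-0.5 * ((z - new) / obs_std) ** 2)  # p(z_t | x_t^{(i)})
w = lik * mix / mix                              # proposal = prior mixture here
w = w / w.sum()
```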

Data-dependent sampling

use a mixture of the state prior and a data-dependent proposal distribution. For every object, sample $N_p=\gamma N_s$ particles $\{\x_{p,t}^{(i)}\}_{i=1}^{N_p}$ from the transition prior, where $0 < \gamma < 1$.

The importance function $q(\x_t\mid \x_{t-1},\z_t)$ in this case is given by

\[q(\x_t\mid \x_{t-1},\z_t,k_t) = \gamma p(\x_t\mid\x_{t-1},k_t) + (1-\gamma)\tilde q(\y_t\mid\hat \y_{t-1},\z_t,k_t)p_s(\s_t\mid\hat \s_{t-1})p_I(I_t\mid \hat I_{t-1})\,.\]

Utilizing the image data, this proposal distribution generates samples from those areas where the likelihood is high and that are highly consistent with the most recent measurements.

Rao-Blackwellization approach

In the case of high-dimensional state spaces, SIS becomes inefficient and leads to an increase in the variance of the estimator. However, when the transition and observation models have an analytically tractable structure, the size of the state space can be reduced by analytically marginalizing out some of the state variables. This is also called Rao-Blackwellization.

Here, for each realization (each MC particle) of the state variables $(\y_t,\s_t)$, we have a linear Gaussian transition and observation model for the intensity, $I_t$. For such models the optimal solution can be obtained analytically by using the Kalman filter. Thus, we combine an MPF, which computes the distribution of the nonlinear states $(\y_t,\s_t)$, with a bank of $N$ Kalman filters, which compute exactly the conditional distribution of the linear state $I_t$.

\[p(\y_t,\s_t,I_t\mid \z_{1:t}) = p(I_t\mid \y_t,\s_t,\z_{1:t})p(\y_t,\s_t\mid \z_{1:t})\,,\]

the probability density $p(I_t\mid \y_t,\s_t,\z_{1:t})$, which is Gaussian, can be computed analytically by applying the Kalman filter.

In summary, we need to estimate only $p(\y_t,\s_t\mid \z_{1:t})$ using an MPF, in a space of reduced dimension, which satisfies the alternative recursion

\[p(\y_t,\s_t\mid \z_{1:t}) \propto p(\y_{t-1},\s_{t-1}\mid \z_{1:t-1})p(\z_t\mid \y_t,\s_t,\z_{1:t-1})p(\y_t,\s_t\mid \y_{t-1},\s_{t-1})\,.\]

But here I think it should be

\[p(\y_t,\s_t\mid \z_{1:t}) \propto p(\z_t\mid \y_t,\s_t,\z_{1:t-1})\int p(\y_t,\s_t\mid \y_{t-1},\s_{t-1})p(\y_{t-1},\s_{t-1}\mid \z_{1:t-1})d\y_{t-1}d\s_{t-1}\,.\]

The variance of the importance weights for the RB(M)PF is lower than for the (M)PF. Also, for the same performance, in terms of both accuracy and robustness, fewer MC particles are needed. This is because the dimension of $p(\y_t,\s_t\mid \z_{1:t})$ is smaller than that of $p(\x_t\mid \z_{1:t})$. Another reason is that optimal algorithms are used to estimate the linear state variables.
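The Rao-Blackwellized intensity update can be sketched as a scalar Kalman filter: conditional on a particle's position/shape, $I_t$ follows the linear Gaussian model $I_t = (1-\alpha)I_{t-1} + \text{noise}$ with a scalar measurement of the form $z = aI_t + b + \text{noise}$ (mirroring the observation model's $a_t I_t + b_t$). The gain `a`, offset `b`, and all noise levels below are illustrative placeholders.

```python
# Scalar Kalman filter for the intensity state of one particle.
# Parameters are placeholder values, not the paper's settings.
alpha, q3, T = 0.02, 0.1, 1.0   # photobleaching rate, process noise, interval
obs_var = 0.5                   # measurement noise variance

def kalman_intensity(mean, var, z, a, b):
    # predict: I_t = (1 - alpha) I_{t-1} + noise, noise ~ N(0, q3 T)
    mean_p = (1 - alpha) * mean
    var_p = (1 - alpha) ** 2 * var + q3 * T
    # update with the scalar measurement z = a I_t + b + noise
    innov = z - (a * mean_p + b)        # innovation
    S = a ** 2 * var_p + obs_var        # innovation variance
    K = var_p * a / S                   # Kalman gain
    return mean_p + K * innov, (1.0 - K * a) * var_p

mean, var = 100.0, 10.0
mean, var = kalman_intensity(mean, var, z=95.0, a=0.9, b=2.0)
```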


propose a novel algorithm for simultaneous tracking of many nanoscale objects in time-lapse fluorescence microscopy image data sets.

the algorithm is built within a Bayesian tracking framework

tracking accuracy is improved by using marginalization of the filtering distribution and one of the state variables, for which the optimal solution (the Kalman filter) is used.

improved robustness is achieved by integrating a jump Markov system into the framework, which allows the use of multiple dynamics models for object motion prediction.

Published in categories Note