Exponential Twisting in Importance Sampling

Posted on Sep 18, 2019 (Update: Mar 01, 2020)

Tags: Importance Sampling, Delta Method, Effective Sample Size

This note is based on Ma, J., Du, K., & Gu, G. (2019). An efficient exponential twisting importance sampling technique for pricing financial derivatives. Communications in Statistics - Theory and Methods, 48(2), 203–219.

Consider the problem of estimating

\[\alpha = E_f(G(X)) = \int G(x)f(x)dx\]

where $X$ is a random vector of $\IR^N$ with probability density $f$, and $G$ is a function from $\IR^N$ to $\IR$.

The usual MC estimate is

\[\alpha_f = \frac 1n\sum_{i=1}^nG(X_i)\,,\]

where $X_i$ independently draws from $f$.

If there is another probability density $g$, then alternatively

\[\alpha = E_g\left[G(X)\frac{f(X)}{g(X)}\right]\,.\]

In the framework of Gaussian vector, the ordinal density $f(x)$ is

\[f(x) = (2\pi)^{-N/2}\exp\left(-\frac 12\Vert x\Vert^2\right)\,,\]

where $x = (x_1,\ldots,x_N)$.

Some research is mainly on adding a drift vector $\mu=(\mu_1,\ldots,\mu_N)$ and the new corresponding density is

\[g_\mu(x) = (2\pi)^{-N/2}\exp\left(-\frac 12\Vert x-\mu\Vert^2\right)\,,\]

then the key problem is how to achieve the following minimum

\[\min_\mu E_f\left[ G^2(X) \frac{f(X)}{g_\mu(X)} \right]\,,\]

since

\[E_g\left[ \left(G(X)\frac{f(X)}{g(X)}\right)^2 \right] =E_f\left[G(X)^2 \frac{f(X)}{g(X)} \right]\,,\]

and the importance sampling estimate is an unbiased estimate.

Consider a ratio estimator
\[\delta_h^n = \sum_{i=1}^nw_ih(x_i)/\sum_{i=1}^nw_i\,,\]
where $X_i\sim g(y)$ and $g$ is a candidate distribution for target $f$. The $w_i$’s are realizations of random variables $W_i$ such that $\E[W_i\mid X_i=x] = \kappa f(x)/g(x)$, $\kappa$ being an arbitrary constant (that corresponds to the lack of normalizing constants in $f$ and $g$)but $f$ and $g$ are already supposed to be density functions. Denote
\[S_h^n = \sum_{i=1}^n W_ih(X_i), S_1^n=\sum_{i=1}^nW_i\,.\]
Then the asymptotic variance of $\delta_h^n$ is
\[\var(\delta_h^n) = \frac{1}{n^2\kappa^2}(\var(S_h^n)-2\E_f[h]\cov(S_h^n, S_1^n) + (\E_f[h])^2\var(S_1^n))\,.\]

The key step in the proof is the approximation of the variance of a ratio $X/Y$ by the delta method.

Let $T_1,\ldots, T_k$ be random variables with means $\theta_1,\ldots,\theta_k$. Then
\[\var_\theta(g(T))\approx \sum_{i=1}^k[g_i'(\theta)]^2\var_\theta T_i+2\sum_{i > j}g_i'(\theta)g_j'(\theta)\cov_\theta(T_i,T_j)\,.\]

Further calculation and approximation give Liu (1996)’s ‘rule of thumb’: the effective sample size resulting from an importance sampling procedure is approximately

\[\frac{N}{1+\var_g(w(X))}\,.\]

For a cumulative distribution function $F$ on $\IR^N$, define the cumulant generating function of $F$,

\[\psi(\vartheta) = \log\left(\int_{\IR^N}e^{\psi_\vartheta(x)}dF(x)\right)=\log E[e^{\phi_\vartheta}(X)]\,,\]

where $\phi_\vartheta(x)$ is called exponential twisting function, if $\phi_\vartheta(x)$ is chosen to satisfy $\psi(\vartheta) < \infty$.

For each twisting function $\phi_\vartheta(x)$, set

\[F_{\phi_\vartheta}(x) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_N}e^{\phi_\vartheta(y)-\psi(\vartheta)}dF(y)\,.\]

It can be showed that $F_{\phi_\vartheta}(x)$ is a probability distribution and $\{F_{\phi_\vartheta},\vartheta\}$ form an exponential family of distributions. The transformation from $F$ to $F_{\phi_\vartheta}$ is called Exponential Twisting or Exponential change of measure.

If $F$ has a density $f$, then $F_{\phi_\vartheta}$ has density

\[f_{\phi_\vartheta}(x) = e^{\phi_\vartheta(x)-\psi(\vartheta)}f(x) = \frac{f(x)}{W(x)}\,,\]

here the weighted function $W(x)=e^{-\phi_\vartheta(x)+\psi(\vartheta)}$, so $f(x)=W(x)f_{\phi_\vartheta}(x)$.

The aim is to compute the expectation $V=E[G(X)]$. Using exponential twisting function $\phi_\vartheta(x)$, we have

\[V=E[G(X)]=E_\vartheta[G(X)W(X)]\]

and the second moment of $G(X)W(X)$ under $P_{\phi_\vartheta}$

\[E_\vartheta[G^2(X)W^2(X)]=E[G^2(X)W(X)]\,.\]

By the Cauchy-Schwarz inequality,

\[E[G^2(X)W(X)] = E[G^2(X)e^{-\phi_\vartheta}(X)]\cdot E[e^{\phi_\vartheta}(X)]\ge E^2[G(X)] = V^2\,.\]

It follows that

\[\Var_\vartheta[G(X)W(X)] = E_\vartheta[G^2(X)W^2(X)] - E_\vartheta^2[G(X)W(X)] =E_\vartheta[G^2(X)W^2(X)]-V^2\,.\]

In order to achieve the optimal situation, we have

\[\log(G(X))=\varphi_\vartheta(X)\,,\]

but in this case, $\psi(\vartheta)=\log(E[G(X)])$ = \log(V) cannot be computed analytically, as $V$ is unknown.

Apply Taylor Expansion for $\log(G(X))$,

\[\log G(X) = \log G(X_1^0,\ldots,X_N^0) + \sum_{i=1}^N(X_i-X_i^0)\frac{\partial \log(G(X_1^0,X_2^0,\ldots,X_N^0))}{\partial X_i} + \\\frac{1}{2!}\sum_{i,j=1}^N(X_i-X_i^0)(X_j-X_j^0)\frac{\partial^2\log(G(X_1^0,X_2^0,\ldots,X_N^0))}{\partial X_i\partial X_j}+O(\rho^3)\,,\]

where $\rho=\sqrt{\sum_{i=1}^N(X_i-X_{i0})^2}$.

Write

\[\log(G(X)) = \beta_0 + \sum_{i=1}^N\beta_iX_i + \sum_{i=1}^N\sum_{j=i}^N\beta_{ij}X_iX_j+\varepsilon\,.\]

In order to avoid overfitting and simplify the least square regression, remove the cross terms,

\[\phi_\vartheta(x)=\beta_0 + \sum_{i=1}^N\beta_ix_i + \sum_{i=1}^N\beta_{ii}x_i^2 = \beta_0 + \beta' x + x' B x\]

Published in categories Memo

← previous next →

See all posts →

WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Exponential Twisting in Importance Sampling

Posted on Sep 18, 2019 (Update: Mar 01, 2020)