
Exponential Twisting in Importance Sampling

Tags: Importance Sampling, Delta Method, Effective Sample Size

This note is based on Ma, J., Du, K., & Gu, G. (2019). An efficient exponential twisting importance sampling technique for pricing financial derivatives. Communications in Statistics - Theory and Methods, 48(2), 203–219.

Consider the problem of estimating

\[\alpha = E_f(G(X)) = \int G(x)f(x)dx\]

where $X$ is a random vector in $\IR^N$ with probability density $f$, and $G$ is a function from $\IR^N$ to $\IR$.

The usual MC estimate is

\[\alpha_f = \frac 1n\sum_{i=1}^nG(X_i)\,,\]

where the $X_i$ are drawn independently from $f$.

If there is another probability density $g$, then alternatively

\[\alpha = E_g\left[G(X)\frac{f(X)}{g(X)}\right]\,.\]
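As a minimal numerical sketch of the two estimators (the one-dimensional $G$ and the proposal $g$ below are illustrative choices of mine, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def G(x):
    # illustrative payoff-like function
    return np.maximum(x - 1.0, 0.0)

n = 100_000

# plain Monte Carlo under f = N(0, 1)
x_f = rng.standard_normal(n)
alpha_mc = G(x_f).mean()

# importance sampling under g = N(1, 1), reweighted by f/g
x_g = rng.normal(loc=1.0, size=n)
w = np.exp(-0.5 * x_g**2) / np.exp(-0.5 * (x_g - 1.0)**2)  # f(x)/g(x), constants cancel
alpha_is = (G(x_g) * w).mean()

print(alpha_mc, alpha_is)  # both estimate E_f[G(X)]
```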

In the Gaussian-vector framework, the original density $f(x)$ is

\[f(x) = (2\pi)^{-N/2}\exp\left(-\frac 12\Vert x\Vert^2\right)\,,\]

where $x = (x_1,\ldots,x_N)$.

Much of the existing research focuses on adding a drift vector $\mu=(\mu_1,\ldots,\mu_N)$; the corresponding new density is

\[g_\mu(x) = (2\pi)^{-N/2}\exp\left(-\frac 12\Vert x-\mu\Vert^2\right)\,,\]

then the key problem is how to achieve the following minimum

\[\min_\mu E_f\left[ G^2(X) \frac{f(X)}{g_\mu(X)} \right]\,,\]

since the importance sampling estimator is unbiased and

\[E_{g_\mu}\left[ \left(G(X)\frac{f(X)}{g_\mu(X)}\right)^2 \right] =E_f\left[G(X)^2 \frac{f(X)}{g_\mu(X)} \right]\,.\]

Minimizing the variance over $\mu$ therefore amounts to minimizing this second moment.
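A small one-dimensional illustration of this criterion (again with an illustrative payoff; nothing below is prescribed by the paper): estimate the second moment $E_f[G^2(X)f(X)/g_\mu(X)]$ over a grid of drifts $\mu$ and pick the minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)

def G(x):
    return np.maximum(x - 1.0, 0.0)    # illustrative payoff

x = rng.standard_normal(200_000)       # draws from f = N(0, 1)

# second moment E_f[G^2(X) f(X)/g_mu(X)] on a grid of drifts mu;
# for f = N(0, 1) and g_mu = N(mu, 1), f(x)/g_mu(x) = exp(mu^2/2 - mu*x)
mus = np.linspace(0.0, 3.0, 31)
second_moments = [np.mean(G(x)**2 * np.exp(0.5 * mu**2 - mu * x)) for mu in mus]

print("drift minimizing the estimated second moment:", mus[int(np.argmin(second_moments))])
```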

Consider a ratio estimator

\[\delta_h^n = \sum_{i=1}^nw_ih(x_i)/\sum_{i=1}^nw_i\,,\]

where $X_i\sim g$ and $g$ is a candidate distribution for the target $f$. The $w_i$'s are realizations of random variables $W_i$ such that $\E[W_i\mid X_i=x] = \kappa f(x)/g(x)$, with $\kappa$ an arbitrary constant (accounting for possibly missing normalizing constants in $f$ and $g$); here, however, $f$ and $g$ are already assumed to be properly normalized densities. Denote

\[S_h^n = \sum_{i=1}^n W_ih(X_i)\,,\quad S_1^n=\sum_{i=1}^nW_i\,.\]

Then the asymptotic variance of $\delta_h^n$ is

\[\var(\delta_h^n) = \frac{1}{n^2\kappa^2}(\var(S_h^n)-2\E_f[h]\cov(S_h^n, S_1^n) + (\E_f[h])^2\var(S_1^n))\,.\]
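A sketch of the ratio estimator with exact weights (so $\kappa = 1$), reusing illustrative Gaussian choices $f = N(0,1)$ and $g = N(1,1)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def h(x):
    return x**2                                          # illustrative integrand, E_f[h] = 1

n = 50_000
x = rng.normal(loc=1.0, size=n)                          # X_i ~ g = N(1, 1)
w = stats.norm.pdf(x) / stats.norm.pdf(x, loc=1.0)       # W_i = f(X_i)/g(X_i)

delta = np.sum(w * h(x)) / np.sum(w)                     # ratio estimator delta_h^n
print(delta)
```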

The key step in the proof is the approximation of the variance of a ratio $X/Y$ by the delta method.

Let $T_1,\ldots, T_k$ be random variables with means $\theta_1,\ldots,\theta_k$, let $g$ be a differentiable function, and write $g_i'(\theta) = \partial g(\theta)/\partial\theta_i$. Then

\[\var_\theta(g(T))\approx \sum_{i=1}^k[g_i'(\theta)]^2\var_\theta T_i+2\sum_{i > j}g_i'(\theta)g_j'(\theta)\cov_\theta(T_i,T_j)\,.\]
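As a quick sanity check of this approximation for the ratio $g(T_1,T_2)=T_1/T_2$ (a toy pair of correlated sample means, not an example from the note):

```python
import numpy as np

rng = np.random.default_rng(3)

# T1, T2 are sample means of correlated pairs (toy example)
n, reps = 200, 20_000
mean, cov = np.array([2.0, 1.0]), np.array([[1.0, 0.3], [0.3, 0.5]])
samples = rng.multivariate_normal(mean, cov, size=(reps, n))
T1, T2 = samples[..., 0].mean(axis=1), samples[..., 1].mean(axis=1)

print("empirical variance of T1/T2:", np.var(T1 / T2))

# delta-method approximation with g(t1, t2) = t1/t2
t1, t2 = mean
v1, v2, c12 = cov[0, 0] / n, cov[1, 1] / n, cov[0, 1] / n
print("delta-method approximation :", v1 / t2**2 + t1**2 * v2 / t2**4 - 2 * t1 * c12 / t2**3)
```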

Further calculation and approximation give Liu (1996)'s ‘rule of thumb’: the effective sample size resulting from an importance sampling procedure with $n$ draws is approximately

\[\frac{n}{1+\var_g(w(X))}\,,\]

where $w(X)=f(X)/g(X)$ is the importance weight.
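In practice the rule is usually evaluated on weights rescaled to have mean one; a small helper along those lines (my own sketch, not code from the paper):

```python
import numpy as np

def effective_sample_size(w):
    """Liu's rule of thumb: n / (1 + variance of the weights rescaled to mean 1)."""
    w = np.asarray(w, dtype=float)
    w_bar = w / w.mean()
    return len(w) / (1.0 + w_bar.var())

# e.g. with the weights w from the ratio-estimator sketch above:
# print(effective_sample_size(w))
```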

For a cumulative distribution function $F$ on $\IR^N$, define the cumulant generating function of $F$,

\[\psi(\vartheta) = \log\left(\int_{\IR^N}e^{\phi_\vartheta(x)}dF(x)\right)=\log E\left[e^{\phi_\vartheta(X)}\right]\,,\]

where $\phi_\vartheta(x)$ is called the exponential twisting function, provided it is chosen so that $\psi(\vartheta) < \infty$.

For each twisting function $\phi_\vartheta(x)$, set

\[F_{\phi_\vartheta}(x) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_N}e^{\phi_\vartheta(y)-\psi(\vartheta)}dF(y)\,.\]

It can be shown that $F_{\phi_\vartheta}(x)$ is a probability distribution and that $\{F_{\phi_\vartheta},\vartheta\}$ forms an exponential family of distributions. The transformation from $F$ to $F_{\phi_\vartheta}$ is called Exponential Twisting or Exponential change of measure.

If $F$ has a density $f$, then $F_{\phi_\vartheta}$ has density

\[f_{\phi_\vartheta}(x) = e^{\phi_\vartheta(x)-\psi(\vartheta)}f(x) = \frac{f(x)}{W(x)}\,,\]

where the weight function is $W(x)=e^{-\phi_\vartheta(x)+\psi(\vartheta)}$, so that $f(x)=W(x)f_{\phi_\vartheta}(x)$.
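For instance (a standard special case, not specific to the paper), take $f$ to be the Gaussian density above and the linear twisting function $\phi_\vartheta(x)=\vartheta'x$. Then

\[\psi(\vartheta)=\log E\left[e^{\vartheta'X}\right]=\frac 12\Vert\vartheta\Vert^2\,,\qquad f_{\phi_\vartheta}(x)=e^{\vartheta'x-\frac 12\Vert\vartheta\Vert^2}f(x)=(2\pi)^{-N/2}\exp\left(-\frac 12\Vert x-\vartheta\Vert^2\right)\,,\]

which is exactly the mean-shifted density $g_\mu$ with $\mu=\vartheta$, so the drift approach is the special case of exponential twisting with a linear twisting function.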

The aim is to compute the expectation $V=E[G(X)]$. Using the exponential twisting function $\phi_\vartheta(x)$, we have

\[V=E[G(X)]=E_\vartheta[G(X)W(X)]\]

and the second moment of $G(X)W(X)$ under $P_{\phi_\vartheta}$ is

\[E_\vartheta[G^2(X)W^2(X)]=E[G^2(X)W(X)]\,.\]

Since $e^{\psi(\vartheta)} = E[e^{\phi_\vartheta(X)}]$, the Cauchy-Schwarz inequality gives (for nonnegative $G$)

\[E[G^2(X)W(X)] = E\left[G^2(X)e^{-\phi_\vartheta(X)}\right]\cdot E\left[e^{\phi_\vartheta(X)}\right]\ge E^2[G(X)] = V^2\,.\]

It follows that

\[\Var_\vartheta[G(X)W(X)] = E_\vartheta[G^2(X)W^2(X)] - E_\vartheta^2[G(X)W(X)] =E_\vartheta[G^2(X)W^2(X)]-V^2\,.\]

The optimum is attained when $G(X)W(X)$ is constant, that is, when

\[\log(G(X))=\phi_\vartheta(X)\,,\]

but in this case $\psi(\vartheta)=\log(E[G(X)]) = \log(V)$ cannot be computed analytically, as $V$ is exactly the unknown quantity to be estimated.

Applying a Taylor expansion to $\log(G(X))$ around $X^0=(X_1^0,\ldots,X_N^0)$,

\[\log G(X) = \log G(X_1^0,\ldots,X_N^0) + \sum_{i=1}^N(X_i-X_i^0)\frac{\partial \log(G(X_1^0,X_2^0,\ldots,X_N^0))}{\partial X_i} + \\\frac{1}{2!}\sum_{i,j=1}^N(X_i-X_i^0)(X_j-X_j^0)\frac{\partial^2\log(G(X_1^0,X_2^0,\ldots,X_N^0))}{\partial X_i\partial X_j}+O(\rho^3)\,,\]

where $\rho=\sqrt{\sum_{i=1}^N(X_i-X_i^0)^2}$.

Write

\[\log(G(X)) = \beta_0 + \sum_{i=1}^N\beta_iX_i + \sum_{i=1}^N\sum_{j=i}^N\beta_{ij}X_iX_j+\varepsilon\,.\]

To avoid overfitting and to simplify the least-squares regression, drop the cross terms, which gives

\[\phi_\vartheta(x)=\beta_0 + \sum_{i=1}^N\beta_ix_i + \sum_{i=1}^N\beta_{ii}x_i^2 = \beta_0 + \beta' x + x' B x\,,\]

where $\beta=(\beta_1,\ldots,\beta_N)'$ and $B$ is the diagonal matrix with entries $\beta_{11},\ldots,\beta_{NN}$.
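A rough sketch of how such a quadratic twisting function could be fitted from a pilot sample by least squares (the concrete payoff and the regression-based fitting below are my illustration of the idea; the paper's construction may differ in details):

```python
import numpy as np

rng = np.random.default_rng(4)

def G(x):
    # illustrative positive payoff so that log G is well defined
    return np.exp(0.2 * x.sum(axis=-1))

N, n_pilot = 5, 5_000
X = rng.standard_normal((n_pilot, N))                 # pilot draws from f

# least-squares fit of log G(X) on (1, x_1..x_N, x_1^2..x_N^2), no cross terms
design = np.hstack([np.ones((n_pilot, 1)), X, X**2])
coef, *_ = np.linalg.lstsq(design, np.log(G(X)), rcond=None)
beta0, beta, beta_diag = coef[0], coef[1:N + 1], coef[N + 1:]

def phi(x):
    # fitted twisting function phi(x) = beta0 + beta'x + x'Bx with B diagonal
    return beta0 + x @ beta + (x**2) @ beta_diag

print(phi(np.zeros(N)))   # at x = 0 the fit reduces (approximately) to the intercept
```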
