GhostKnockoffs: Only Summary Statistics
GhostKnockoffs: the main idea is to generate knockoff Z-scores directly, without creating knockoff variables. The method operates with only $X^\top Y$ and $\Vert Y\Vert_2^2$, where $X$ is the $n\times p$ matrix of covariates and $Y$ is the $n\times 1$ response vector.
The paper extends the family of GhostKnockoffs methods to incorporate feature importance statistics obtained from penalized regression:
- when the empirical covariance of the covariate-response pair $(X, Y)$ is available, i.e., $X^\top X$, $X^\top Y$, $\Vert Y\Vert_2^2$ are available along with the sample size $n$, the resulting procedure enjoys a substantial power improvement over the method of He et al. (2022), owing to far more effective test statistics
Model-X Knockoffs and GhostKnockoffs
conditional independence hypotheses $H_0^j: X_j\ind Y\mid X_{-j}$ for $1\le j\le p$
$n$ i.i.d. samples $(X_i, Y_i), 1\le i\le n$
two conditions:
- exchangeability: $(X_j, \tilde X_j, X_{-j}, \tilde X_{-j})\overset{d}{=}(\tilde X_j, X_j, X_{-j}, \tilde X_{-j}),\forall 1\le j\le p$
- conditional independence: $\tilde X\ind Y\mid X$
define feature importance statistics $W = w([X, \tilde X], Y)\in \IR^p$ to be any function of $X, \tilde X, Y$ such that a flip-sign property holds
\[w_j([X, \tilde X]_{swap(j)}, Y) = -w_j([X,\tilde X], Y)\]common choices include
- marginal correlation difference statistic: $W_j = \vert X_j^\top Y\vert - \vert \tilde X_j^\top Y\vert$
- lasso coefficient difference statistic: $W_j = \vert \hat\beta_j(\lambda_{CV})\vert - \vert \hat \beta_{j+p}(\lambda_{CV})\vert$
Gaussian knockoff sampler
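The construction is not spelled out in these notes; recall that for $X\sim N(0,\Sigma)$ the standard Gaussian Model-X construction draws $\tilde X\mid X \sim N\left(X\,(I-\Sigma^{-1}\mathrm{diag}(s)),\ 2\,\mathrm{diag}(s)-\mathrm{diag}(s)\Sigma^{-1}\mathrm{diag}(s)\right)$ row-wise, for a suitable vector $s$. A minimal numpy sketch (the function name and the numerical jitter are mine):

```python
import numpy as np

def gaussian_knockoff_sampler(X, Sigma, s, rng=None):
    """Gaussian knockoff sampler tilde_X = G(X, Sigma).

    X: n x p data matrix with i.i.d. rows ~ N(0, Sigma); s: length-p vector with
    2*diag(s) - diag(s) @ inv(Sigma) @ diag(s) positive semidefinite
    (e.g., the equicorrelated choice s_j = min(1, 2*lambda_min(Sigma))
    when Sigma is a correlation matrix).
    """
    rng = np.random.default_rng(rng)
    p = len(s)
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)        # Sigma^{-1} diag(s)
    cond_mean = X - X @ Sigma_inv_D                # X (I - Sigma^{-1} diag(s)), row-wise conditional mean
    cond_cov = 2 * D - D @ Sigma_inv_D             # 2 diag(s) - diag(s) Sigma^{-1} diag(s)
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal(X.shape) @ L.T
```

With $\tilde X$ in hand, the marginal correlation difference statistic from the list above is simply `np.abs(X.T @ Y) - np.abs(tilde_X.T @ Y)`.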
GhostKnockoffs with marginal correlation difference statistic
sample the knockoff Z-score $\tilde \Z_s$ from $\X^\top \Y$ and $\Vert \Y\Vert_2^2$ directly, in a way such that
\[\tilde \Z_s\mid \X, \Y\overset{d}{=} \tilde \X^\top \Y\mid \X, \Y\,,\]where $\tilde \X=\cG(\X,\Sigma)$ is the knockoff matrix generated by the Gaussian knockoff sampler. Then take $W_j = \vert \Z_{s,j}\vert - \vert\tilde \Z_{s,j}\vert$, where $\Z_s = \X^\top\Y$.
\[\tilde \Z_s = \P^\top \X^\top \Y + \Vert \Y\Vert_2 \Z\,,\; \text{where } \Z \sim N(0, \V) \text{ is independent of }\X \text{ and }\Y \tag{6}\]
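A minimal numpy sketch of (6). The explicit forms of $\P$ and $\V$ used below, $\P = I - \Sigma^{-1}\mathrm{diag}(s)$ and $\V = 2\,\mathrm{diag}(s) - \mathrm{diag}(s)\Sigma^{-1}\mathrm{diag}(s)$, are what the Gaussian knockoff sampler above implies; take them as my reading rather than a quotation from the paper:

```python
import numpy as np

def ghost_knockoff_zscores(XtY, Y_norm2, Sigma, s, rng=None):
    """Sample tilde_Z_s as in (6), without ever forming X or tilde_X.

    XtY = X^T Y, Y_norm2 = ||Y||_2^2; P and V below follow the Gaussian
    knockoff sampler (my reading, not a quote from the paper).
    """
    rng = np.random.default_rng(rng)
    p = len(s)
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)            # Sigma^{-1} diag(s)
    V = 2 * D - D @ Sigma_inv_D                        # 2 diag(s) - diag(s) Sigma^{-1} diag(s)
    mean = XtY - Sigma_inv_D.T @ XtY                   # P^T X^T Y = (I - diag(s) Sigma^{-1}) X^T Y
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))
    Z = L @ rng.standard_normal(p)                     # Z ~ N(0, V)
    return mean + np.sqrt(Y_norm2) * Z
```

Note that nothing here touches individual-level data: only $\X^\top\Y$, $\Vert\Y\Vert_2^2$ and $\Sigma$ enter.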
GhostKnockoffs with Penalized Regression: Known Empirical Covariance
In addition to $\X^\top\Y$ and $\Vert \Y\Vert_2^2$, assume we also have $\X^\top\X$ and the sample size $n$
GhostKnockoffs with the Lasso
\[\hat\beta(\lambda) \in \argmin_{\beta\in \IR^{2p}} \frac 12\Vert \Y - [\X \tilde \X]\beta\Vert^2_2 + \lambda\Vert \beta\Vert_1\,,\]where $\tilde\X = \cG(\X, \Sigma)$. Then define the lasso coefficient difference feature importance statistics
\[W_j = \vert \hat\beta_j(\lambda)\vert -\vert \hat \beta_{j+p}(\lambda)\vert, 1\le j\le p\]Define the Gram matrix of $[\X, \tilde \X, \Y]$,
\[\cT(\X, \tilde \X, \Y) = [\X, \tilde \X, \Y]^\top [\X, \tilde \X, \Y]\]main idea: sample $\cT(\X, \tilde \X, \Y)$ from its distribution using the Gram matrix of $[\X, \Y]$ only.
Proposition 1: If one generates “fake” data matrices $\check\X$ and $\check\Y$ that lead to the same Gram matrix as that of $\X$ and $\Y$, then the distribution of $\cT$ remains unchanged when the original matrices are replaced by the fake data matrices.
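A sketch of how Proposition 1 can be used in practice: build fake data with the right Gram matrix (here via a Cholesky factor padded with zero rows, a choice of mine), run the Gaussian knockoff sampler from above on the fake data, and fit the Lasso. Helper names are mine; `sklearn`'s `Lasso` is used for the fit:

```python
import numpy as np
from sklearn.linear_model import Lasso

def fake_data_from_gram(XtX, XtY, Y_norm2, n):
    """Build (check_X, check_Y) with the same Gram matrix as (X, Y).

    Any factor A with A^T A = [X Y]^T [X Y] works; here a Cholesky factor
    padded with zero rows up to n rows (assumes n >= p + 1).
    """
    p = XtX.shape[0]
    T = np.zeros((p + 1, p + 1))
    T[:p, :p], T[:p, p], T[p, :p], T[p, p] = XtX, XtY, XtY, Y_norm2
    L = np.linalg.cholesky(T + 1e-10 * np.eye(p + 1))
    A = np.vstack([L.T, np.zeros((n - p - 1, p + 1))])   # n x (p+1) with A^T A = T
    return A[:, :p], A[:, p]

def lasso_cd_statistics(XtX, XtY, Y_norm2, n, Sigma, s, lam):
    """Lasso coefficient difference W from summary statistics only."""
    check_X, check_Y = fake_data_from_gram(XtX, XtY, Y_norm2, n)
    check_X_knock = gaussian_knockoff_sampler(check_X, Sigma, s)   # sketch from the section above
    XX = np.hstack([check_X, check_X_knock])
    # sklearn minimizes (1/(2n))||y - XX b||^2 + alpha ||b||_1, so alpha = lam / n
    beta = Lasso(alpha=lam / n, fit_intercept=False).fit(XX, check_Y).coef_
    p = XtX.shape[0]
    return np.abs(beta[:p]) - np.abs(beta[p:])                     # W_j
```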
GhostKnockoffs with the square-root Lasso
Alternatively one can use the square-root Lasso, for which a reasonable tuning parameter is convenient to choose (it does not require estimating the noise level).
\[\hat\beta(\lambda) \in \argmin_{\beta\in \IR^{2p}} \Vert \Y- [\X \tilde \X]\beta\Vert_2 + \lambda \Vert \beta\Vert_1\,,\]and a good choice of $\lambda$ is given by
\[\lambda = \kappa\, \bbE\left[\frac{\Vert [\X\, \tilde \X]^\top \varepsilon\Vert_\infty}{\Vert\varepsilon\Vert_2} \,\middle|\, \X, \tilde \X\right],\]where $\kappa > 0$ is a constant and $\varepsilon$ can be taken to be an $n$-vector of i.i.d. standard Gaussian noise (the usual pivotal choice for the square-root Lasso).
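The conditional expectation has no closed form; it is natural to estimate it by Monte Carlo under the assumption that $\varepsilon\sim N(0, I_n)$ (function name and defaults are mine):

```python
import numpy as np

def sqrt_lasso_lambda(X_aug, kappa, n_mc=500, rng=None):
    """Monte Carlo estimate of kappa * E[ ||[X, tilde_X]^T eps||_inf / ||eps||_2 ].

    X_aug is the n x 2p augmented design [X, tilde_X]; eps ~ N(0, I_n).
    """
    rng = np.random.default_rng(rng)
    n = X_aug.shape[0]
    eps = rng.standard_normal((n, n_mc))                        # each column is one draw of eps
    ratios = np.max(np.abs(X_aug.T @ eps), axis=0) / np.linalg.norm(eps, axis=0)
    return kappa * ratios.mean()
```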
GhostKnockoffs with the Lasso-max
\[W_j = \sup\{\lambda:\hat\beta_j(\lambda) \neq 0\} - \sup\{\lambda: \hat \beta_{j+p}(\lambda)\neq 0\}\]
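In words, $\sup\{\lambda:\hat\beta_j(\lambda)\neq 0\}$ is the largest penalty at which feature $j$ enters the Lasso path. A grid-based sketch using `sklearn`'s `lasso_path` (the grid approximation and function name are mine; recall `sklearn` parametrizes the penalty as $\lambda/n$):

```python
import numpy as np
from sklearn.linear_model import lasso_path

def lasso_max_statistics(X_aug, y, n_alphas=200):
    """W_j = entry penalty of feature j minus entry penalty of its knockoff.

    Approximates sup{lambda: beta_j(lambda) != 0} by the largest grid value
    of alpha at which the coefficient is nonzero (0 if it never enters).
    """
    alphas, coefs, _ = lasso_path(X_aug, y, n_alphas=n_alphas)   # alphas are in decreasing order
    nonzero = np.abs(coefs) > 0                                  # shape (2p, n_alphas)
    entry = np.where(nonzero.any(axis=1),
                     alphas[np.argmax(nonzero, axis=1)],         # first (largest) alpha where nonzero
                     0.0)
    p = X_aug.shape[1] // 2
    return entry[:p] - entry[p:]
```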
GhostKnockoffs with Penalized Regression: Missing Empirical Covariance
In applications such as genetics, $\X^\top\X$ may not be available.
Assume we only know $\X^\top\Y$, $\Vert \Y\Vert_2^2$ and the sample size $n$, and assume that $X\sim N(0, \Sigma)$, where the covariance matrix $\Sigma$ is known
GhostKnockoffs with pseudo-lasso
idea: modify the Lasso objective function so that it can be constructed from the available summary statistics.
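The exact objective is not reproduced in these notes. A plausible form, under the assumption that the unknown empirical Gram matrix of $[\X\,\tilde\X]$ is replaced by $n$ times its population counterpart and $\tilde\X^\top\Y$ by the knockoff Z-scores $\tilde\Z_s$ from (6), would be
\[\hat\beta \in \argmin_{\beta\in\IR^{2p}}\ \frac n2\,\beta^\top \begin{pmatrix}\Sigma & \Sigma - \mathrm{diag}(s)\\ \Sigma - \mathrm{diag}(s) & \Sigma\end{pmatrix}\beta \;-\; \beta^\top\begin{pmatrix}\X^\top\Y\\ \tilde\Z_s\end{pmatrix} \;+\; \lambda\Vert\beta\Vert_1\,,\]which depends on the data only through $\X^\top\Y$, $\Vert\Y\Vert_2^2$ (via (6)), $n$ and $\Sigma$.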
Choice of tuning parameter
- lasso-min
- pseudo-sum
Variants of GhostKnockoffs
Multi-knockoffs
Group knockoffs
Conditional randomization test
He et al. (2021)
- $X_i = (X_{i1},\ldots, X_{iq})$: vector of covariates
- $G_i = (G_{i1},\ldots, G_{ip})$: vector of genotype
- $Y_i$: outcome (phenotype)
the per-sample score statistic can be written as $G_i^\top Y_i$. The z-scores aggregating all samples can be written as
\[Z_{score} = \frac{1}{\sqrt n}G^\top Y\]the knockoff counterpart for $Z_{score}$ can be directly generated by
\[\tilde Z_{score} = PZ_{score} + E,\]where $E\sim N(0, V)$.
define a W-statistic that quantifies the magnitude of effect on the outcome as
\[W_j = \left(T_j^{(0)}-\text{median}_{1\le m\le M}\, T_j^{(m)}\right) I_{\left\{T_j^{(0)}\ge \max_{1\le m\le M} T_j^{(m)}\right\}}\,.\]define the knockoff statistics
\[\kappa_j = \argmax_{0\le m\le M} T_j^{(m)}\,, \qquad \tau_j = T_j^{(0)} - \text{median}_{1\le m\le M}\, T_j^{(m)}\,,\]where $T_j^{(0)}$ is the importance statistic of the original feature $j$ and $T_j^{(m)}$, $1\le m\le M$, that of its $m$-th knockoff.
define the threshold for the knockoff filter as
\[\tau = \min\left\{ t > 0: \frac{\frac 1M + \frac 1M\#\{j: \kappa_j \ge 1, \tau_j \ge t\} }{\#\{j: \kappa_j = 0, \tau_j \ge t\}\vee 1 } \le q \right\}\]
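A compact sketch of the resulting selection rule, following the definitions of $\kappa_j$ and $\tau_j$ above; the input is the $p\times(M+1)$ matrix of importance statistics with the original copy in column 0 (function name and the guard against an empty denominator are mine):

```python
import numpy as np

def multi_knockoff_filter(T, q):
    """Select features with the multiple-knockoff filter.

    T: (p, M+1) array; column 0 holds the original importance statistics,
    columns 1..M those of the M knockoff copies.
    """
    M = T.shape[1] - 1
    kappa = np.argmax(T, axis=1)                               # which copy is largest
    tau = T[:, 0] - np.median(T[:, 1:], axis=1)
    for t in np.sort(tau[tau > 0]):                            # scan thresholds from small to large
        fdp = (1 / M + np.sum((kappa >= 1) & (tau >= t)) / M) / max(np.sum((kappa == 0) & (tau >= t)), 1)
        if fdp <= q:
            return np.where((kappa == 0) & (tau >= t))[0]      # selected feature indices
    return np.array([], dtype=int)
```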
=============
In the least squares regression,
\[\hat\beta^{ls} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 = (\bfX^\top\bfX)^{-1}\bfX^\top\bfY\]In the marginal regression,
\[\hat\beta^{marginal}_j = \argmin_{\beta}\Vert \bfY - \bfX_j\beta\Vert^2_2 = (\bfX_j^\top\bfX_j)^{-1}\bfX_j^\top\bfY\]Note that
\[\hat\beta^{ridge} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_2^2 = (\bfX^\top\bfX + \lambda\bfI)^{-1}\bfX^\top\bfY\]then with augmented design matrix $[\bfX\,\tilde\bfX]$, the ridge solution only depends on $[\bfX\,\tilde\bfX]^\top[\bfX\,\tilde\bfX]$ and $[\bfX\,\tilde\bfX]^\top\bfY$, i.e., only $\bfX^\top\bfX, \tilde\bfX^\top\bfX, \tilde \bfX^\top\tilde\bfX, \bfX^\top\bfY, \tilde \bfX^\top\bfY$, which are all components of the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$ (actually $\bfY^\top\bfY$ is not necessary, as also discussed in the paper on page 6)
The lasso solution can be obtained via an iterative ridge strategy (sec 6.4.2 of http://arxiv.org/abs/1509.09169), so we can also obtain an (approximate) lasso solution from ridge regression which only depends on the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$.
\[\hat\beta^{lasso} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_1\]
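A minimal sketch of that iterative ridge (local quadratic approximation) idea for the display above: each $\vert\beta_j\vert$ is majorized by $\beta_j^2/(2\vert\beta_j^{(k)}\vert) + \text{const}$, so every step is a weighted ridge solve that touches the data only through $\bfX^\top\bfX$ and $\bfX^\top\bfY$ (function name, warm start and stopping details are mine):

```python
import numpy as np

def lasso_via_iterative_ridge(XtX, XtY, lam, n_iter=100, eps=1e-8):
    """Approximate argmin ||Y - X b||_2^2 + lam * ||b||_1 using only X^T X and X^T Y.

    Majorizing |b_j| by b_j^2 / (2 |b_j^{(k)}|) + const gives the update
    b <- (X^T X + (lam/2) diag(1/|b^{(k)}|))^{-1} X^T Y.
    """
    p = XtX.shape[0]
    beta = np.linalg.solve(XtX + lam * np.eye(p), XtY)          # ridge warm start
    for _ in range(n_iter):
        weights = lam / (2 * (np.abs(beta) + eps))              # per-coordinate ridge penalties
        beta = np.linalg.solve(XtX + np.diag(weights), XtY)
    beta[np.abs(beta) < 1e-6] = 0.0                             # snap near-zero coordinates to exact zero
    return beta
```

In the GhostKnockoffs setting one would plug in the Gram matrix of the augmented design $[\bfX\,\tilde\bfX]$ and the vector $(\bfX^\top\bfY, \tilde\bfX^\top\bfY)$.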