# GhostKnockoffs: Only Summary Statistics

##### Posted on May 23, 2024
Tags: Knockoff, Lasso, GWAS

GhostKnockoffs: the main idea is to generate knockoff Z-scores directly, without creating knockoff variables. The method operates with only $X^\top Y$ and $\Vert Y\Vert_2^2$, where $X$ is the $n\times p$ matrix of covariates and $Y$ is the $n\times 1$ response vector.

extend the family of GhostKnockoffs methods to incorporate feature importance statistics obtained from penalized regression

• empirical covariance of the covariate-response pair $(X, Y)$ is available, i.e., $X^\top X, X^\top Y, \Vert Y\Vert_2^2$ are available along with the sample size $n$. Substantial power improvement over the method of He et al. (2022) due to far more effective test statistics

## Model-X Knockoffs and GhostKnockoffs

conditional independence hypotheses $H_0^j: X_j\ind Y\mid X_{-j}$ for $1\le j\le p$

$n$ i.i.d. samples $(X_i, Y_i), 1\le i\le n$

two conditions:

• exchangeability: $(X_j, \tilde X_j, X_{-j}, \tilde X_{-j})\overset{d}{=}(\tilde X_j, X_j, X_{-j}, \tilde X_{-j}),\forall 1\le j\le p$
• conditional independence: $\tilde X\ind Y\mid X$

define feature importance statistics $W = w([X, \tilde X], Y)\in \IR^p$ to be any function of $X, \tilde X, Y$ such that a flip-sign property holds

$w_j([X, \tilde X]_{swap(j)}, Y) = -w_j([X,\tilde X], Y)$

common choices include

• marginal correlation difference statistic: $W_j = \vert X_j^\top Y\vert - \vert \tilde X_j^\top Y\vert$
• lasso coefficient difference statistic: $W_j = \vert \hat\beta_j(\lambda_{CV})\vert - \vert \hat \beta_{j+p}(\lambda_{CV})\vert$
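
As a sanity check on the flip-sign property, here is a minimal numpy sketch of the marginal correlation difference statistic; the knockoff matrix is a random stand-in used purely for illustration, not a valid knockoff construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))        # original features
Xk = rng.standard_normal((n, p))       # placeholder "knockoffs" (illustration only)
Y = X[:, 0] + 0.1 * rng.standard_normal(n)

# marginal correlation difference statistic
W = np.abs(X.T @ Y) - np.abs(Xk.T @ Y)

# flip-sign check: swapping X_j with its knockoff flips the sign of W_j
Xs, Xks = X.copy(), Xk.copy()
Xs[:, 2], Xks[:, 2] = Xk[:, 2], X[:, 2]
W_swap = np.abs(Xs.T @ Y) - np.abs(Xks.T @ Y)
assert np.isclose(W_swap[2], -W[2])
```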

## GhostKnockoffs with marginal correlation difference statistic

sample the knockoff Z-score $\tilde \Z_s$ from $\Z_s = \X^\top \Y$ and $\Vert \Y\Vert_2^2$ directly, in such a way that

$\tilde \Z_s\mid \X, \Y\overset{d}{=} \tilde \X^\top \Y\mid \X, \Y$

where $\tilde \X=\cG(\X,\Sigma)$ is the knockoff matrix generated by the Gaussian knockoff sampler. Then take $W_j = \vert \Z_{s,j}\vert - \vert \tilde \Z_{s,j}\vert$, matching the marginal correlation difference statistic above

$\tilde \Z_s = \P^\top \X^\top \Y + \Vert \Y\Vert_2 \Z\; \text{where } \Z \sim N(0, \V) \text{ is independent of }\X \text{ and }\Y \tag{6}$
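
A minimal numpy sketch of equation (6). The formulas $\P = I - \Sigma^{-1}\mathrm{diag}(s)$ and $\V = 2\,\mathrm{diag}(s) - \mathrm{diag}(s)\Sigma^{-1}\mathrm{diag}(s)$, together with the equicorrelated choice of $s$, are taken from the standard Gaussian (second-order) knockoff construction; treat them as assumptions here rather than quotations from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
# assumed known covariance of X (equicorrelated, for illustration)
rho = 0.3
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

# equicorrelated construction: s_j = min(1, 2*lambda_min(Sigma))
s = min(1.0, 2 * np.linalg.eigvalsh(Sigma).min()) * np.ones(p)
D = np.diag(s)
P = np.eye(p) - np.linalg.solve(Sigma, D)      # P = I - Sigma^{-1} D (assumption)
V = 2 * D - D @ np.linalg.solve(Sigma, D)      # V = 2D - D Sigma^{-1} D (assumption)

# summary statistics (simulated here; in practice these are the only inputs)
n = 500
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = X @ np.array([1.0, 0, 0, 0]) + rng.standard_normal(n)
XtY, Ynorm = X.T @ Y, np.linalg.norm(Y)

# equation (6): knockoff Z-scores from summary statistics only
L = np.linalg.cholesky(V + 1e-10 * np.eye(p))  # tiny jitter for numerical safety
Z_tilde = P.T @ XtY + Ynorm * (L @ rng.standard_normal(p))
```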

## GhostKnockoffs with Penalized Regression: Known Empirical Covariance

in addition to $\X^\top\Y$ and $\Vert \Y\Vert_2^2$, assume we have $\X^\top\X$ and the sample size $n$

### GhostKnockoffs with the Lasso

$\hat\beta(\lambda) \in \argmin_{\beta\in \IR^{2p}} \frac 12\Vert \Y - [\X \tilde \X]\beta\Vert^2_2 + \lambda\Vert \beta\Vert_1\,,$

where $\tilde\X = \cG(\X, \Sigma)$. Then define the lasso coefficient difference feature importance statistics

$W_j = \vert \hat\beta_j(\lambda)\vert -\vert \hat \beta_{j+p}(\lambda)\vert, 1\le j\le p$

Define the Gram matrix of $[\X, \tilde \X, \Y]$,

$\cT(\X, \tilde \X, \Y) = [\X, \tilde \X, \Y]^\top [\X, \tilde \X, \Y]$

main idea: sample from the joint distribution $\cT(\X, \tilde \X, \Y)$ using the Gram matrix of $[\X, \Y]$ only.

Proposition 1: if one generates “fake” data matrices $\check\X$ and $\check\Y$ that lead to the same Gram matrix as that of $\X$ and $\Y$, then the distribution of $\cT$ remains unchanged when the original matrices are replaced by the fake ones
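
Proposition 1 can be checked numerically: any factorization of the Gram matrix, e.g. a Cholesky factor padded with zero rows, yields valid fake data. A sketch, assuming the Gram matrix is positive definite:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

# available summary statistics: the Gram matrix of [X, Y]
A = np.column_stack([X, Y])
G = A.T @ A                                   # (p+1) x (p+1)

# "fake" data with the same Gram matrix, via a Cholesky factor
L = np.linalg.cholesky(G)                     # G = L @ L.T
fake = np.vstack([L.T, np.zeros((n - p - 1, p + 1))])  # pad with zero rows to n
X_fake, Y_fake = fake[:, :p], fake[:, p]

assert np.allclose(fake.T @ fake, G)
```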

### GhostKnockoffs with the square-root Lasso

the square-root Lasso, for which a reasonable tuning parameter can be chosen conveniently, since it does not require knowledge of the noise level:

$\argmin \Vert \Y- [\X \tilde \X]\beta\Vert_2 + \lambda \Vert \beta\Vert_1$

and a good choice of $\lambda$ is given by

$\lambda = \kappa \bbE\left[\frac{\Vert [X \tilde X]^\top \varepsilon\Vert_\infty}{\Vert\varepsilon\Vert_2} \mid X, \tilde X\right]$
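
This conditional expectation has no closed form, but it is straightforward to estimate by Monte Carlo with $\varepsilon\sim N(0, I_n)$. A sketch; the stand-in knockoff matrix and the multiplier $\kappa$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 5
X = rng.standard_normal((n, p))
Xk = rng.standard_normal((n, p))     # stand-in knockoff matrix, for illustration
A = np.column_stack([X, Xk])         # [X, X_tilde], n x 2p

# Monte Carlo estimate of E[ ||A^T eps||_inf / ||eps||_2 | A ], eps ~ N(0, I_n)
kappa, B = 0.3, 500                  # kappa is a user-chosen multiplier (assumption)
eps = rng.standard_normal((n, B))
ratios = np.abs(A.T @ eps).max(axis=0) / np.linalg.norm(eps, axis=0)
lam = kappa * ratios.mean()
```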

### GhostKnockoffs with the Lasso-max

$W_j = \sup\{\lambda:\hat\beta_j(\lambda) \neq 0\} - \sup\{\lambda: \hat \beta_{j+p}(\lambda)\neq 0\}$
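
The entry points $\sup\{\lambda:\hat\beta_j(\lambda)\neq 0\}$ can be read off a decreasing $\lambda$ grid. A self-contained sketch with a plain coordinate-descent lasso and a random stand-in knockoff matrix (both illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def lasso_cd(A, y, lam, n_iter=100):
    """Plain coordinate descent for (1/2)||y - A b||^2 + lam * ||b||_1."""
    n, d = A.shape
    b = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0)
    r = y.copy()                          # residual y - A b
    for _ in range(n_iter):
        for j in range(d):
            r += A[:, j] * b[j]           # remove coordinate j from the fit
            rho = A[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= A[:, j] * b[j]           # put the updated coordinate back
    return b

rng = np.random.default_rng(4)
n, p = 100, 3
X = rng.standard_normal((n, p))
Xk = rng.standard_normal((n, p))          # stand-in knockoffs, for illustration
Y = 2 * X[:, 0] + rng.standard_normal(n)
A = np.column_stack([X, Xk])

# scan a decreasing lambda grid; record the largest lambda at which each
# coefficient is active (its lasso-max entry point)
grid = np.linspace(np.abs(A.T @ Y).max(), 0, 50)
entry = np.zeros(2 * p)
for lam in grid:
    active = lasso_cd(A, Y, lam) != 0
    entry[active & (entry == 0)] = lam

W = entry[:p] - entry[p:]                 # lasso-max statistic
```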

## GhostKnockoffs with Penalized Regression: Missing Empirical Covariance

in applications such as genetics, $\X^\top\X$ may not be available.

assume we only know $\X^\top\Y$, $\Vert \Y\Vert_2^2$ and the sample size $n$, and assume that $X\sim N(0, \Sigma)$ with the covariance matrix $\Sigma$ known

### GhostKnockoffs with pseudo-lasso

idea: modify the Lasso objective function so that it can be constructed from the available summary statistics.

• lasso-min
• pseudo-sum
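
The exact pseudo-lasso and pseudo-sum objectives are given in the paper; the sketch below only illustrates the general recipe of replacing the unavailable Gram matrix $[\X\,\tilde\X]^\top[\X\,\tilde\X]$ by its expectation $n\Sigma_{\text{aug}}$ and solving the resulting quadratic-plus-$\ell_1$ problem by proximal gradient. The tuning parameter and the stand-in knockoff scores are assumptions:

```python
import numpy as np

def prox_grad_l1(A, c, lam, n_iter=500):
    """Proximal gradient (ISTA) for (1/2) b^T A b - c^T b + lam * ||b||_1."""
    step = 1.0 / np.linalg.eigvalsh(A).max()
    b = np.zeros(len(c))
    for _ in range(n_iter):
        z = b - step * (A @ b - c)                                # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return b

rng = np.random.default_rng(5)
p, n, rho = 3, 400, 0.2
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
s = min(1.0, 2 * np.linalg.eigvalsh(Sigma).min()) * np.ones(p)
Ds = np.diag(s)
# covariance of (X, X_tilde) under the Gaussian knockoff sampler
Sigma_aug = np.block([[Sigma, Sigma - Ds], [Sigma - Ds, Sigma]])

# simulate the available summary statistics
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = X @ np.array([1.5, 0.0, 0.0]) + rng.standard_normal(n)
Xk = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # stand-in knockoff scores
c = np.concatenate([X.T @ Y, Xk.T @ Y])

# replace the unavailable Gram matrix by its expectation n * Sigma_aug
b = prox_grad_l1(n * Sigma_aug, c, lam=2 * np.sqrt(n))    # illustrative tuning
W = np.abs(b[:p]) - np.abs(b[p:])
```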

## He et al. (2021)

• $X_i = (X_{i1},\ldots, X_{iq})$: vector of covariates
• $G_i = (G_{i1},\ldots, G_{ip})$: vector of genotype
• $Y_i$: outcome

$g(\mu_i) = \alpha_0 + \alpha^\top X_i + \beta^\top G_i$

the per-sample score statistic can be written as $G_i^\top Y_i$. The z-scores aggregating all samples can be written as

$Z_{score} = \frac{1}{\sqrt n}G^\top Y$

the knockoff counterpart for $Z_{score}$ can be directly generated by

$\tilde Z_{score} = PZ_{score} + E,$

where $E\sim N(0, V)$.

define a W-statistic that quantifies the magnitude of effect on the outcome as

$W_j = \left(T_j^{(0)}-\text{median}_{1\le m\le M} T_j^{(m)}\right) I_{T_j^{(0)}\ge \max_{1\le m\le M} T_j^{(m)}}\,.$

define the knockoff statistics

$\kappa_j = \argmax_{0\le m\le M} T_j^{(m)}\,, \qquad \tau_j = T_j^{(0)} - \text{median}_{1\le m\le M} T_j^{(m)}\,,$

where $m$ indicates the $m$-th knockoff.

define the threshold for the knockoff filter as

$\tau = \min\left\{ t > 0: \frac{\frac 1M + \frac 1M\#\{\kappa_j \ge 1, \tau_j \ge t\} }{\#\{\kappa_j = 0, \tau_j \ge t\} } \le q \right\}$
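
A small numeric walk-through of the filter; the $\max(1,\cdot)$ guard in the denominator is added here to avoid division by zero and is an assumption:

```python
import numpy as np

def multi_knockoff_threshold(kappa, tau, M, q):
    """Threshold for the multiple-knockoff filter; features with
    kappa_j = 0 and tau_j >= threshold are selected."""
    for t in np.sort(tau[tau > 0]):
        fdp = (1 / M + (1 / M) * np.sum((kappa >= 1) & (tau >= t))) \
              / max(1, np.sum((kappa == 0) & (tau >= t)))
        if fdp <= q:
            return t
    return np.inf

# toy example: M = 5 knockoff copies, 8 features
kappa = np.array([0, 0, 0, 3, 1, 0, 2, 0])
tau   = np.array([5., 4., 3., 2., 1., 0.5, 0.2, 0.1])
t_hat = multi_knockoff_threshold(kappa, tau, M=5, q=0.2)
selected = (kappa == 0) & (tau >= t_hat)
```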

---

$\hat\beta^{ls} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 = (\bfX^\top\bfX)^{-1}\bfX^\top\bfY$

In the marginal regression,

$\hat\beta^{marginal}_j = \argmin_{\beta}\Vert \bfY - \bfX_j\beta\Vert^2_2 = (\bfX_j^\top\bfX_j)^{-1}\bfX_j^\top\bfY$

Note that

$\hat\beta^{ridge} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_2^2 = (\bfX^\top\bfX + \lambda\bfI)^{-1}\bfX^\top\bfY$

then with the augmented design matrix $[\bfX\,\tilde\bfX]$, the ridge solution only depends on $[\bfX\,\tilde\bfX]^\top[\bfX\,\tilde\bfX]$ and $[\bfX\,\tilde\bfX]^\top\bfY$, i.e., only on $\bfX^\top\bfX, \tilde\bfX^\top\bfX, \tilde \bfX^\top\tilde\bfX, \bfX^\top\bfY, \tilde \bfX^\top\bfY$, which are all components of the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$ (actually $\bfY^\top\bfY$ is not necessary, as also discussed in the paper on page 6).
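
This is easy to verify numerically: assembling the Gram blocks and solving the ridge normal equations reproduces the full-data fit (the knockoff matrix below is a random stand-in, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 150, 4
X = rng.standard_normal((n, p))
Xk = rng.standard_normal((n, p))     # stand-in knockoff matrix, for illustration
Y = rng.standard_normal(n)
A = np.column_stack([X, Xk])
lam = 3.0

# ridge on the augmented design, computed from the raw data
b_full = np.linalg.solve(A.T @ A + lam * np.eye(2 * p), A.T @ Y)

# the same fit assembled from Gram-matrix components only
G = np.block([[X.T @ X, X.T @ Xk], [Xk.T @ X, Xk.T @ Xk]])
c = np.concatenate([X.T @ Y, Xk.T @ Y])
b_gram = np.linalg.solve(G + lam * np.eye(2 * p), c)

assert np.allclose(b_full, b_gram)
```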

The lasso solution can be obtained via an iterative ridge strategy (sec 6.4.2 of http://arxiv.org/abs/1509.09169), so we can also obtain an (approximate) lasso solution from ridge that depends only on the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$.

$\hat\beta^{lasso} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_1$
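
A sketch of the iterative ridge idea: repeatedly solve a reweighted ridge problem with weights $1/\max(\vert\beta_j\vert,\delta)$, using only Gram components. The weight update and the $\delta$ floor are assumptions in the spirit of the cited notes; an orthonormal design is used so the exact lasso solution (soft-thresholding, for the objective above without the $1/2$ factor) is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal design
Y = X @ np.array([3.0, 1.5, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)
lam, delta = 1.0, 1e-8

# iterative ridge: only the Gram components G and c are needed
G, c = X.T @ X, X.T @ Y
b = np.linalg.solve(G + lam * np.eye(p), c)        # plain ridge start
for _ in range(100):
    D = np.diag(1.0 / np.maximum(np.abs(b), delta))  # reweighting, floored at delta
    b = np.linalg.solve(G + (lam / 2) * D, c)        # majorizer of lam * ||b||_1

# for an orthonormal design the exact lasso solution is soft-thresholding at lam/2
b_exact = np.sign(c) * np.maximum(np.abs(c) - lam / 2, 0.0)
assert np.allclose(b, b_exact, atol=1e-6)
```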

Published in categories: Note