GhostKnockoffs: Only Summary Statistics
GhostKnockoffs: the main idea is to generate knockoff Z-scores directly, without creating knockoff variables. The method operates with only $X^\top Y$ and $\Vert Y\Vert_2^2$, where $X$ is the $n\times p$ matrix of covariates and $Y$ is the $n\times 1$ response vector.
The paper extends the family of GhostKnockoffs methods to incorporate feature importance statistics obtained from penalized regression:
- when the empirical covariance of the covariate-response pair $(X, Y)$ is available, i.e., $X^\top X$, $X^\top Y$, $\Vert Y\Vert_2^2$ are available along with the sample size $n$, the resulting procedure enjoys a substantial power improvement over the method of He et al. (2022), owing to far more effective test statistics
Model-X Knockoffs and GhostKnockoffs
conditional independence hypotheses $H_0^j: X_j\ind Y\mid X_{-j}$ for $1\le j\le p$
$n$ i.i.d. samples $(X_i, Y_i), 1\le i\le n$
two conditions:
- exchangeability: $(X_j, \tilde X_j, X_{-j}, \tilde X_{-j})\overset{d}{=}(\tilde X_j, X_j, X_{-j}, \tilde X_{-j}),\forall 1\le j\le p$
- conditional independence: $\tilde X\ind Y\mid X$
define feature importance statistics $W = w([X, \tilde X], Y)\in \IR^p$ to be any function of $X, \tilde X, Y$ such that a flip-sign property holds
\[w_j([X, \tilde X]_{swap(j)}, Y) = -w_j([X,\tilde X], Y)\]common choices include
- marginal correlation difference statistic: $W_j = \vert X_j^\top Y\vert - \vert \tilde X_j^\top Y\vert$
- lasso coefficient difference statistic: $W_j = \vert \hat\beta_j(\lambda_{CV})\vert - \vert \hat \beta_{j+p}(\lambda_{CV})\vert$
Gaussian knockoff sampler
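The construction is not spelled out in these notes; recall that for $X\sim N(0,\Sigma)$ the standard Gaussian Model-X construction draws $\tilde X\mid X \sim N\left(X\,(I-\Sigma^{-1}\mathrm{diag}(s)),\ 2\,\mathrm{diag}(s)-\mathrm{diag}(s)\Sigma^{-1}\mathrm{diag}(s)\right)$ row-wise, for a suitable vector $s$. A minimal numpy sketch (the function name and the numerical jitter are mine):

```python
import numpy as np

def gaussian_knockoff_sampler(X, Sigma, s, rng=None):
    """Gaussian knockoff sampler tilde_X = G(X, Sigma).

    X: n x p data matrix with i.i.d. rows ~ N(0, Sigma); s: length-p vector with
    2*diag(s) - diag(s) @ inv(Sigma) @ diag(s) positive semidefinite
    (e.g., the equicorrelated choice s_j = min(1, 2*lambda_min(Sigma))
    when Sigma is a correlation matrix).
    """
    rng = np.random.default_rng(rng)
    p = len(s)
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)        # Sigma^{-1} diag(s)
    cond_mean = X - X @ Sigma_inv_D                # X (I - Sigma^{-1} diag(s)), row-wise conditional mean
    cond_cov = 2 * D - D @ Sigma_inv_D             # 2 diag(s) - diag(s) Sigma^{-1} diag(s)
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal(X.shape) @ L.T
```

With $\tilde X$ in hand, the marginal correlation difference statistic from the list above is simply `np.abs(X.T @ Y) - np.abs(tilde_X.T @ Y)`.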
GhostKnockoffs with marginal correlation difference statistic
sample the knockoff Z-score $\tilde \Z_s$ from $\X^\top \Y$ and $\Vert \Y\Vert_2^2$ directly, in a way such that
\[\tilde \Z_s\mid \X, \Y\overset{d}{=} \tilde \X^\top \Y\mid \X, \Y\,,\]where $\tilde \X=\cG(\X,\Sigma)$ is the knockoff matrix generated by the Gaussian knockoff sampler. Then take $W_j = \vert \Z_{s,j}\vert - \vert\tilde \Z_{s,j}\vert$, where $\Z_s = \X^\top\Y$.
\[\tilde \Z_s = \P^\top \X^\top \Y + \Vert \Y\Vert_2 \Z\,,\; \text{where } \Z \sim N(0, \V) \text{ is independent of }\X \text{ and }\Y \tag{6}\]
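A minimal numpy sketch of (6). The explicit forms of $\P$ and $\V$ used below, $\P = I - \Sigma^{-1}\mathrm{diag}(s)$ and $\V = 2\,\mathrm{diag}(s) - \mathrm{diag}(s)\Sigma^{-1}\mathrm{diag}(s)$, are what the Gaussian knockoff sampler above implies; take them as my reading rather than a quotation from the paper:

```python
import numpy as np

def ghost_knockoff_zscores(XtY, Y_norm2, Sigma, s, rng=None):
    """Sample tilde_Z_s as in (6), without ever forming X or tilde_X.

    XtY = X^T Y, Y_norm2 = ||Y||_2^2; P and V below follow the Gaussian
    knockoff sampler (my reading, not a quote from the paper).
    """
    rng = np.random.default_rng(rng)
    p = len(s)
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)            # Sigma^{-1} diag(s)
    V = 2 * D - D @ Sigma_inv_D                        # 2 diag(s) - diag(s) Sigma^{-1} diag(s)
    mean = XtY - Sigma_inv_D.T @ XtY                   # P^T X^T Y = (I - diag(s) Sigma^{-1}) X^T Y
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))
    Z = L @ rng.standard_normal(p)                     # Z ~ N(0, V)
    return mean + np.sqrt(Y_norm2) * Z
```

Note that nothing here touches individual-level data: only $\X^\top\Y$, $\Vert\Y\Vert_2^2$ and $\Sigma$ enter.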
GhostKnockoffs with Penalized Regression: Known Empirical Covariance
In addition to $\X^\top\Y$ and $\Vert \Y\Vert_2^2$, assume we also have $\X^\top\X$ and the sample size $n$
GhostKnockoffs with the Lasso
\[\hat\beta(\lambda) \in \argmin_{\beta\in \IR^{2p}} \frac 12\Vert \Y - [\X \tilde \X]\beta\Vert^2_2 + \lambda\Vert \beta\Vert_1\,,\]where $\tilde\X = \cG(\X, \Sigma)$. Then define the lasso coefficient difference feature importance statistics
\[W_j = \vert \hat\beta_j(\lambda)\vert -\vert \hat \beta_{j+p}(\lambda)\vert, 1\le j\le p\]Define the Gram matrix of $[\X, \tilde \X, \Y]$,
\[\cT(\X, \tilde \X, \Y) = [\X, \tilde \X, \Y]^\top [\X, \tilde \X, \Y]\]main idea: sample $\cT(\X, \tilde \X, \Y)$ from its distribution using the Gram matrix of $[\X, \Y]$ only.
Proposition 1: If one generates “fake” data matrices $\check\X$ and $\check\Y$ that lead to the same Gram matrix as that of $\X$ and $\Y$, then the distribution of $\cT$ remains unchanged when the original matrices are replaced by the fake data matrices.
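A sketch of how Proposition 1 can be used in practice: build fake data with the right Gram matrix (here via a Cholesky factor padded with zero rows, a choice of mine), run the Gaussian knockoff sampler from above on the fake data, and fit the Lasso. Helper names are mine; `sklearn`'s `Lasso` is used for the fit:

```python
import numpy as np
from sklearn.linear_model import Lasso

def fake_data_from_gram(XtX, XtY, Y_norm2, n):
    """Build (check_X, check_Y) with the same Gram matrix as (X, Y).

    Any factor A with A^T A = [X Y]^T [X Y] works; here a Cholesky factor
    padded with zero rows up to n rows (assumes n >= p + 1).
    """
    p = XtX.shape[0]
    T = np.zeros((p + 1, p + 1))
    T[:p, :p], T[:p, p], T[p, :p], T[p, p] = XtX, XtY, XtY, Y_norm2
    L = np.linalg.cholesky(T + 1e-10 * np.eye(p + 1))
    A = np.vstack([L.T, np.zeros((n - p - 1, p + 1))])   # n x (p+1) with A^T A = T
    return A[:, :p], A[:, p]

def lasso_cd_statistics(XtX, XtY, Y_norm2, n, Sigma, s, lam):
    """Lasso coefficient difference W from summary statistics only."""
    check_X, check_Y = fake_data_from_gram(XtX, XtY, Y_norm2, n)
    check_X_knock = gaussian_knockoff_sampler(check_X, Sigma, s)   # sketch from the section above
    XX = np.hstack([check_X, check_X_knock])
    # sklearn minimizes (1/(2n))||y - XX b||^2 + alpha ||b||_1, so alpha = lam / n
    beta = Lasso(alpha=lam / n, fit_intercept=False).fit(XX, check_Y).coef_
    p = XtX.shape[0]
    return np.abs(beta[:p]) - np.abs(beta[p:])                     # W_j
```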
GhostKnockoffs with the square-root Lasso
Alternatively one can use the square-root Lasso, for which a reasonable tuning parameter is convenient to choose (it does not require estimating the noise level).
\[\hat\beta(\lambda) \in \argmin_{\beta\in \IR^{2p}} \Vert \Y- [\X \tilde \X]\beta\Vert_2 + \lambda \Vert \beta\Vert_1\,,\]and a good choice of $\lambda$ is given by
\[\lambda = \kappa\, \bbE\left[\frac{\Vert [\X\, \tilde \X]^\top \varepsilon\Vert_\infty}{\Vert\varepsilon\Vert_2} \,\middle|\, \X, \tilde \X\right],\]where $\kappa > 0$ is a constant and $\varepsilon$ can be taken to be an $n$-vector of i.i.d. standard Gaussian noise (the usual pivotal choice for the square-root Lasso).
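The conditional expectation has no closed form; it is natural to estimate it by Monte Carlo under the assumption that $\varepsilon\sim N(0, I_n)$ (function name and defaults are mine):

```python
import numpy as np

def sqrt_lasso_lambda(X_aug, kappa, n_mc=500, rng=None):
    """Monte Carlo estimate of kappa * E[ ||[X, tilde_X]^T eps||_inf / ||eps||_2 ].

    X_aug is the n x 2p augmented design [X, tilde_X]; eps ~ N(0, I_n).
    """
    rng = np.random.default_rng(rng)
    n = X_aug.shape[0]
    eps = rng.standard_normal((n, n_mc))                        # each column is one draw of eps
    ratios = np.max(np.abs(X_aug.T @ eps), axis=0) / np.linalg.norm(eps, axis=0)
    return kappa * ratios.mean()
```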
GhostKnockoffs with the Lasso-max
\[W_j = \sup\{\lambda:\hat\beta_j(\lambda) \neq 0\} - \sup\{\lambda: \hat \beta_{j+p}(\lambda)\neq 0\}\]
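In words, $\sup\{\lambda:\hat\beta_j(\lambda)\neq 0\}$ is the largest penalty at which feature $j$ enters the Lasso path. A grid-based sketch using `sklearn`'s `lasso_path` (the grid approximation and function name are mine; recall `sklearn` parametrizes the penalty as $\lambda/n$):

```python
import numpy as np
from sklearn.linear_model import lasso_path

def lasso_max_statistics(X_aug, y, n_alphas=200):
    """W_j = entry penalty of feature j minus entry penalty of its knockoff.

    Approximates sup{lambda: beta_j(lambda) != 0} by the largest grid value
    of alpha at which the coefficient is nonzero (0 if it never enters).
    """
    alphas, coefs, _ = lasso_path(X_aug, y, n_alphas=n_alphas)   # alphas are in decreasing order
    nonzero = np.abs(coefs) > 0                                  # shape (2p, n_alphas)
    entry = np.where(nonzero.any(axis=1),
                     alphas[np.argmax(nonzero, axis=1)],         # first (largest) alpha where nonzero
                     0.0)
    p = X_aug.shape[1] // 2
    return entry[:p] - entry[p:]
```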
GhostKnockoffs with Penalized Regression: Missing Empirical Covariance
In applications such as genetics, $\X^\top\X$ may not be available.
Assume we only know $\X^\top\Y$, $\Vert \Y\Vert_2^2$ and the sample size $n$, and assume that $X\sim N(0, \Sigma)$, where the covariance matrix $\Sigma$ is known
GhostKnockoffs with pseudo-lasso
idea: modify the Lasso objective function so that it can be constructed from the available summary statistics.
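The exact objective is not reproduced in these notes. A plausible form, under the assumption that the unknown empirical Gram matrix of $[\X\,\tilde\X]$ is replaced by $n$ times its population counterpart and $\tilde\X^\top\Y$ by the knockoff Z-scores $\tilde\Z_s$ from (6), would be
\[\hat\beta \in \argmin_{\beta\in\IR^{2p}}\ \frac n2\,\beta^\top \begin{pmatrix}\Sigma & \Sigma - \mathrm{diag}(s)\\ \Sigma - \mathrm{diag}(s) & \Sigma\end{pmatrix}\beta \;-\; \beta^\top\begin{pmatrix}\X^\top\Y\\ \tilde\Z_s\end{pmatrix} \;+\; \lambda\Vert\beta\Vert_1\,,\]which depends on the data only through $\X^\top\Y$, $\Vert\Y\Vert_2^2$ (via (6)), $n$ and $\Sigma$.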
Choice of tuning parameter
- lasso-min
- pseudo-sum
Variants of GhostKnockoffs
Multi-knockoffs
Group knockoffs
Conditional randomization test
He et al. (2021)
- $X_i = (X_{i1},\ldots, X_{iq})$: vector of covariates
- $G_i = (G_{i1},\ldots, G_{ip})$: vector of genotype
- $Y_i$: outcome (phenotype)
the per-sample score statistic can be written as $G_i^\top Y_i$. The z-scores aggregating all samples can be written as
\[Z_{score} = \frac{1}{\sqrt n}G^\top Y\]the knockoff counterpart for $Z_{score}$ can be directly generated by
\[\tilde Z_{score} = PZ_{score} + E,\]where $E\sim N(0, V)$.
define a W-statistic that quantifies the magnitude of effect on the outcome as
\[W_j = \left(T_j^{(0)}-\text{median}_{1\le m\le M}\, T_j^{(m)}\right) I_{\left\{T_j^{(0)}\ge \max_{1\le m\le M} T_j^{(m)}\right\}}\,.\]define the knockoff statistics
\[\kappa_j = \argmax_{0\le m\le M} T_j^{(m)}\,, \qquad \tau_j = T_j^{(0)} - \text{median}_{1\le m\le M}\, T_j^{(m)}\,,\]where $T_j^{(0)}$ is the importance statistic of the original feature $j$ and $T_j^{(m)}$, $1\le m\le M$, that of its $m$-th knockoff.
define the threshold for the knockoff filter as
\[\tau = \min\left\{ t > 0: \frac{\frac 1M + \frac 1M\#\{j: \kappa_j \ge 1, \tau_j \ge t\} }{\#\{j: \kappa_j = 0, \tau_j \ge t\}\vee 1 } \le q \right\}\]
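A compact sketch of the resulting selection rule, following the definitions of $\kappa_j$ and $\tau_j$ above; the input is the $p\times(M+1)$ matrix of importance statistics with the original copy in column 0 (function name and the guard against an empty denominator are mine):

```python
import numpy as np

def multi_knockoff_filter(T, q):
    """Select features with the multiple-knockoff filter.

    T: (p, M+1) array; column 0 holds the original importance statistics,
    columns 1..M those of the M knockoff copies.
    """
    M = T.shape[1] - 1
    kappa = np.argmax(T, axis=1)                               # which copy is largest
    tau = T[:, 0] - np.median(T[:, 1:], axis=1)
    for t in np.sort(tau[tau > 0]):                            # scan thresholds from small to large
        fdp = (1 / M + np.sum((kappa >= 1) & (tau >= t)) / M) / max(np.sum((kappa == 0) & (tau >= t)), 1)
        if fdp <= q:
            return np.where((kappa == 0) & (tau >= t))[0]      # selected feature indices
    return np.array([], dtype=int)
```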
=============
In the least squares regression,
\[\hat\beta^{ls} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 = (\bfX^\top\bfX)^{-1}\bfX^\top\bfY\]In the marginal regression,
\[\hat\beta^{marginal}_j = \argmin_{\beta}\Vert \bfY - \bfX_j\beta\Vert^2_2 = (\bfX_j^\top\bfX_j)^{-1}\bfX_j^\top\bfY\]Note that
\[\hat\beta^{ridge} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_2^2 = (\bfX^\top\bfX + \lambda\bfI)^{-1}\bfX^\top\bfY\]then with augmented design matrix $[\bfX\,\tilde\bfX]$, the ridge solution only depends on $[\bfX\,\tilde\bfX]^\top[\bfX\,\tilde\bfX]$ and $[\bfX\,\tilde\bfX]^\top\bfY$, i.e., only $\bfX^\top\bfX, \tilde\bfX^\top\bfX, \tilde \bfX^\top\tilde\bfX, \bfX^\top\bfY, \tilde \bfX^\top\bfY$, which are all components of the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$ (actually $\bfY^\top\bfY$ is not necessary, as also discussed in the paper on page 6)
The lasso solution can be obtained via an iterative ridge strategy (sec 6.4.2 of http://arxiv.org/abs/1509.09169), so we can also obtain an (approximate) lasso solution from ridge regression which only depends on the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$.
\[\hat\beta^{lasso} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_1\]
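A minimal sketch of that iterative ridge (local quadratic approximation) idea for the display above: each $\vert\beta_j\vert$ is majorized by $\beta_j^2/(2\vert\beta_j^{(k)}\vert) + \text{const}$, so every step is a weighted ridge solve that touches the data only through $\bfX^\top\bfX$ and $\bfX^\top\bfY$ (function name, warm start and stopping details are mine):

```python
import numpy as np

def lasso_via_iterative_ridge(XtX, XtY, lam, n_iter=100, eps=1e-8):
    """Approximate argmin ||Y - X b||_2^2 + lam * ||b||_1 using only X^T X and X^T Y.

    Majorizing |b_j| by b_j^2 / (2 |b_j^{(k)}|) + const gives the update
    b <- (X^T X + (lam/2) diag(1/|b^{(k)}|))^{-1} X^T Y.
    """
    p = XtX.shape[0]
    beta = np.linalg.solve(XtX + lam * np.eye(p), XtY)          # ridge warm start
    for _ in range(n_iter):
        weights = lam / (2 * (np.abs(beta) + eps))              # per-coordinate ridge penalties
        beta = np.linalg.solve(XtX + np.diag(weights), XtY)
    beta[np.abs(beta) < 1e-6] = 0.0                             # snap near-zero coordinates to exact zero
    return beta
```

In the GhostKnockoffs setting one would plug in the Gram matrix of the augmented design $[\bfX\,\tilde\bfX]$ and the vector $(\bfX^\top\bfY, \tilde\bfX^\top\bfY)$.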