WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

GhostKnockoffs: Only Summary Statistics

Tags: Knockoff, Lasso, GWAS

This note is for Chen, Z., He, Z., Chu, B. B., Gu, J., Morrison, T., Sabatti, C., & Candès, E. (2024). Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression (arXiv:2402.12724). arXiv.

GhostKnockoffs: the main idea is to generate knockoff Z-scores directly, without creating knockoff variables. The method operates with only $X^\top Y$ and $\Vert Y\Vert_2^2$, where $X$ is the $n\times p$ matrix of covariates and $Y$ is the $n\times 1$ response vector.

extend the family of GhostKnockoffs methods to incorporate feature importance statistics obtained from penalized regression

  • empirical covariance of the covariate-response pair $(X, Y)$ is available, i.e., $X^\top X, X^\top Y, \Vert Y\Vert_2^2$ are available along with the sample size $n$. Substantial power improvement over the method of He et al. (2022) due to far more effective test statistics

Model-X Knockoffs and GhostKnockoffs

conditional independence hypotheses $H_0^j: X_j\ind Y\mid X_{-j}$ for $1\le j\le p$

$n$ i.i.d. samples $(X_i, Y_i), 1\le i\le n$

two conditions:

  • exchangeability: $(X_j, \tilde X_j, X_{-j}, \tilde X_{-j})\overset{d}{=}(\tilde X_j, X_j, X_{-j}, \tilde X_{-j}),\forall 1\le j\le p$
  • conditional independence: $\tilde X\ind Y\mid X$

define feature importance statistics $W = w([X, \tilde X], Y)\in \IR^p$ to be any function of $X, \tilde X, Y$ such that a flip-sign property holds

\[w_j([X, \tilde X]_{swap(j)}, Y) = -w_j([X,\tilde X], Y)\]

common choices include

  • marginal correlation difference statistic: $W_j = \vert X_j^\top Y\vert - \vert \tilde X_j^\top Y\vert$
  • lasso coefficient difference statistic: $W_j = \vert \hat\beta_j(\lambda_{CV})\vert - \vert \hat \beta_{j+p}(\lambda_{CV})\vert$

Gaussian knockoff sampler

[screenshot from the paper: the Gaussian knockoff sampler $\cG(\X, \Sigma)$]
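For later reference, here is a minimal numpy sketch of the second-order Gaussian knockoff sampler: assuming the rows of $\X$ are $N(0,\Sigma)$ and a valid diagonal matrix $D = \mathrm{diag}(s)$ has been chosen (e.g., via the equicorrelated or SDP construction), one samples $\tilde X_i \mid X_i \sim N\big(X_i(I - \Sigma^{-1}D),\ 2D - D\Sigma^{-1}D\big)$. The function name and arguments below are mine, not the paper's.

```python
import numpy as np

def gaussian_knockoff_sampler(X, Sigma, s, rng=None):
    """Second-order Gaussian knockoffs: tilde X_i | X_i ~ N(X_i P, V) with
    P = I - Sigma^{-1} D, V = 2 D - D Sigma^{-1} D, D = diag(s).
    Sketch only; assumes rows of X are N(0, Sigma) and s makes V positive semidefinite."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)        # Sigma^{-1} D
    P = np.eye(p) - Sigma_inv_D                    # conditional-mean map
    V = 2 * D - D @ Sigma_inv_D                    # conditional covariance
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))  # small jitter for numerical stability
    Z = rng.standard_normal((n, p))
    return X @ P + Z @ L.T                         # each row ~ N(X_i P, V)
```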

GhostKnockoffs with marginal correlation difference statistic

sample the knockoff Z-score $\tilde \Z_s$ from $\X^\top \Y$ and $\Vert \Y\Vert_2^2$ directly, in a way such that

\[\tilde \Z_s\mid \X, \Y\overset{d}{=} \tilde \X^\top \Y\mid \X, \Y\]

where $\tilde \X=\cG(\X,\Sigma)$ is the knockoff matrix generated by the Gaussian knockoff sampler. Then take $W_j = \vert Z_{s,j}\vert - \vert \tilde Z_{s,j}\vert$ with $\Z_s = \X^\top\Y$, which recovers the marginal correlation difference statistic. Concretely,

\[\tilde \Z_s = \P^\top \X^\top \Y + \Vert \Y\Vert_2 \Z\,,\; \text{where } \Z \sim N(0, \V) \text{ is independent of }\X \text{ and }\Y \tag{6}\]

[screenshot from the paper: GhostKnockoffs with the marginal correlation difference statistic]
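A sketch of how eq. (6) can be computed from the summary statistics alone, reusing the same $\P = I - \Sigma^{-1}D$ and $\V = 2D - D\Sigma^{-1}D$ as in the Gaussian sampler above (that choice of $\P, \V$ is my assumption; it is consistent with the conditional distribution of $\tilde\X^\top\Y$ given $\X, \Y$).

```python
import numpy as np

def ghost_knockoff_zscores(XtY, Y_sqnorm, Sigma, s, rng=None):
    """Knockoff Z-scores from summary statistics (eq. (6)):
    tilde Z_s = P^T (X^T Y) + ||Y||_2 * Z with Z ~ N(0, V).
    XtY is X^T Y (length p); Y_sqnorm is ||Y||_2^2."""
    rng = np.random.default_rng() if rng is None else rng
    p = XtY.shape[0]
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)
    P = np.eye(p) - Sigma_inv_D
    V = 2 * D - D @ Sigma_inv_D
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))
    Z = L @ rng.standard_normal(p)                 # Z ~ N(0, V), independent of the data
    return P.T @ XtY + np.sqrt(Y_sqnorm) * Z

# marginal correlation difference statistic:
# W = np.abs(XtY) - np.abs(ghost_knockoff_zscores(XtY, Y_sqnorm, Sigma, s))
```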

GhostKnockoffs with Penalized Regression: Known Empirical Covariance

in addition to $\X^\top\Y$ and $\Vert \Y\Vert_2^2$, assume we have $\X^\top\X$ and the sample size $n$

GhostKnockoffs with the Lasso

\[\hat\beta(\lambda) \in \argmin_{\beta\in \IR^{2p}} \frac 12\Vert \Y - [\X \tilde \X]\beta\Vert^2_2 + \lambda\Vert \beta\Vert_1\,,\]

where $\tilde\X = \cG(\X, \Sigma)$. Then define the lasso coefficient difference feature importance statistics

\[W_j = \vert \hat\beta_j(\lambda)\vert -\vert \hat \beta_{j+p}(\lambda)\vert, 1\le j\le p\]

Define the Gram matrix of $[\X, \tilde \X, \Y]$,

\[\cT(\X, \tilde \X, \Y) = [\X, \tilde \X, \Y]^\top [\X, \tilde \X, \Y]\]

main idea: sample from the joint distribution of $\cT(\X, \tilde \X, \Y)$ using only the Gram matrix of $[\X, \Y]$.

Proposition 1: if one generates “fake” data matrices $\check\X$ and $\check\Y$ with the same Gram matrix as that of $[\X, \Y]$, then the distribution of $\cT$ remains unchanged when the original matrices are replaced by the fake ones.

[screenshot from the paper: the algorithm for GhostKnockoffs with the lasso]
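A rough sketch of how Proposition 1 can be used: build fake matrices $\check\X, \check\Y$ from a Cholesky factor of the Gram matrix of $[\X, \Y]$, generate Gaussian knockoffs of the fake design, and run the lasso on the augmented fake data. It reuses `gaussian_knockoff_sampler` from the sketch above; the helper name, the `sklearn` call, and the $\lambda$ rescaling are my choices, not the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

def ghost_knockoff_lasso_W(XtX, XtY, Y_sqnorm, n, Sigma, s, lam, rng=None):
    """GhostKnockoffs with the lasso from X^T X, X^T Y, ||Y||_2^2 and n (sketch).
    Requires n >= p + 1 so that fake data with the same Gram matrix exist."""
    rng = np.random.default_rng() if rng is None else rng
    p = XtX.shape[0]
    assert n >= p + 1
    # Gram matrix of [X, Y] and an n x (p+1) "fake" factor with the same Gram matrix
    G = np.zeros((p + 1, p + 1))
    G[:p, :p] = XtX
    G[:p, p] = XtY
    G[p, :p] = XtY
    G[p, p] = Y_sqnorm
    L = np.linalg.cholesky(G + 1e-10 * np.eye(p + 1))
    fake = np.zeros((n, p + 1))
    fake[: p + 1, :] = L.T                          # fake^T fake = L L^T = G
    X_check, Y_check = fake[:, :p], fake[:, p]
    # knockoffs of the fake design; gaussian_knockoff_sampler is the sketch above
    X_check_ko = gaussian_knockoff_sampler(X_check, Sigma, s, rng)
    XX = np.hstack([X_check, X_check_ko])
    # sklearn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha ||b||_1, hence alpha = lam / n
    beta = Lasso(alpha=lam / n, fit_intercept=False).fit(XX, Y_check).coef_
    return np.abs(beta[:p]) - np.abs(beta[p:])      # W_j = |beta_j| - |beta_{j+p}|
```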

GhostKnockoffs with the square-root Lasso

the square-root Lasso, for which a reasonable tuning parameter is convenient to choose:

\[\hat\beta(\lambda) \in \argmin_{\beta\in\IR^{2p}} \Vert \Y- [\X\ \tilde \X]\beta\Vert_2 + \lambda \Vert \beta\Vert_1\,,\]

and a good choice of $\lambda$ is given by

\[\lambda = \kappa\, \bbE\left[\frac{\Vert [\X\ \tilde \X]^\top \varepsilon\Vert_\infty}{\Vert\varepsilon\Vert_2} \mid \X, \tilde \X\right]\]
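A Monte Carlo sketch for this tuning parameter, assuming $\varepsilon \sim N(0, I_n)$ (the usual pivotal choice; the exact recommendation for $\kappa$ and for the law of $\varepsilon$ should be checked against the paper).

```python
import numpy as np

def sqrt_lasso_lambda(X_aug, kappa, n_mc=500, rng=None):
    """Monte Carlo estimate of
    lambda = kappa * E[ ||[X, X~]^T eps||_inf / ||eps||_2 | X, X~ ],
    assuming eps ~ N(0, I_n); kappa is a user-chosen multiplier."""
    rng = np.random.default_rng() if rng is None else rng
    n = X_aug.shape[0]
    vals = np.empty(n_mc)
    for b in range(n_mc):
        eps = rng.standard_normal(n)
        vals[b] = np.abs(X_aug.T @ eps).max() / np.linalg.norm(eps)
    return kappa * vals.mean()
```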

GhostKnockoffs with the Lasso-max

\[W_j = \sup\{\lambda:\hat\beta_j(\lambda) \neq 0\} - \sup\{\lambda: \hat \beta_{j+p}(\lambda)\neq 0\}\]
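A sketch of the lasso-max statistic computed on a grid of $\lambda$ values along the lasso path, using `sklearn.linear_model.lasso_path` (my choice of tool, assuming the first $p$ columns of the design are the originals and the last $p$ their knockoffs; sklearn rescales $\lambda$ by $1/n$, a common positive factor that does not change the knockoff selection).

```python
import numpy as np
from sklearn.linear_model import lasso_path

def lasso_max_W(XX, y, n_alphas=200):
    """Lasso-max statistics on a lambda grid: for each column of XX = [X, X~],
    record the largest lambda at which its coefficient is nonzero, then
    W_j = entry_j - entry_{j+p}."""
    p = XX.shape[1] // 2
    alphas, coefs, _ = lasso_path(XX, y, n_alphas=n_alphas)   # alphas are decreasing
    entry = np.zeros(2 * p)
    for j in range(2 * p):
        nz = np.flatnonzero(coefs[j])
        entry[j] = alphas[nz[0]] if nz.size else 0.0          # first lambda with beta_j != 0
    return entry[:p] - entry[p:]
```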

GhostKnockoffs with Penalized Regression: Missing Empirical Covariance

in applications such as genetics, $\X^\top\X$ may not be available.

assume we only know $\X^\top\Y$, $\Vert \Y\Vert_2^2$, and the sample size $n$; further assume $X\sim N(0, \Sigma)$ with known covariance matrix $\Sigma$

GhostKnockoffs with pseudo-lasso

idea: modify the Lasso objective function so that it can be constructed from the available summary statistics.

[screenshot from the paper: the pseudo-lasso construction]

Choice of tuning parameter

  • lasso-min
  • pseudo-sum

Variants of GhostKnockoffs

Multi-knockoffs

Group knockoffs

Conditional randomization test

He et al. (2021)

  • $X_i = (X_{i1},\ldots, X_{iq})$: vector of covariates
  • $G_i = (G_{i1},\ldots, G_{ip})$: vector of genotype
  • $Y_i$: phenotype (outcome)

\[g(\mu_i) = \alpha_0 + \alpha^\top X_i + \beta^\top G_i\]

where $g$ is a link function and $\mu_i = \bbE[Y_i\mid X_i, G_i]$.

the per-sample score statistic can be written as $G_i^\top Y_i$. The z-scores aggregating all samples can be written as

\[Z_{score} = \frac{1}{\sqrt n}G^\top Y\]

the knockoff counterpart for $Z_{score}$ can be directly generated by

\[\tilde Z_{score} = PZ_{score} + E,\]

where $E\sim N(0, V)$.

define a W-statistic that quantifies the magnitude of the effect on the outcome as

\[W_j = \left(T_j^{(0)}-\text{median}_{1\le m\le M} T_j^{(m)}\right) I_{T_j^{(0)}\ge \max_{1\le m\le M} T_j^{(m)}}\,.\]

define the knockoff statistics

\[\kappa_j = \argmax_{0\le m\le M} T_j^{(m)}\,, \qquad \tau_j = T_j^{(0)} - \text{median}_{1\le m\le M} T_j^{(m)}\,,\]

where $T_j^{(0)}$ is the importance score of the original variable $j$ and $T_j^{(m)}$, $1\le m\le M$, is that of its $m$-th knockoff copy.

define the threshold for the knockoff filter as

\[\tau = \min\left\{ t > 0: \frac{\frac 1M + \frac 1M\#\{j: \kappa_j \ge 1, \tau_j \ge t\} }{\#\{j: \kappa_j = 0, \tau_j \ge t\} } \le q \right\}\]
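A sketch of the resulting multiple-knockoff filter, given a $p\times(M+1)$ matrix of importance scores (column $0$ for the original variables, columns $1,\dots,M$ for the knockoffs); the denominator is floored at $1$ in the code to avoid division by zero, an implementation detail on my part.

```python
import numpy as np

def multi_knockoff_filter(T, q):
    """Multiple-knockoff filter (sketch).
    T: (p, M+1) array; column 0 holds the original scores T_j^(0),
       columns 1..M the knockoff scores T_j^(m). Returns selected indices."""
    p, M1 = T.shape
    M = M1 - 1
    kappa = T.argmax(axis=1)                        # which copy attains the maximum
    tau = T[:, 0] - np.median(T[:, 1:], axis=1)     # original minus median of knockoffs
    threshold = np.inf
    for t in np.sort(tau[tau > 0]):                 # candidate thresholds, smallest first
        num = 1.0 / M + np.sum((kappa >= 1) & (tau >= t)) / M
        den = max(np.sum((kappa == 0) & (tau >= t)), 1)   # floored at 1
        if num / den <= q:
            threshold = t
            break
    return np.flatnonzero((kappa == 0) & (tau >= threshold))
```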

=============

\[\hat\beta^{ls} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 = (\bfX^\top\bfX)^{-1}\bfX^\top\bfY\]

In the marginal regression,

\[\hat\beta^{marginal}_j = \argmin_{\beta}\Vert \bfY - \bfX_j\beta\Vert^2_2 = (\bfX_j^\top\bfX_j)^{-1}\bfX_j^\top\bfY\]

Note that

\[\hat\beta^{ridge} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_2^2 = (\bfX^\top\bfX + \lambda\bfI)^{-1}\bfX^\top\bfY\]

Then, with the augmented design matrix $[\bfX\,\tilde\bfX]$, the ridge solution only depends on $[\bfX\,\tilde\bfX]^\top[\bfX\,\tilde\bfX]$ and $[\bfX\,\tilde\bfX]^\top\bfY$, i.e., on $\bfX^\top\bfX, \tilde\bfX^\top\bfX, \tilde \bfX^\top\tilde\bfX, \bfX^\top\bfY, \tilde \bfX^\top\bfY$, which are all components of the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$ (actually $\bfY^\top\bfY$ is not necessary, as also discussed in the paper on page 6).
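For instance, the ridge fit on the augmented design can be computed from these Gram components alone (a trivial sketch; the argument names are mine):

```python
import numpy as np

def ridge_from_gram(G_aug, XtY_aug, lam):
    """Ridge on the augmented design from Gram components only:
    beta = (G + lam I)^{-1} [X, X~]^T Y with G = [X, X~]^T [X, X~]."""
    return np.linalg.solve(G_aug + lam * np.eye(G_aug.shape[0]), XtY_aug)
```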

The lasso solution can also be approximated via an iterative ridge strategy (Sec. 6.4.2 of http://arxiv.org/abs/1509.09169), so we can obtain an (approximate) lasso solution that likewise only depends on the Gram matrix of $[\bfX\,\tilde\bfX\,\bfY]$; see the sketch after the lasso objective below.

\[\hat\beta^{lasso} = \argmin_{\beta}\Vert \bfY - \bfX\beta\Vert^2_2 + \lambda \Vert\beta\Vert_1\]
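A sketch of that iterative-ridge (MM / adaptive-ridge) approximation, using only $G = [\bfX\,\tilde\bfX]^\top[\bfX\,\tilde\bfX]$ and $[\bfX\,\tilde\bfX]^\top\bfY$; the iteration count and zeroing threshold below are ad hoc choices of mine.

```python
import numpy as np

def lasso_via_iterative_ridge(G, XtY, lam, n_iter=100, eps=1e-8):
    """Approximate the lasso by iteratively reweighted ridge (an MM scheme) for
    (1/2)||Y - Xb||^2 + lam ||b||_1, using only G = X^T X and X^T Y:
    b^{(k+1)} = (G + lam * diag(1/|b_j^{(k)}|))^{-1} X^T Y."""
    p = G.shape[0]
    beta = np.linalg.solve(G + lam * np.eye(p), XtY)       # ridge start
    for _ in range(n_iter):
        weights = lam / np.maximum(np.abs(beta), eps)      # lam / |beta_j|, floored
        beta = np.linalg.solve(G + np.diag(weights), XtY)
    beta[np.abs(beta) < 1e-6] = 0.0                        # ad hoc hard threshold to zero
    return beta
```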

Published in categories Note