Generalized Higher Criticism in GWAS

Tags: Higher Criticism, GWAS

Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies

individual SNP effects are generally weak, and the disease/trait associated SNPs identified in GWAS are insufficient in explaining much of the heritability of complex diseases and traits, even for highly heritable traits, such as height
these findings suggest that single SNP analysis may be underpowered. This is particularly the case for low-frequency SNPs in sequencing association studies
region-based analyses: combine information from multiple SNPs in a genetic construct
genes, gene networks, and pathways are examples of genetic constructs that are likely to have multiple SNPs that function simultaneously to affect diseases and traits, for example, due to functional similarity or interaction
the signal SNPs in a genetic construct are likely to be sparse and have weak signals. Hence, a methodology that does not require strong marginal SNP effects but is capable of aggregating these weak and sparse SNP effects together into a detectable signal at the genetic construct level, such as a gene, is needed to help increase the chance of detecting the effects of these genetic constructs and find the causes of the missing heritability

CGEM GWAS breast cancer study

several SNPs in the FGFR2 region showed strong evidence of association with breast cancer risk using individual SNP analysis
but none of these SNPs reached genome-wide significance when analyzing the CGEM GWAS data using the traditional individual SNP analysis
sparse signals in a SNP set present a particularly difficult problem for detection.

several methods for SNP-set testing:

HC is a global test that combines information over all the marginal test statistics of a set of variables

2 GLM and Marginal SNP Score Test Statistics

$N$ individuals genotyped over a region with $p$ observed SNPs in a SNP-set
possible SNP-sets include genes, gene networks, or genetic pathways
phenotypes $Y = [Y_1,\ldots, Y_N]^\top$
$N\times p$ genotype matrix $G$ is constructed such that $G_i = [G_{i1},\ldots, G_{ip}]^\top$
$N\times q$ covariate matrix $X$

conditional on $(X_i, G_i)$, $Y_i$ follows a distribution in the exponential family

\[f(Y_i) = \exp((Y_i\theta_i - b(\theta_i)) / a_i(\phi) + c(Y_i, \phi))\]

to construct a marginal test between the $j$-th SNP and $Y$, model

\[\mu_i = E(Y_i\mid G_i, X_i) = b'(\theta_i)\]

(why the derivative?) {:.comment}

using the GLM

\[g(\mu_i) = X_i^\top\alpha + G_i^\top\beta\]

the variance of $Y_i$ is $\Var(Y_i) = a_i(\phi)\nu(\mu_i)$, where $\nu(\mu_i) = b''(\theta_i)$ is a variance function.
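One way to answer the question about the derivative, sketched here under the usual regularity assumption that differentiation and integration can be exchanged: the density integrates to one, so differentiating $\int f(y)\,dy = 1$ with respect to $\theta_i$ gives

\[E\left[\frac{Y_i - b'(\theta_i)}{a_i(\phi)}\right] = 0 \quad\Longrightarrow\quad E(Y_i) = b'(\theta_i)\]

and differentiating once more gives

\[E\left[\frac{(Y_i - b'(\theta_i))^2}{a_i(\phi)^2} - \frac{b''(\theta_i)}{a_i(\phi)}\right] = 0 \quad\Longrightarrow\quad \Var(Y_i) = a_i(\phi)\, b''(\theta_i)\]

As an illustration that is not taken from the notes, a binary phenotype (for example case/control status) fits this form with $a_i(\phi) = 1$ and $c(Y_i, \phi) = 0$:

\[f(Y_i) = \mu_i^{Y_i}(1-\mu_i)^{1-Y_i} = \exp\left(Y_i\theta_i - \log(1+e^{\theta_i})\right), \quad \theta_i = \log\frac{\mu_i}{1-\mu_i}\]

so $b(\theta_i) = \log(1+e^{\theta_i})$, $b'(\theta_i) = e^{\theta_i}/(1+e^{\theta_i}) = \mu_i$, and $b''(\theta_i) = \mu_i(1-\mu_i) = \nu(\mu_i)$. With the canonical link $g(\mu_i) = \log(\mu_i/(1-\mu_i))$ the GLM is ordinary logistic regression.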

testing for the overall effect of the SNP set $G_i$, which corresponds to the global null $H_0: \beta = 0$

with $W$ the weight matrix and $P$ the projection matrix of the null model, the marginal score test statistic for $\beta_j$ under the global null is

\[Z_j = \frac{G_j^\top (Y-\hat\mu_0)}{\sqrt{G_j^\top P G_j}}\]

where $G_j$ here denotes the $j$-th column of $G$, $\hat\mu_0 = \mu(X\hat\alpha)$, and $\hat\alpha$ is the MLE of $\alpha$ under the null model $g(\mu_i) = X_i^\top\alpha$

under $H_0$, these individual SNP test statistics are asymptotically jointly distributed as $Z\sim MVN(0, \Sigma)$, where $\Sigma_{jk} = \cov(Z_j, Z_k)$ is estimated by $\hat\Sigma$
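A minimal sketch of how the marginal score statistics could be computed, assuming a binary phenotype with a logistic null model. The notes do not spell out $W$ and $P$, so the forms used here, $W = \mathrm{diag}\{\hat\mu_{0i}(1-\hat\mu_{0i})\}$ and $P = W - WX(X^\top W X)^{-1}X^\top W$, are the standard logistic-regression choices and should be read as assumptions. The correlation estimate $\hat\Sigma_{jk} = G_j^\top P G_k / \sqrt{G_j^\top P G_j \, G_k^\top P G_k}$ is likewise an assumption that matches the form of $Z_j$. The function names `fit_null_logistic` and `marginal_score_stats` are hypothetical.

```python
import numpy as np

def fit_null_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson (IRLS) fit of the null model logit(mu_i) = X_i^T alpha."""
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ alpha)))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        alpha = alpha + step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-(X @ alpha)))  # fitted null means mu_hat_0

def marginal_score_stats(G, X, y):
    """Return (Z, Sigma_hat) for the columns of G under the global null."""
    mu0 = fit_null_logistic(X, y)
    w = mu0 * (1.0 - mu0)          # diagonal of W (assumed logistic form)
    WX = w[:, None] * X
    # P @ G with P = W - W X (X^T W X)^{-1} X^T W, without forming the N x N matrix
    PG = w[:, None] * G - WX @ np.linalg.solve(X.T @ WX, WX.T @ G)
    GPG = G.T @ PG                 # p x p matrix with entries G_j^T P G_k
    sd = np.sqrt(np.diag(GPG))
    Z = (G.T @ (y - mu0)) / sd     # Z_j = G_j^T (Y - mu_hat_0) / sqrt(G_j^T P G_j)
    Sigma_hat = GPG / np.outer(sd, sd)
    return Z, Sigma_hat

# Toy data simulated under the global null (phenotype independent of genotype).
rng = np.random.default_rng(0)
N, p = 500, 5
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # intercept + one covariate
G = rng.binomial(2, 0.3, size=(N, p)).astype(float)     # minor-allele counts 0/1/2
y = rng.binomial(1, 0.4, size=N).astype(float)
Z, Sigma_hat = marginal_score_stats(G, X, y)
```

Because $PX = 0$ and $X^\top(Y - \hat\mu_0) = 0$ at the null MLE, adding any multiple of a covariate column to a SNP column leaves $Z_j$ unchanged, which is a convenient sanity check on an implementation.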

since the $Z_j$ are correlated, we define the decorrelated test statistics $Z^\star$ to be

\[Z^\star = U^{-1}Z\sim MVN(0, I_p)\]

where $UU^\top = \hat \Sigma$ is the Cholesky decomposition.
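The decorrelation step can be sketched with NumPy as follows. The 3 x 3 matrix `Sigma_hat` and the vector `Z` are made-up toy values, and `decorrelate` is a hypothetical helper name. `np.linalg.cholesky` returns the lower-triangular factor $U$ with $UU^\top = \hat\Sigma$ and raises `LinAlgError` when $\hat\Sigma$ is not positive definite, which would happen for example when two SNPs are in perfect linkage disequilibrium.

```python
import numpy as np

def decorrelate(Z, Sigma_hat):
    """Return Z_star = U^{-1} Z and the Cholesky factor U of Sigma_hat."""
    U = np.linalg.cholesky(Sigma_hat)   # lower-triangular, U @ U.T == Sigma_hat
    Z_star = np.linalg.solve(U, Z)      # solves U @ Z_star = Z without inverting U
    return Z_star, U

# Toy inputs (illustrative values only).
Sigma_hat = np.array([[1.0, 0.5, 0.2],
                      [0.5, 1.0, 0.3],
                      [0.2, 0.3, 1.0]])
Z = np.array([2.1, 1.4, -0.3])

Z_star, U = decorrelate(Z, Sigma_hat)

# The transformed statistics have identity covariance: U^{-1} Sigma U^{-T} = I.
U_inv = np.linalg.inv(U)
whitened_cov = U_inv @ Sigma_hat @ U_inv.T
```

Because $U$ is lower-triangular, the transformation depends on the ordering of the SNPs: the first component of $Z^\star$ equals $Z_1$, while later components are adjusted for all earlier ones.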

because the $Z_j$ are correlated, the threshold-exceedance count is based on the transformed $Z^\star$:

\[S^\star(t) = \sum_{j=1}^p 1_{\vert Z_j^\star\vert \ge t}\]
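A short sketch of the exceedance count, using made-up values for $Z^\star$. Since the components of $Z^\star$ are independent $N(0, 1)$ under the global null, each indicator has probability $2\bar\Phi(t) = \mathrm{erfc}(t/\sqrt{2})$, so $S^\star(t)$ is Binomial with mean $2p\bar\Phi(t)$. The helper names `exceedance_count` and `null_expected_count` are hypothetical.

```python
import math
import numpy as np

def exceedance_count(z_star, t):
    """S*(t): number of decorrelated statistics with |Z*_j| >= t."""
    return int(np.sum(np.abs(z_star) >= t))

def null_expected_count(p, t):
    """Expected S*(t) under the global null: p * P(|N(0,1)| >= t)."""
    return p * math.erfc(t / math.sqrt(2.0))

# Toy decorrelated statistics (illustrative values only).
z_star = np.array([0.3, -2.5, 1.1, 3.2, -0.7])

count = exceedance_count(z_star, 2.0)               # |-2.5| and |3.2| exceed 2.0
expected = null_expected_count(len(z_star), 2.0)    # null mean for comparison
```

Comparing `count` with `expected` across a range of thresholds $t$ is the basic ingredient of a higher-criticism-type statistic.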
