LD Score Regression
An inflated distribution of test statistics in GWAS can arise from
- polygenicity (many small genetic effects)
- confounding biases, such as cryptic relatedness and population stratification
The paper proposes LD Score regression, which
- quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD)
- yields an intercept that can be used to estimate a more powerful and accurate correction factor than genomic control
Under a polygenic model in which effect sizes for variants are drawn independently from distributions with variance proportional to $1/(p(1-p))$, with $p$ the minor allele frequency (MAF), the expected $\chi^2$ statistic of variant $j$ is
\[E[\chi^2\mid \ell_j] = Nh^2\ell_j/M + Na + 1\,,\]
where
- $N$: sample size
- $M$: number of SNPs, so that $h^2/M$ is the average heritability explained per SNP, with $h^2$ the heritability
- $a$: contribution of confounding biases
- $\ell_j = \sum_k r_{jk}^2$: LD Score of variant $j$, which measures the amount of genetic variation tagged by $j$
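
To make the definition of $\ell_j$ concrete, here is a minimal sketch that computes it from a toy genotype matrix. The function name `ld_scores` and the simulated matrix `G` are hypothetical, and the sum runs over every variant in the toy matrix; a genome-scale analysis would presumably restrict $k$ to variants near $j$.

```python
import numpy as np

def ld_scores(genotypes: np.ndarray) -> np.ndarray:
    """Return ell_j = sum_k r_jk^2 for each column (variant) of `genotypes`.

    `genotypes` is a hypothetical (individuals x variants) array of allele counts.
    """
    # Pearson correlation between every pair of variant columns.
    r = np.corrcoef(genotypes, rowvar=False)
    # Sum squared correlations over k; the k = j term contributes 1.
    return (r ** 2).sum(axis=1)

rng = np.random.default_rng(0)
# Toy data: 500 individuals, 4 independent variants with allele frequency 0.3.
G = rng.binomial(2, 0.3, size=(500, 4)).astype(float)
# Make variant 1 a copy of variant 0, so the two are in perfect LD.
G[:, 1] = G[:, 0]

scores = ld_scores(G)
print(scores)
```

Variants 0 and 1 tag each other completely, so their LD Scores come out close to 2, while the two independent variants stay close to 1.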
Consequently, if we regress the $\chi^2$ statistics from a GWAS against LD Score, the intercept minus one is an estimator of the mean contribution of confounding bias to the inflation in the test statistics.
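
The regression can be illustrated on simulated data. This is a rough sketch under stated assumptions, not the paper's procedure: the values of `N`, `M`, `h2` and `a` are made up, the LD Scores are drawn uniformly for convenience, each statistic is drawn as a scaled 1-df chi-square whose mean follows the equation above, and the fit is plain unweighted least squares via `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulation settings (not taken from the paper).
N = 50_000        # sample size
M = 200_000       # number of SNPs
h2 = 0.4          # heritability
a = 2e-6          # confounding term, chosen so that N * a = 0.1

# Toy LD Scores, drawn uniformly purely for illustration.
ell = rng.uniform(1.0, 100.0, size=M)

# Expected chi-square per variant: N * h2 * ell / M + N * a + 1.
expected = N * h2 * ell / M + N * a + 1.0

# Draw each statistic as a scaled 1-df chi-square so its mean equals `expected`.
chi2 = expected * rng.chisquare(df=1, size=M)

# Unweighted least squares of chi2 on LD Score; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(ell, chi2, deg=1)

confounding_hat = intercept - 1.0   # estimates N * a
h2_hat = slope * M / N              # estimates h2

print(f"intercept - 1 = {confounding_hat:.3f} (simulated N*a = {N * a:.3f})")
print(f"h2 estimate   = {h2_hat:.3f} (simulated h2 = {h2:.3f})")
```

In this setup the slope carries the polygenic signal, since rescaling it by $M/N$ recovers $h^2$, and the intercept minus one recovers $Na$, which is the separation of the two sources of inflation that the regression is meant to provide.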