WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

LD Score Regression

Tags: Polygenicity, Genome-wide Association Studies

This note is for Bulik-Sullivan, B. K., Loh, P.-R., Finucane, H. K., Ripke, S., Yang, J., Patterson, N., Daly, M. J., Price, A. L., & Neale, B. M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47(3), 291–295.

An inflated distribution of test statistics in GWAS can arise from

  • polygenicity (many small genetic effects)
  • confounding biases, such as cryptic relatedness and population stratification

The paper proposes LD Score regression, which

  • quantifies the contribution of each source by examining the relationship between test statistics and linkage disequilibrium (LD)
  • yields an intercept that can be used to estimate a more powerful and accurate correction factor than genomic control
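For comparison, genomic control rescales every statistic by a single factor $\lambda_{GC}$, the median observed $\chi^2$ divided by the median of a $\chi^2_1$ distribution (about 0.455). Because true polygenic signal also raises the median, dividing by $\lambda_{GC}$ can remove real signal along with confounding. The sketch below only shows how $\lambda_{GC}$ is computed; `lambda_gc` and `CHISQ1_MEDIAN` are hypothetical names, and the data are simulated.

```python
from statistics import NormalDist

import numpy as np

# Median of a chi-square(1) variable: the square of the 75th percentile of N(0, 1), about 0.455.
CHISQ1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def lambda_gc(chisq):
    """Genomic-control inflation factor: median observed chi^2 over the null median."""
    return float(np.median(chisq) / CHISQ1_MEDIAN)


rng = np.random.default_rng(0)
null_stats = rng.chisquare(df=1, size=200_000)  # simulated statistics with no signal at all
print(round(lambda_gc(null_stats), 3))          # close to 1
print(round(lambda_gc(1.1 * null_stats), 3))    # a uniform 10% inflation gives about 1.1
```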

Under a polygenic model in which the effect sizes of variants are drawn independently from distributions with variance proportional to $1/(p(1-p))$, where $p$ is the minor allele frequency (MAF), the expected $\chi^2$ statistic of variant $j$ is

\[E[\chi^2\mid \ell_j] = Nh^2\ell_j/M + Na +1\,,\]


  • $N$: sample size
  • $M$: number of SNPs, so $h^2/M$ is the average heritability explained per SNP
  • $h^2$: heritability explained by the $M$ SNPs
  • $a$: contribution of confounding biases
  • $\ell_j = \sum_k r_{jk}^2$: LD Score of variant $j$, where $r_{jk}$ is the correlation between variants $j$ and $k$; it measures the amount of genetic variation tagged by $j$
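The LD Score definition and the expected $\chi^2$ formula can be written directly in code. This is a minimal sketch with hypothetical helper names (`ld_scores`, `expected_chisq`): it sums $r_{jk}^2$ over all SNPs in a small genotype matrix, whereas the published method restricts the sum to a window around each SNP, computes it from a reference panel, and corrects $r^2$ for small-sample bias.

```python
import numpy as np


def ld_scores(genotypes):
    """LD Score of every SNP: ell_j = sum over k of r_jk^2, including k = j (r_jj = 1).

    genotypes: n x M array of allele counts (0/1/2); every SNP is assumed polymorphic.
    The sum runs over all M SNPs, with no window and no bias correction of r^2.
    """
    r = np.corrcoef(genotypes, rowvar=False)  # M x M matrix of pairwise correlations
    return (r ** 2).sum(axis=1)


def expected_chisq(ell, N, M, h2, a):
    """E[chi^2 | ell_j] = N h^2 ell_j / M + N a + 1."""
    return N * h2 * np.asarray(ell, dtype=float) / M + N * a + 1.0


# Toy data: three independent SNPs plus an exact copy of the first one.
rng = np.random.default_rng(0)
g = rng.binomial(2, 0.3, size=(5_000, 3)).astype(float)
g = np.column_stack([g, g[:, 0]])
print(np.round(ld_scores(g), 2))  # roughly 2, 1, 1, 2: the duplicated SNP tags two copies

# Hypothetical parameters: a SNP with ell = 50 expects a larger chi^2 than one with ell = 1.
print(expected_chisq([1.0, 50.0], N=100_000, M=1_000_000, h2=0.5, a=0.0))  # 1.05 and 3.5
```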

Consequently, if we regress the $\chi^2$ statistics from a GWAS on LD Score, the slope is $Nh^2/M$ and the intercept is $Na + 1$, so the intercept minus one estimates $Na$, the mean contribution of confounding bias to the inflation in the test statistics.
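The regression can be illustrated on simulated data. This sketch fits plain unweighted least squares with `numpy.polyfit` and reads off the intercept, $\hat h^2 = \text{slope}\cdot M/N$, and $\hat a = (\text{intercept}-1)/N$. All parameter values and the gamma-distributed LD Scores are made up, the statistics are drawn independently (real GWAS statistics are correlated through LD), and the published method instead uses regression weights and block-jackknife standard errors; `ld_score_regression` is a hypothetical name.

```python
import numpy as np


def ld_score_regression(chisq, ell, N, M):
    """Unweighted least-squares fit of chi^2 on LD Score.

    Uses E[chi^2] = (N h^2 / M) * ell + (N a + 1) and returns (intercept, h2_hat, a_hat).
    """
    slope, intercept = np.polyfit(ell, chisq, deg=1)  # highest-degree coefficient first
    h2_hat = slope * M / N
    a_hat = (intercept - 1.0) / N
    return intercept, h2_hat, a_hat


# Simulated data; every value below is hypothetical.
rng = np.random.default_rng(1)
N, M, h2, a = 20_000, 200_000, 0.5, 2.5e-5       # N * a = 0.5, so the true intercept is 1.5
ell = rng.gamma(shape=2.0, scale=40.0, size=M)   # made-up LD Score distribution (mean 80)
mean_chisq = N * h2 * ell / M + N * a + 1.0
# Each statistic is its expected value times an independent chi^2_1 draw.
chisq = mean_chisq * rng.chisquare(df=1, size=M)

intercept, h2_hat, a_hat = ld_score_regression(chisq, ell, N, M)
print(round(intercept, 2), round(h2_hat, 2))  # near 1.5 and 0.5
```

Here the slope absorbs the polygenic signal, which grows with LD Score, while the intercept captures the inflation that does not depend on LD Score.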

Published in categories Note