Debiased Inverse-Variance Weighted Estimator in Mendelian Randomization
These notes cover a talk given at Yale by Prof. Ting Ye, based on the paper: Ye, T., Shao, J., & Kang, H. (2020). Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization. arXiv:1911.09802.
Mendelian randomization (MR): study the effect of a modifiable exposure on an outcome by using genetic variants as instrumental variables (IVs).
Challenge: each genetic variant explains a relatively small proportion of variance in the exposure and there are many such variants, a setting known as many weak instruments.
The paper characterizes, under many weak instruments, the statistical properties of two popular estimators:
- the inverse-variance weighted (IVW) estimator
- the IVW estimator with instruments screened using an independent selection dataset
It then proposes a debiased IVW estimator that is robust to many weak instruments and does not require screening.
Additionally, it presents two instrument selection methods that improve the efficiency of the new estimator when a selection dataset is available.
Introduction
- IV methods estimate the effect of a treatment, policy, or exposure on an outcome in observational studies with unmeasured confounding
- MR is a type of IV method that uses genetic variants as instruments to study the effect of a modifiable exposure or potential risk factor on an outcome in the presence of unmeasured confounding
- MR typically uses a large number of genetic variants (SNPs from GWASs), and many or possibly all of these SNPs are weak IVs
There are three reasons why these genetic instruments (SNPs) can be weak:
- many SNPs may have zero/null effects on the exposure
- when SNPs are common genetic variants (MAF > 0.05), they may have small effects on the exposure
- when SNPs are rare variants (MAF < 0.05), they may have small or modest effects on the exposure, but their genetic variances are small, so their total contribution to the variation of the exposure is small
The focus is on two-sample summary-data MR, where two sets of summary statistics are obtained from two GWASs:
- the first set, from one GWAS: the estimated marginal association between the j-th SNP and the exposure, $\hat\gamma_j$, and its standard error (SE) $\hat\sigma_{X_j}$, for $j=1,\ldots,p$
- the second set, from another GWAS: the estimated marginal association between the j-th SNP and the outcome, $\hat\Gamma_j$, and its SE $\hat\sigma_{Y_j}$
In MR, the main parameter of interest is the exposure effect on the outcome, $\beta_0$, which can be estimated from a single SNP $j$ by the ratio $\hat\beta_j=\hat\Gamma_j/\hat\gamma_j$.
However, $\hat\beta_j$ may be seriously biased and unstable when SNP j is weak, because $\hat\gamma_j$ is then close to zero. This has motivated several modern MR methods that aggregate the many possibly unstable estimators $\hat\beta_j$ using a meta-analysis strategy. The most popular is the IVW estimator,
\[\hat\beta_{IVW} = \frac{\sum_{j=1}^p\hat w_j\hat\beta_j}{\sum_{j=1}^p\hat w_j}\,, \quad \hat\beta_j = \frac{\hat\Gamma_j}{\hat\gamma_j}\,, \quad \hat w_j = \frac{\hat\gamma_j^2}{\hat\sigma_{Y_j}^2}\]
A variant of the typical IVW estimator includes only the SNPs that pass the genome-wide significance threshold in a third, independent GWAS, known as the selection dataset. This is the IVW estimator with screening,
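\[\hat\beta_{\lambda, IVW} = \frac{\sum_{j\in S_\lambda}\hat w_j\hat\beta_j}{\sum_{j\in S_\lambda}\hat w_j} = \frac{\sum_{j\in S_\lambda}\hat\Gamma_j\hat\gamma_j\hat\sigma_{Y_j}^{-2}}{\sum_{j\in S_\lambda}\hat\gamma_j^2\hat\sigma_{Y_j}^{-2}}\,, \quad S_\lambda = \{j:\vert \hat\gamma_j^\star\vert > \lambda \hat\sigma_{X_j}^\star\}\]
where $\hat\gamma_j^\star$ and $\hat\sigma_{X_j}^\star$ are the SNP-exposure association estimate and its SE from the selection dataset, and $\lambda$ is the screening threshold.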
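To make the two formulas concrete, here is a minimal NumPy sketch of the IVW estimator and its screened variant. The function names `ivw` and `ivw_screened`, the simulated summary statistics, and every numeric setting (number of SNPs, share of non-null SNPs, standard errors, the threshold `lam=3.0`) are illustrative assumptions and are not taken from the paper or the talk. The arrays `gamma_star` and `sigma_X_star` stand for the starred quantities computed from the selection dataset.

```python
import numpy as np


def ivw(gamma_hat, Gamma_hat, sigma_Y):
    """IVW estimate: average of the ratios Gamma_hat/gamma_hat,
    weighted by w_j = gamma_hat_j^2 / sigma_Y_j^2."""
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    Gamma_hat = np.asarray(Gamma_hat, dtype=float)
    sigma_Y = np.asarray(sigma_Y, dtype=float)
    w = gamma_hat**2 / sigma_Y**2
    beta_j = Gamma_hat / gamma_hat
    return float(np.sum(w * beta_j) / np.sum(w))


def ivw_screened(gamma_hat, Gamma_hat, sigma_Y, gamma_star, sigma_X_star, lam):
    """IVW restricted to S_lambda = {j : |gamma_star_j| > lam * sigma_X_star_j},
    where the starred inputs come from an independent selection dataset."""
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    Gamma_hat = np.asarray(Gamma_hat, dtype=float)
    sigma_Y = np.asarray(sigma_Y, dtype=float)
    keep = np.abs(np.asarray(gamma_star, dtype=float)) > lam * np.asarray(
        sigma_X_star, dtype=float
    )
    if not keep.any():
        raise ValueError("no SNP passes the screening threshold")
    return ivw(gamma_hat[keep], Gamma_hat[keep], sigma_Y[keep])


# Hypothetical simulated summary statistics (all values are assumptions).
rng = np.random.default_rng(0)
p, beta0 = 2000, 0.4
# About 10% of SNPs get a small non-zero effect; the rest are null.
gamma = np.where(rng.random(p) < 0.1, rng.normal(0.0, 0.05, p), 0.0)
sigma_X = np.full(p, 0.02)       # SEs in the exposure GWAS
sigma_Y = np.full(p, 0.02)       # SEs in the outcome GWAS
sigma_X_star = np.full(p, 0.02)  # SEs in the selection GWAS

# Three independent noisy copies, mimicking three independent GWASs.
gamma_hat = gamma + sigma_X * rng.standard_normal(p)
Gamma_hat = beta0 * gamma + sigma_Y * rng.standard_normal(p)
gamma_star = gamma + sigma_X_star * rng.standard_normal(p)

print("true beta0:   ", beta0)
print("IVW, all SNPs:", ivw(gamma_hat, Gamma_hat, sigma_Y))
print("IVW, screened:", ivw_screened(
    gamma_hat, Gamma_hat, sigma_Y, gamma_star, sigma_X_star, lam=3.0))
```

Note that the selection step uses only `gamma_star` and `sigma_X_star`, never `gamma_hat`, which mirrors the requirement that the selection dataset be independent of the exposure dataset. Changing the seed, the share of non-null SNPs, or `lam` is a quick way to compare how the two estimates behave relative to `beta0` in this toy setting.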