Debiased Inverse-Variance Weighted Estimator in Mendelian Randomization
This post is for the talk at Yale given by Prof. Ting Ye, based on the paper Ye, T., Shao, J., & Kang, H. (2020). Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization (arXiv:1911.09802). arXiv.
Mendelian randomization (MR): study the effect of a modifiable exposure on an outcome by using genetic variants as instrumental variables (IVs).
Challenge: each genetic variant explains only a relatively small proportion of the variance in the exposure, and there are many such variants, a setting known as many weak instruments.
The paper provides a theoretical characterization, under many weak instruments, of the statistical properties of two popular estimators:
 the inverse-variance weighted (IVW) estimator
 the IVW estimator with instruments screened using an independent selection dataset
The paper then proposes a debiased IVW estimator that is robust to many weak instruments and does not require screening.
Additionally, it presents two instrument selection methods that improve the efficiency of the new estimator when a selection dataset is available.
Introduction

IV methods: estimate the effect of a treatment, policy, or exposure on an outcome in observational studies with unmeasured confounding.

MR is a type of IV method that uses genetic variants as instruments to study the effect of a modifiable exposure or potential risk factor on an outcome in the presence of unmeasured confounding.
 It draws on a large number of genetic variants, typically single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs), and many or possibly all of these SNPs are weak IVs.
Three reasons why these genetic instruments (SNPs) can be weak:
 many SNPs may have zero/null effects on the exposure
 when SNPs are common genetic variants (MAF > 0.05), they may have small effects on the exposure
 when SNPs are rare variants (MAF < 0.05), they may have small or modest effects on the exposure, but their genetic variances are small, so their total contribution to the variation of the exposure is small
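As a rough worked illustration of the third reason (a sketch under assumptions that are not taken from the paper): suppose SNP $j$ is coded as the number of minor alleles $Z_j\in\{0,1,2\}$, is in Hardy-Weinberg equilibrium with minor allele frequency $f_j$, and has a per-allele linear effect $\gamma_j$ on the exposure. Here $Z_j$ and $f_j$ are notation introduced only for this example. Then
\[\mathrm{Var}(Z_j) = 2f_j(1-f_j)\,,\qquad \mathrm{Var}(\gamma_j Z_j) = 2f_j(1-f_j)\,\gamma_j^2\,.\]
With $f_j = 0.01$ the variance factor is $2(0.01)(0.99) = 0.0198$, compared with $0.5$ at $f_j = 0.5$. Under these assumptions a rare variant would need an effect roughly five times larger than a common variant to contribute the same amount of exposure variance.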
Focus on two-sample summary-data MR, where two sets of summary statistics are obtained from two GWASs:
 first set, from one GWAS: the estimated marginal association between the $j$th SNP and the exposure, $\hat\gamma_j$, and its standard error (SE) $\hat\sigma_{X_j}$, $j=1,\ldots,p$
 second set, from another GWAS: the estimated marginal association between the $j$th SNP and the outcome, $\hat\Gamma_j$, and its SE $\hat\sigma_{Y_j}$
In MR, the main parameter of interest is the exposure effect on the outcome, $\beta_0$, which can be estimated from a single SNP $j$ by the ratio $\hat\beta_j=\hat\Gamma_j/\hat\gamma_j$.
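The per-SNP ratio can be sketched in a few lines of Python. This is only an illustration: the function name `wald_ratios` and the three sets of summary statistics are hypothetical, chosen so that the last SNP has an exposure association near zero.

```python
import numpy as np


def wald_ratios(Gamma_hat, gamma_hat):
    """Per-SNP ratio estimates beta_hat_j = Gamma_hat_j / gamma_hat_j."""
    Gamma_hat = np.asarray(Gamma_hat, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    return Gamma_hat / gamma_hat


# Hypothetical summary statistics for three SNPs (illustrative values only).
gamma_hat = np.array([0.20, 0.05, 0.001])  # SNP-exposure associations
Gamma_hat = np.array([0.10, 0.03, 0.004])  # SNP-outcome associations

beta_hat = wald_ratios(Gamma_hat, gamma_hat)
print(beta_hat)
```

In this made-up example the first two ratios are 0.5 and 0.6, while the third is 4.0, driven entirely by the tiny denominator.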
However, $\hat\beta_j$ may be seriously biased and unstable when SNP $j$ is weak, because $\hat\gamma_j$ is then close to zero. This has led to several modern MR methods that aggregate many possibly unstable estimators $\hat\beta_j$ using a meta-analysis strategy. The most popular is the IVW estimator,
\[\hat\beta_{IVW} = \frac{\sum_{j=1}^p\hat w_j\hat\beta_j}{\sum_{j=1}^p\hat w_j}\,,\quad \hat\beta_j = \frac{\hat\Gamma_j}{\hat\gamma_j}\,,\quad \hat w_j = \frac{\hat\gamma_j^2}{\hat\sigma_{Y_j}^2}\,.\]
A variant of the typical IVW estimator includes only the SNPs that pass the genome-wide significance threshold in a third, independent GWAS, known as the selection dataset. This is the IVW estimator with screening,
\[\hat\beta_{\lambda, IVW} = \frac{\sum_{j\in S_\lambda}\hat w_j\hat\beta_j}{\sum_{j\in S_\lambda}\hat w_j} = \frac{\sum_{j\in S_\lambda}\hat\Gamma_j\hat\gamma_j\hat\sigma_{Y_j}^{-2}}{\sum_{j\in S_\lambda}\hat\gamma_j^2\hat\sigma_{Y_j}^{-2}}\,,\quad S_\lambda = \{j:\vert \hat\gamma_j^\star\vert > \lambda \hat\sigma_{X_j}^\star\}\,,\]
where $\hat\gamma_j^\star$ and $\hat\sigma_{X_j}^\star$ are the estimated SNP-exposure association and its SE from the selection dataset.
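A minimal sketch of the two estimators in Python follows. It is not the authors' code: the function names `ivw` and `ivw_screened`, the argument names, the input values, and the threshold `lam = 5.0` are all assumptions made for illustration. The sketch uses the equivalent form $\sum_j\hat\Gamma_j\hat\gamma_j\hat\sigma_{Y_j}^{-2} / \sum_j\hat\gamma_j^2\hat\sigma_{Y_j}^{-2}$, which avoids dividing by an individual $\hat\gamma_j$ that may be near zero.

```python
import numpy as np


def ivw(Gamma_hat, gamma_hat, sigma_Y):
    """IVW estimate with weights w_j = gamma_hat_j^2 / sigma_Y_j^2.

    Computed as sum(Gamma * gamma / sigma_Y^2) / sum(gamma^2 / sigma_Y^2),
    which equals sum(w_j * beta_hat_j) / sum(w_j).
    """
    Gamma_hat, gamma_hat, sigma_Y = (
        np.asarray(a, dtype=float) for a in (Gamma_hat, gamma_hat, sigma_Y)
    )
    num = np.sum(Gamma_hat * gamma_hat / sigma_Y**2)
    den = np.sum(gamma_hat**2 / sigma_Y**2)
    return num / den


def ivw_screened(Gamma_hat, gamma_hat, sigma_Y, gamma_star, sigma_X_star, lam):
    """IVW restricted to S_lambda = {j : |gamma_star_j| > lam * sigma_X_star_j}.

    gamma_star and sigma_X_star come from the independent selection dataset.
    """
    Gamma_hat, gamma_hat, sigma_Y, gamma_star, sigma_X_star = (
        np.asarray(a, dtype=float)
        for a in (Gamma_hat, gamma_hat, sigma_Y, gamma_star, sigma_X_star)
    )
    keep = np.abs(gamma_star) > lam * sigma_X_star
    if not keep.any():
        raise ValueError("no SNP passes the screening threshold")
    return ivw(Gamma_hat[keep], gamma_hat[keep], sigma_Y[keep])


# Hypothetical summary statistics for two SNPs (illustrative values only).
gamma_hat = [0.20, 0.10]     # exposure GWAS
Gamma_hat = [0.10, 0.08]     # outcome GWAS
sigma_Y = [0.10, 0.10]       # SEs of Gamma_hat
gamma_star = [0.21, 0.02]    # selection GWAS
sigma_X_star = [0.02, 0.02]  # SEs of gamma_star

beta_all = ivw(Gamma_hat, gamma_hat, sigma_Y)
beta_screened = ivw_screened(
    Gamma_hat, gamma_hat, sigma_Y, gamma_star, sigma_X_star, lam=5.0
)
```

With these made-up inputs, only the first SNP passes the screen, so the screened estimate reduces to that SNP's ratio, while the unscreened estimate is a weighted average of both ratios.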