BONuS: Multiple Multivariate Testing
an adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric
an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the FDR, and is closely connected to counting knockoffs
contrary to procedures that start with a p-value for each hypothesis, the proposed method analyzes the entire data set to adaptively estimate an optimal p-value transform based on an empirical Bayes model
despite the extra adaptivity, the method controls FDR in finite samples even if the empirical Bayes model is incorrect or the estimation is poor.
the Double BONuS procedure validates the empirical Bayes model to guard against power loss due to model misspecification
Introduction
multiple multivariate testing
in most multivariate hypothesis testing problems, there is no uniformly most powerful (UMP) test that is efficient against all alternatives
in a single multivariate testing problem, we cannot avoid paying the price of agnosticism without prior knowledge of which alternatives are more likely to occur.
by contrast, when testing many multivariate hypotheses at once, we can pool information across hypotheses to learn the requisite prior knowledge to craft a more powerful test for each hypothesis.
the article proposes an interactive EB testing framework, called the Bag of Null Statistics (BONuS) procedure, that uses a partially masked version of the entire data set to jointly estimate a prior distribution over the alternatives
The BONuS procedure adaptively estimates an optimal sequence of nested rejection regions, selecting the largest region for which an estimator of the FDP is below a prespecified significance level $\alpha$.
it achieves robust finite-sample control of the FDR at level $\alpha$ whether or not the EB working model for the prior is correctly specified.
to illustrate the cost of using an inefficient agnostic test, consider a rudimentary multivariate Gaussian simulation with
\[X^{(i)}\overset{ind.}{\sim} N_{10}(\theta^{(i)}, I_{10})\]and we wish to test $\theta^{(i)} = 0$ against $\theta^{(i)}\neq 0$ for each $i$.
generate $n_1$ non-null statistics with mean parameters drawn independently from $\theta^{(i)}\sim N_{10}(0, 4vv')$, and set the remaining $n_0 = n - n_1$ parameters to 0. In this problem, the GLRT statistic is equivalent to $T_{GLRT}(X^{(i)})=\Vert X^{(i)}\Vert_2^2$, while the Bayes-optimal test statistic is $T(X^{(i)})=(v'X^{(i)})^2$, which focuses all of its power on a single direction of $\mathbb{R}^{10}$.
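a minimal numpy sketch of this simulation; the direction $v=e_1$, the sample size, and the per-test level 0.05 are arbitrary illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n1 = 10, 5000
v = np.zeros(d)
v[0] = 1.0                                  # unit direction v (arbitrary choice)

# non-null means theta^(i) ~ N_10(0, 4 v v'), i.e. 2*Z along v
theta = 2.0 * rng.standard_normal((n1, 1)) * v
X = theta + rng.standard_normal((n1, d))    # X^(i) ~ N_10(theta^(i), I_10)

T_glrt = (X ** 2).sum(axis=1)               # ||X||_2^2, chi2_10 under the null
T_bayes = (X @ v) ** 2                      # (v'X)^2,   chi2_1  under the null

# per-test level 0.05 critical values (chi2 0.95-quantiles for df = 10 and 1)
power_glrt = (T_glrt > 18.307).mean()
power_bayes = (T_bayes > 3.8415).mean()
print(f"GLRT power: {power_glrt:.2f}, directional power: {power_bayes:.2f}")
```

the directional statistic dominates here precisely because the signal lives on the one-dimensional subspace spanned by $v$, while the GLRT spreads its power over all 10 coordinates.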
multiple testing and the two-groups model
consider testing the null hypothesis $H_0^{(i)}: \theta^{(i)}=0$ against $H_1^{(i)}:\theta^{(i)}\neq 0$ in $n$ experiments
\[X^{(i)}\overset{ind.}{\sim} f_{\theta^{(i)}}(X)\,,\quad \text{for }i=1,\ldots, n\]with possibly infinite-dimensional parameter $\theta^{(i)}\in \Theta$.
if we take a Bayesian perspective and assume that $\theta\sim \Lambda$ under the alternative, then the test with highest average power rejects for large values of $LR_\Lambda(X) = f_\Lambda(X)/f_0(X)$. if the prior $\Lambda$ is concentrated around a lower-dimensional region, the test based on $LR_\Lambda$ may have much higher power, but we must know $\Lambda$ to use it.
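to see how a concentrated prior sharpens the test, take the Gaussian example above with $\Lambda = N_{10}(0, 4vv')$ and $\Vert v\Vert_2 = 1$ (a standard Sherman–Morrison calculation, sketched here for concreteness): the marginal is $f_\Lambda = N_{10}(0, I_{10}+4vv')$, and since $(I_{10}+4vv')^{-1} = I_{10} - \tfrac{4}{5}vv'$,
\[LR_\Lambda(X) = \frac{f_\Lambda(X)}{f_0(X)} \propto \exp\Big(\tfrac{1}{2}X'\big(I_{10}-(I_{10}+4vv')^{-1}\big)X\Big) = \exp\Big(\tfrac{2}{5}(v'X)^2\Big)\,,\]which is increasing in $(v'X)^2$ and recovers the Bayes-optimal statistic from the simulation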
the posterior probability that $H_0^{(i)}$ is true, called the local FDR or lfdr
a natural EB idea is to estimate either $\Lambda$ or $f_{mix}$ directly from the data, then calculate p-values with respect to the plug-in test statistic
the main difficulty with this plan is that one must properly account for using the same data twice.
expecting consistent estimation of $\Lambda$ is highly dubious for several reasons:
- first, the space of priors over the alternatives is very large
- second, the null proportion $\pi_0$ is difficult to estimate
related work
BONuS is motivated by several papers on adaptive inference
- knockoff
- AdaPT and STAR
- counting knockoffs
the focus here is to learn a prior distribution over a multivariate parameter space
In BONuS, the objective is to adaptively learn the structure of the problem from the data and use that structure to construct a more powerful test statistic
many recent methodology papers in post-selection inference have explored the use of structural information to improve testing power when certain prior information is available.
- a common type of structure: a hypothesis can be rejected only if the preceding hypotheses have been rejected
- another structure represented by a directed acyclic graph (DAG)
- a generalization that utilizes prior information is proposed
- how to exploit covariates that are independent of the p-values when they are available
in applications of GWAS, researchers are interested in diseases related to multiple endophenotypes, which naturally motivates the study of quantitative trait loci (QTL) that have a joint impact on these endophenotypes
for solving multivariate GWAS problems
- canonical correlation analysis to extract linear combinations of traits that are maximally correlated with the markers
- in testing the regression coefficients of genotypes on quantitative phenotypes, it has been proposed to use multiple phenotypes jointly to test the coefficients, in contrast with the traditional approach of running a t-test for each genotype-phenotype pair
BONuS procedure
the BONuS procedure begins by generating a set of $\tilde n$ synthetic controls drawn from the null distribution and then hiding them among the real statistics.
the analyst observes the pooled empirical distribution of synthetic null and real test statistics
under the working Bayesian model, the pooled real and synthetic values are exchangeable (but not quite independent)
the BONuS method proceeds iteratively, gradually revealing more information to the analyst
- at step $t=0$, the analyst uses the permuted data to calculate an initial estimator, and an initial rejection region
- at step $t$, the analyst is allowed to observe $B(R_t^c)$, “unmasking” the real/synthetic identities of all observations excluded from the current rejection region, and then updates the estimator of the rejection region; the analyst either halts the procedure or proposes a new candidate rejection region
define the counting processes
\[N(\cA) = \#\{i: X^{(i)}\in \cA\}\,, \qquad \tilde N(\cA) = \#\{i:\tilde X^{(i)}\in \cA\}\]
consider two versions of the procedure
- BH-BONuS
- Storey-BONuS
which respectively use the FDP estimators
\[\widehat{\text{FDP}}_t^{BH} = \frac{n}{\tilde n+1} \cdot \frac{\tilde N(R_t)+1}{1\vee N(R_t)}\]
and
\[\widehat{\text{FDP}}_t^{St} = \frac{N(\cA)}{\tilde N(\cA)+1} \cdot \frac{\tilde N(R_t)+1}{1\vee N(R_t)}\]
where $\cA$ is a correction set.
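a minimal non-interactive sketch of the BH-BONuS stopping rule in numpy, assuming the candidate regions are the nested level sets $R_c = \{T \ge c\}$ of a single fixed score (the full procedure instead re-estimates the score interactively as identities are unmasked; the function name and toy data are illustrative):

```python
import numpy as np

def bh_bonus_threshold(scores_real, scores_null, alpha):
    """Scan nested regions R_c = {T >= c} from largest to smallest and
    return the largest one whose BH-BONuS FDP estimate is <= alpha."""
    n, n_tilde = len(scores_real), len(scores_null)
    for c in np.sort(scores_real):          # increasing c = shrinking region
        N = np.sum(scores_real >= c)        # N(R_c): real statistics in region
        N_tilde = np.sum(scores_null >= c)  # tilde N(R_c): synthetic nulls in region
        fdp_hat = n / (n_tilde + 1) * (N_tilde + 1) / max(N, 1)
        if fdp_hat <= alpha:
            return c, int(N)                # threshold and number of rejections
    return np.inf, 0                        # no region passes: reject nothing

# toy check: 50 clearly non-null scores hidden among 50 null-like ones
c_hat, n_rej = bh_bonus_threshold(
    np.array([5.0] * 50 + [0.0] * 50), np.zeros(100), alpha=0.1)
print(c_hat, n_rej)  # rejects the 50 large scores
```

scanning from the largest region downward and stopping at the first threshold with $\widehat{\text{FDP}} \le \alpha$ selects the largest admissible rejection region, matching the stopping rule described above.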