BONuS: Multiple Multivariate Testing
an adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric
an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the FDR, and is closely connected to counting knockoffs
contrary to procedures that start with a p-value for each hypothesis, the proposed method analyzes the entire data set to adaptively estimate an optimal p-value transform based on an empirical Bayes model
despite the extra adaptivity, the method controls FDR in finite samples even if the empirical Bayes model is incorrect or the estimation is poor.
the Double BONuS procedure validates the empirical Bayes model to guard against power loss due to model misspecification
Introduction
multiple multivariate testing
in most multivariate hypothesis testing problems, there is no uniformly most powerful (UMP) test that is efficient against all alternatives
in a single multivariate testing problem, we cannot avoid paying the price of agnosticism without prior knowledge of which alternatives are more likely to occur.
by contrast, when testing many multivariate hypotheses at once, we can pool information across hypotheses to learn the requisite prior knowledge to craft a more powerful test for each hypothesis.
the article proposes an interactive EB testing framework, called the Bag of Null Statistics (BONuS) procedure, that uses a partially masked version of the entire data set to jointly estimate a prior distribution over the alternatives
The BONuS procedure adaptively estimates an optimal sequence of nested rejection regions, selecting the largest region for which an estimator of the FDP is below a prespecified significance level $\alpha$.
it achieves robust finite-sample control of the FDR at level $\alpha$ whether or not the EB working model for the prior is correctly specified.
to illustrate the cost of using an inefficient agnostic test, consider a rudimentary multivariate Gaussian simulation with
\[X^{(i)}\overset{ind.}{\sim} N_{10}(\theta^{(i)}, I_{10})\]and we wish to test $\theta^{(i)} = 0$ against $\theta^{(i)}\neq 0$ for each $i$.
generate $n_1$ non-null statistics with mean parameters drawn independently from $\theta^{(i)}\sim N_{10}(0, 4vv')$, and set the remaining $n_0 = n - n_1$ parameters to 0. In this problem, the GLRT statistic is equivalent to $T_{GLRT}(X^{(i)})=\Vert X^{(i)}\Vert_2^2$, while the Bayes-optimal test statistic is $T(X^{(i)})=(v'X^{(i)})^2$, which focuses all of its power on a single direction of $\mathbb{R}^{10}$.
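a minimal numpy sketch of this simulation; the direction $v=e_1$, the sample size, and the per-test level 0.05 are arbitrary illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n1 = 10, 5000
v = np.zeros(d)
v[0] = 1.0                                  # unit direction v (arbitrary choice)

# non-null means theta^(i) ~ N_10(0, 4 v v'), i.e. 2*Z along v
theta = 2.0 * rng.standard_normal((n1, 1)) * v
X = theta + rng.standard_normal((n1, d))    # X^(i) ~ N_10(theta^(i), I_10)

T_glrt = (X ** 2).sum(axis=1)               # ||X||_2^2, chi2_10 under the null
T_bayes = (X @ v) ** 2                      # (v'X)^2,   chi2_1  under the null

# per-test level 0.05 critical values (chi2 0.95-quantiles for df = 10 and 1)
power_glrt = (T_glrt > 18.307).mean()
power_bayes = (T_bayes > 3.8415).mean()
print(f"GLRT power: {power_glrt:.2f}, directional power: {power_bayes:.2f}")
```

the directional statistic dominates here precisely because the signal lives on the one-dimensional subspace spanned by $v$, while the GLRT spreads its power over all 10 coordinates.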
multiple testing and the two-groups model
consider testing the null hypothesis $H_0^{(i)}: \theta^{(i)}=0$ against $H_1^{(i)}:\theta^{(i)}\neq 0$ in $n$ experiments
\[X^{(i)}\overset{ind.}{\sim} f_{\theta^{(i)}}(X)\,,\quad \text{for }i=1,\ldots, n\]with possibly infinite-dimensional parameter $\theta^{(i)}\in \Theta$.
if we take a Bayesian perspective and assume that $\theta\sim \Lambda$ under the alternative, then the test with highest average power rejects for large values of $LR_\Lambda(X) = f_\Lambda(X)/f_0(X)$. if the prior $\Lambda$ is concentrated around a lower-dimensional region, the test based on $LR_\Lambda$ may have much higher power, but we must know $\Lambda$ to use it.
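to see how a concentrated prior sharpens the test, take the Gaussian example above with $\Lambda = N_{10}(0, 4vv')$ and $\Vert v\Vert_2 = 1$ (a standard Sherman–Morrison calculation, sketched here for concreteness): the marginal is $f_\Lambda = N_{10}(0, I_{10}+4vv')$, and since $(I_{10}+4vv')^{-1} = I_{10} - \tfrac{4}{5}vv'$,
\[LR_\Lambda(X) = \frac{f_\Lambda(X)}{f_0(X)} \propto \exp\Big(\tfrac{1}{2}X'\big(I_{10}-(I_{10}+4vv')^{-1}\big)X\Big) = \exp\Big(\tfrac{2}{5}(v'X)^2\Big)\,,\]which is increasing in $(v'X)^2$ and recovers the Bayes-optimal statistic from the simulation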
the posterior probability that $H_0^{(i)}$ is true, called the local FDR or lfdr
a natural EB idea is to estimate either $\Lambda$ or $f_{mix}$ directly from the data, then calculate p-values with respect to the plug-in test statistic
the main difficulty with this plan is that one must properly account for using the same data twice.
expecting consistent estimation of $\Lambda$ is highly dubious for several reasons:
- first, the space of priors over the alternatives is very large
- second, the null proportion $\pi_0$ is difficult to estimate
related work
BONuS is motivated by several papers on adaptive inference
- knockoff
- AdaPT and STAR
- counting knockoffs
the focus here is to learn a prior distribution over a multivariate parameter space
In BONuS, the objective is to adaptively learn the structure of the problem from the data and use that structure to construct a more powerful test statistic
many recent methodology papers in post-selection inference have explored the use of structural information to improve testing power when certain prior information is available.
- a common type of structure: a hypothesis can be rejected only if the preceding hypotheses have been rejected
- another structure represented by a directed acyclic graph (DAG)
- a generalization that utilizes prior information is proposed
- how to exploit covariates that are independent of the p-values when they are available
in applications of GWAS, researchers are interested in diseases related to multiple endophenotypes, which naturally motivates the study of quantitative trait loci (QTL) that have a joint impact on these endophenotypes
for solving multivariate GWAS problems
- canonical correlation analysis to extract linear combinations of traits that are maximally correlated with the markers
- in testing the regression coefficients of genotypes on quantitative phenotypes, it has been proposed to use multiple phenotypes jointly to test the coefficients, in contrast with the traditional approach of running a t-test for each genotype-phenotype pair
BONuS procedure
the BONuS procedure begins by generating a set of $\tilde n$ synthetic controls drawn from the null distribution and then hiding them among the real statistics.
the analyst observes the pooled empirical distribution of synthetic null and real test statistics
under the working Bayesian model, the pooled real and synthetic values are exchangeable (but not quite independent)
the BONuS method proceeds iteratively, gradually revealing more information to the analyst
- at step $t=0$, the analyst uses the permuted data to calculate an initial estimator, and an initial rejection region
- at step $t$, the analyst is allowed to observe $B(R_t^c)$, “unmasking” the real/synthetic identities of all observations excluded from the current rejection region, and then updates the estimator of the rejection region; the analyst either halts the procedure or proposes a new candidate rejection region
define the counting processes
\[N(\cA) = \#\{i: X^{(i)}\in \cA\}\,, \qquad \tilde N(\cA) = \#\{i:\tilde X^{(i)}\in \cA\}\]
consider two versions of the procedure
- BH-BONuS
- Storey-BONuS
which respectively use the FDP estimators
\[\widehat{\text{FDP}}_t^{BH} = \frac{n}{\tilde n+1} \cdot \frac{\tilde N(R_t)+1}{1\vee N(R_t)}\]
and
\[\widehat{\text{FDP}}_t^{St} = \frac{N(\cA)}{\tilde N(\cA)+1} \cdot \frac{\tilde N(R_t)+1}{1\vee N(R_t)}\]
where $\cA$ is a correction set.
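a minimal non-interactive sketch of the BH-BONuS stopping rule in numpy, assuming the candidate regions are the nested level sets $R_c = \{T \ge c\}$ of a single fixed score (the full procedure instead re-estimates the score interactively as identities are unmasked; the function name and toy data are illustrative):

```python
import numpy as np

def bh_bonus_threshold(scores_real, scores_null, alpha):
    """Scan nested regions R_c = {T >= c} from largest to smallest and
    return the largest one whose BH-BONuS FDP estimate is <= alpha."""
    n, n_tilde = len(scores_real), len(scores_null)
    for c in np.sort(scores_real):          # increasing c = shrinking region
        N = np.sum(scores_real >= c)        # N(R_c): real statistics in region
        N_tilde = np.sum(scores_null >= c)  # tilde N(R_c): synthetic nulls in region
        fdp_hat = n / (n_tilde + 1) * (N_tilde + 1) / max(N, 1)
        if fdp_hat <= alpha:
            return c, int(N)                # threshold and number of rejections
    return np.inf, 0                        # no region passes: reject nothing

# toy check: 50 clearly non-null scores hidden among 50 null-like ones
c_hat, n_rej = bh_bonus_threshold(
    np.array([5.0] * 50 + [0.0] * 50), np.zeros(100), alpha=0.1)
print(c_hat, n_rej)  # rejects the 50 large scores
```

scanning from the largest region downward and stopping at the first threshold with $\widehat{\text{FDP}} \le \alpha$ selects the largest admissible rejection region, matching the stopping rule described above.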