Power of Masking Methods for Adaptive Testing
this post covers the power of masking methods for adaptive testing in a multivariate normal means problem.
many large-scale testing procedures learn signal structure from the data to boost power.
- direct data reuse can inflate Type I error (double-dipping); a common remedy is masking: withholding some information during learning and using it only for testing
- sample splitting masks by withholding observations for testing, while null augmentation (e.g., knockoffs or full-conformal outlier detection) masks by appending null samples or variables and withholding their identities until testing
- in many settings, little is known about which masking mechanism to prefer, how much to mask, or how masking methods compare against more data-efficient non-masking alternatives
- study these questions in a stylized two-groups multivariate normal means model with an unknown signal direction learned from the data
- the paper develops a transparent, unified set of asymptotic power expressions for three parallel methods differing in masking choices
- a sample splitting method
- a full-conformal-style null augmentation method
- an oracle in-sample benchmark
the main findings are:
- the augmentation method is more powerful than the splitting method with matched tuning
- the power-optimal number of null samples for the augmentation method is a vanishing fraction of the number of tests
- for a tractable approximation to the augmentation method, the optimal number of null samples scales as the square root of the number of tests, with empirical evidence suggesting a similar scaling for the method itself
HRT and full-conformal outlier detection exemplify two common masking mechanisms: sample splitting and null augmentation. There are also other data modification schemes (e.g., Dai et al., 2023) that do not fall within this definition of masking.
the paper works in a two-groups multivariate normal means model where alternative means lie in a one-dimensional subspace whose direction $v$ is unknown.
it analyzes three methods that
- learn an alternative direction $\hat v$ on a portion of the data
- score each hypothesis by projecting a potentially different portion of the data in the direction of $\hat v$
- calibrate these scores against a null distribution to obtain $p$-values
- adjust these $p$-values for multiplicity by applying the BH procedure
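a minimal numpy sketch of this four-step pipeline, under simplifying assumptions of my own (two independent Gaussian replicates per hypothesis as the split, $\hat v$ learned as the top singular vector, two-sided Gaussian p-values); the paper's actual constructions differ:

```python
import numpy as np
from scipy.stats import norm

def bh(pvals, alpha=0.1):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest index with sorted p_(k) <= alpha * k / m."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = np.nonzero(np.sort(pvals) <= alpha * np.arange(1, m + 1) / m)[0]
    k = passed[-1] + 1 if passed.size else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

rng = np.random.default_rng(0)
m, d = 500, 20                                # hypotheses, dimension
v = np.eye(d)[0]                              # unknown signal direction (illustrative)
is_alt = rng.random(m) < 0.2
mu = 3.0 * np.outer(is_alt, v)                # alternative means lie along v
X_learn = mu + rng.standard_normal((m, d))    # portion used for learning
X_test = mu + rng.standard_normal((m, d))     # portion withheld for testing

# 1. learn a direction v_hat on one portion of the data
v_hat = np.linalg.svd(X_learn, full_matrices=False)[2][0]   # unit norm

# 2. score each hypothesis by projecting the held-out portion onto v_hat
scores = X_test @ v_hat                       # N(0, 1) under the null

# 3. calibrate against the null distribution (two-sided Gaussian p-values)
pvals = 2 * norm.sf(np.abs(scores))

# 4. adjust for multiplicity with BH
rejected = bh(pvals, alpha=0.1)
```

because $\hat v$ is independent of the held-out portion, the null scores are exactly standard normal, which is what makes the calibration in step 3 valid.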
consider
- split BH
- BONuS
- In-sample BH
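for the null-augmentation side, calibration can be done full-conformal style: append drawn null scores and rank each observed score among them. a sketch (my own illustration of the mechanism, not the exact BONuS construction):

```python
import numpy as np

def conformal_pvalue(score, null_scores):
    """full-conformal-style p-value: rank the observed score among the
    appended null scores; super-uniform under the null when the observed
    score is exchangeable with the nulls."""
    return (1 + np.sum(null_scores >= score)) / (len(null_scores) + 1)

rng = np.random.default_rng(1)
null_scores = rng.standard_normal(1000)          # appended null samples
p_strong = conformal_pvalue(4.0, null_scores)    # large score -> small p
p_weak = conformal_pvalue(0.0, null_scores)      # typical null score -> moderate p
```

note that these p-values have granularity $1/(n+1)$, where $n$ is the number of appended null samples, which is one intuition for why the amount of augmentation directly affects power.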
it derives the asymptotic powers of all three methods in a unified framework that mirrors their common structure
the findings address three questions:
- Q1 (the choice of masking mechanism)
- Q2 (the amount of masking)
- Q3 (the cost of masking)