WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Power of Masking Methods for Adaptive Testing

Tags: Adaptive Testing, Double-dipping

This note is for Chakraborty, Abhinav, Junu Lee, and Eugene Katsevich. “Power of Masking Methods for Adaptive Testing in a Multivariate Normal Means Problem.” arXiv:2601.07764. Version 2. Preprint, arXiv, March 31, 2026.


many large-scale testing procedures learn signal structure from the data to boost power.

  • direct data reuse can inflate Type-I error (double-dipping); a common remedy is masking: withholding some information during learning and using it for testing
  • sample splitting masks by withholding observations for testing, while null augmentation (e.g., knockoffs or full-conformal outlier detection) masks by appending null samples or variables and withholding their identities until testing
  • in many settings, little is known about how these masking methods compare with each other, or against more data-efficient non-masking alternatives
  • study these questions in a stylized two-groups multivariate normal means model with an unknown signal direction learned from the data
  • the paper develops a transparent, unified set of asymptotic power expressions for three parallel methods differing in masking choices
    • a sample splitting method
    • a full-conformal-style null augmentation method
    • an oracle in-sample benchmark
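To make the null-augmentation idea concrete, here is a minimal sketch of a full-conformal-style p-value: append null samples, then rank the test score among them. This is only an illustration of the masking mechanism, not the paper's BONuS method; the scoring rule and sample sizes are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_pvalue(test_score, null_scores):
    """Full-conformal-style p-value: rank the test score among appended
    null scores (larger scores = stronger evidence against the null)."""
    return (1 + np.sum(null_scores >= test_score)) / (1 + len(null_scores))

# toy example: the score is the observation itself, nulls are N(0, 1)
null_scores = rng.standard_normal(999)
p_null = conformal_pvalue(rng.standard_normal(), null_scores)  # roughly uniform
p_alt = conformal_pvalue(4.0, null_scores)                     # small p-value
```

The masking here is that the identities of the appended null samples are withheld during learning and only revealed at testing time, which is what protects Type-I error despite data reuse.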

the main findings are:

  • the augmentation method is more powerful than the splitting method with matched tuning
  • the power-optimal number of null samples for the augmentation method is a vanishing fraction of the number of tests
  • for a tractable approximation to the augmentation method, the optimal number of null samples scales as the square root of the number of tests, with empirical evidence suggesting a similar scaling for the method itself

The holdout randomization test (HRT) and full-conformal outlier detection exemplify two common masking mechanisms: sample splitting and null augmentation. There are also other data-modification schemes that do not fall within the definition of masking (Dai et al., 2023).

work in a two-groups multivariate normal means problem where alternative means are drawn from a one-dimensional subspace whose direction $v$ is unknown.

analyze three methods that

  • learn an alternative direction $\hat v$ on a portion of the data
  • score each hypothesis by projecting a potentially different portion of the data in the direction of $\hat v$
  • calibrate these scores against a null distribution to obtain $p$-values
  • adjust these $p$-values for multiplicity by applying the BH procedure
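The four steps above can be sketched end to end. The sketch below is one simple instantiation (splitting across hypotheses, direction learned as a normalized mean, one-sided normal calibration, a hand-rolled BH), not the paper's exact split BH procedure; all tuning choices are assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# toy data: one d-dimensional observation per hypothesis (assumed setup)
n, d, pi1, theta = 400, 10, 0.25, 4.0
v = np.ones(d) / np.sqrt(d)
is_alt = rng.random(n) < pi1
X = np.outer(is_alt * theta, v) + rng.standard_normal((n, d))

# 1. learn a direction v_hat on one portion of the data
learn = np.arange(n) < n // 2
v_hat = X[learn].mean(axis=0)
v_hat /= np.linalg.norm(v_hat)

# 2. score the held-out portion by projecting onto v_hat
scores = X[~learn] @ v_hat          # N(0, 1) under the null

# 3. calibrate scores against the null distribution to get p-values
pvals = norm.sf(scores)

# 4. adjust for multiplicity with the BH procedure at level q
def bh(p, q=0.1):
    m = len(p)
    order = np.argsort(p)
    below = np.nonzero(np.sort(p) <= q * np.arange(1, m + 1) / m)[0]
    rejected = np.zeros(m, dtype=bool)
    if below.size:
        rejected[order[: below[-1] + 1]] = True
    return rejected

rej = bh(pvals, q=0.1)
```

The masking is visible in step 2: hypotheses used to learn $\hat v$ are never scored, so the p-values on the held-out portion are valid despite the learned direction.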

consider

  • split BH
  • BONuS
  • in-sample BH

derive the asymptotic powers of all three methods in a unified framework that mirrors their common structure

their findings are organized around three questions:

  • Q1 (The choice of masking mechanism)
  • Q2 (The amount of masking)
  • Q3 (The cost of masking)
