WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Cauchy Combination Test

Tags: p-value

This note is for Liu, Y., & Xie, J. (2020). Cauchy Combination Test: A Powerful Test With Analytic p-Value Calculation Under Arbitrary Dependency Structures. Journal of the American Statistical Association, 115(529), 393–402.

Cauchy Combination Test:

  • has a simple form: it is defined as a weighted sum of Cauchy transformations of individual p-values

  • prove a nonasymptotic result that the tail of the null distribution of the proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures

  • show that the power of the proposed test is asymptotically optimal in a strong sparsity setting

Introduction

a few well-known classical methods:

  • Fisher’s combination test
  • sum of squares type tests

however, in modern high-throughput data analysis where there is only a small fraction of significant effects, these traditional tests are ineffective and can have substantial power loss

various methods have been developed to improve power for detecting sparse alternatives

  • Tippett’s minimum p-value test
  • the higher criticism test
  • the Berk-Jones test

all three tests combine individual p-values to aggregate multiple effects; the paper refers to them as combination tests

no analytic methods are available for the p-value calculation of the Tippett’s minimum p-value, higher criticism, and Berk-Jones tests under dependence structures

the main motivating examples from GWAS

one commonly used analysis approach in GWAS is to perform set-based analysis, which divides the SNPs into sets/groups (e.g., genes) based on some biological information and tests the association between each SNP-set and the phenotype one at a time

the combination tests are useful for testing the significance of each SNP-set by aggregating the p-values of individual SNPs

similar to the Fisher’s combination test, the new test statistic is defined as the weighted sum of transformed p-values, except that the p-values are transformed to follow a standard Cauchy distribution

  • prove that the tail of the null distribution of the proposed test statistic is approximately Cauchy under arbitrary correlation structures
  • then propose to calculate the p-value of the Cauchy combination test by the cdf of a standard Cauchy distribution
  • establish a similar theoretical result for the high-dimensional situation where the number of p-values $d$ diverges

a remarkable result: the sum of some class of dependent Cauchy variables could be exactly Cauchy distributed

the sum of independent standard Cauchy variables follows the same distribution as the sum of perfectly dependent standard Cauchy variables

Null distribution

  • $p_i$ is the individual $p$-value, for $i=1,\ldots,d$. define the Cauchy combination test statistic as
\[T = \sum_{i=1}^d w_i \tan((0.5-p_i)\pi)\]

where $w_i\ge 0$ and $\sum_{i=1}^d w_i = 1$.
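
As a minimal sketch of the definition above, the statistic and its approximate p-value can be computed in a few lines. The function name `cauchy_combination_test` and the equal-weight default are assumptions made for illustration, not taken from the paper; the p-value uses the upper tail of a standard Cauchy, $1/2 - \arctan(T)/\pi$.

```python
import numpy as np

def cauchy_combination_test(pvals, weights=None):
    """Return (T, p_value) for the Cauchy combination test.

    pvals   : individual p-values, each strictly between 0 and 1
    weights : nonnegative weights summing to 1 (equal weights if None,
              which is an illustrative default, not prescribed by the paper)
    """
    p = np.asarray(pvals, dtype=float)
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)
    else:
        w = np.asarray(weights, dtype=float)
    # Weighted sum of Cauchy-transformed p-values.
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    # Upper tail of a standard Cauchy: P(W > t) = 1/2 - arctan(t) / pi.
    p_value = 0.5 - np.arctan(t) / np.pi
    return t, p_value

t, p = cauchy_combination_test([0.01, 0.20, 0.50, 0.80])
print(t, p)
```

When all individual p-values equal a common value $p$, the combined p-value is again $p$, which is a quick sanity check. For extremely small p-values the direct `tan`/`arctan` evaluation may lose precision, so a careful implementation would likely need a separate approximation there.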

if $p_i \sim U[0, 1]$ under the null, then $\tan((0.5-p_i)\pi)$ follows a standard Cauchy
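
a short check of this claim (a worked step added here, not quoted from the paper): since $(0.5-p_i)\pi \in [-\pi/2, \pi/2]$ and $\tan$ is increasing on that interval, for any real $x$,
\[P\big(\tan((0.5-p_i)\pi) \le x\big) = P\big(p_i \ge 0.5 - \arctan(x)/\pi\big) = 0.5 + \frac{\arctan(x)}{\pi}\]

which is the cdf of a standard Cauchy distribution.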

if the $p_i$ are independent or perfectly dependent (all the $p_i$ are equal), the test statistic $T$ has a standard Cauchy distribution under the null. This results from the closure of the Cauchy distribution under convolution and is unique to the proposed Cauchy combination test statistic.
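
A small simulation can illustrate the two extreme cases. This is only a sketch under assumed settings (equal weights, $d = 10$, 200,000 replicates, a fixed seed), none of which come from the paper: it draws null p-values that are either independent or all equal and compares the empirical tail probability of $T$ with the nominal 5% level of a standard Cauchy.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
n, d = 200_000, 10               # replicates and number of p-values (assumed)
w = np.full(d, 1.0 / d)          # equal weights summing to 1

def statistic(p):
    # Row-wise weighted sum of Cauchy-transformed p-values.
    return np.tan((0.5 - p) * np.pi) @ w

# Independent null p-values: each entry is an independent U[0, 1] draw.
p_indep = rng.uniform(size=(n, d))
# Perfectly dependent null p-values: one draw repeated across all d columns.
p_same = np.repeat(rng.uniform(size=(n, 1)), d, axis=1)

t_indep = statistic(p_indep)
t_same = statistic(p_same)

# Upper 5% cutoff of a standard Cauchy: tan((0.5 - 0.05) * pi).
cutoff = np.tan((0.5 - 0.05) * np.pi)
tail_indep = np.mean(t_indep > cutoff)
tail_same = np.mean(t_same > cutoff)
print(tail_indep, tail_same)
```

Both empirical tail probabilities should be close to 0.05, up to Monte Carlo error.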

