WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Combining $p$-values in Meta Analysis

Posted on
Tags: Meta-analysis, p-value

I came across the term meta-analysis in the previous post, and I had another question about nominal size while reading the paper of the previous post, which reminds me Keith’s notes. By coincidence, I also find the topic about meta-analysis in the same notes. Hence, this post is mainly based on Keith’s notes, and reproduce the power curves by myself.

Suppose testing $H_0:\theta=0$ requires expensive experiments, so you cannot perform a large scale experiment. Luckily, many researchers have done the same tests before. However, only $p$-values are reported in their research papers (without the actual datasets). Then we can combine these $p$-values by using the methods in this post. Generally, this type of multi-source inference is called meta analysis.


Let $p_1,\ldots, p_m$ be the $p$-values of $m$ independent size $\alpha$ tests for the same simple null $H_0$ and same alternative $H_1$, where $\alpha\in(0,1)$. We want to combine these $p$-values. The aggregated $p$-value and the aggregated test are defined to be

\[\hat p = \hat p(p_{1:n})\qquad \text{and}\qquad \hat\psi = 1(\hat p < c)\,,\]

respectively for some $c\in\IR$.

Tippett (1931): $\hat p_T = \min(p_1,\ldots,p_m)$

Under $H_0$, $p_i\iid U(0,1)$, then

\[\hat p_T \sim \mathrm{Beta}(1, m)\,,\]

More generally, if $X_1,\ldots, X_n\iid U(0, 1)$, then $X_{(j)}\sim\mathrm{Beta}(j,n+j-1)$ for $j=1,\ldots,n$.

and the critical value $c_T$ satisfies

\[\alpha = P_0(\hat p_T < c_T)\,.\]

Pearson (1933): $\hat p_P=-\sum_{j=1}^m\log(1-p_j)$

Under $H_0$, for each $j$, we have

\[P_0(-\log(1-p_j) < t) = P_0(p_j < 1-e^{-t}) = (1-e^{-t})1(t>0)\,,\]

implying that $-\log(1-p_j)\sim\mathrm{Exp}(1)$. Hence, under $H_0$,

\[\hat p_P=-\sum_{j=1}^m\log(1-p_j)\sim \mathrm{Gamma}(m, 1)\,.\]

Gamma distribution generalizes exponential distribution. Let $\theta > 0$ and $n\in \bbN$, then

\[X_1,\ldots,X_n\iid \theta\mathrm{Exp}(1)\Rightarrow T=\sum_{i=1}^nX_i\sim \theta\mathrm{Gamma}(n)\,.\]

If also allow “fractional sample size” $n\in\IR^+$, then $T\sim \theta\mathrm{Gamma}(n)$ is called the Gamma distribution with scale parameter $\theta$ and shape parameter $n$. This notation can avoid the confusing traditional notations $\mathrm{Gamma}(n, \theta)$ and $\mathrm{Gamma}(n,1/\theta)$ in which the second parameter refer to either the scale or rate parameter.

Fisher (1934): $\hat p_F=\sum_{j=1}^m\log p_j$

Under $H_0$, note that

\[\hat p_F = \sum_{j=1}^m\log p_j = -\left[ - \sum_{j=1}^m\log p_j \right] \triangleq -W\,,\]

where $W\sim \mathrm{Gamma}(m, 1)$ since $1-p_j\overset{d}{=}p_j$. Under $H_0$, $c_F$ can be found by solving

\[\alpha = P_0(\hat p_F < c_F) = P(-W < c_F) = P(W > -c_F)\,,\]

that is,

\[1-\alpha = P(W < -c_F)\,.\]

Power Curves

Suppose that $H_0:\theta_0=0$ is tested against $H_1:\theta > 0$ such that

\[p_1,\ldots,p_m \iid \mathrm{Beta}(1,1+\theta), \theta\in \IR^+\,,\]

and actually $\mathrm{Beta}(1,1)=\mathrm{Unif}(0, 1)$, which also includes the case for $H_0$.

Estimate the power functions with Monte Carlo method, and implement in Julia as follows:

function calc_power(n, m; α = 0.05, θs = 0:2.5:25)
    cT = quantile(Beta(1, m), α)
    cP = quantile(Gamma(m, 1), α)
    cF = -quantile(Gamma(m, 1), 1-α)
    power = zeros(length(θs), 3)
    for k = 1:length(θs)
        freqs = zeros(Int, 3)
        for i = 1:n
            p = rand(Beta(1, 1+θs[k]), m)
            # Tippett
            freqs[1] += (minimum(p) < cT)
            # Pearson
            freqs[2] += (sum(-log.(1 .- p)) < cP)
            # Fisher
            freqs[3] += (sum(log.(p)) < cF)
        power[k, :] .= freqs ./ n
    return power

The upper-left plot shows that the tests $\hat \psi_P$ and $\hat \psi_T$ are the most and the least powerful tests among the three combined tests for any $\theta >0$, respectively.

The remaining three plots present the power curves for each test when $\theta = 1,2,3$. Visually, the test $\hat \psi_T$ does not improve its power with $m$, which means that the combining method is bas. On the other hand, the test $\hat \psi_F$ and the test $\psi_P$ increases their powers when $m$ increases. It is a desirable property.

However, that it does not mean $\hat\psi_P$ is the universal best combing method. When the distribution of the $p$-values $p_{1:m}$ follow other distribution under $H_1$, other combining methods may become more appealing. Refer to Heard, N. A., & Rubin-Delanchy, P. (2018). Choosing between methods of combining $p$-values. Biometrika, 105(1), 239–246. for more details.

Published in categories Note