Combining $p$-values in Meta-Analysis
I came across the term meta-analysis in the previous post, and while reading that paper I had another question about nominal size, which reminded me of Keith’s notes. By coincidence, I also found the topic of meta-analysis in the same notes. Hence, this post is mainly based on Keith’s notes, and I reproduce the power curves myself.
Suppose testing $H_0:\theta=0$ requires expensive experiments, so you cannot perform a large-scale experiment yourself. Luckily, many researchers have run the same test before. However, only the $p$-values are reported in their research papers (without the actual datasets). We can then combine these $p$-values using the methods in this post. Generally, this type of multi-source inference is called meta-analysis.
Setup
Let $p_1,\ldots, p_m$ be the $p$-values of $m$ independent size $\alpha$ tests for the same simple null $H_0$ and same alternative $H_1$, where $\alpha\in(0,1)$. We want to combine these $p$-values. The aggregated $p$-value and the aggregated test are defined to be
\[\hat p = \hat p(p_{1:m})\qquad \text{and}\qquad \hat\psi = 1(\hat p < c)\,,\]respectively, for some critical value $c\in\IR$ chosen so that the aggregated test also has size $\alpha$.
Tippett (1931): $\hat p_T = \min(p_1,\ldots,p_m)$
Under $H_0$, $p_i\iid U(0,1)$, then
\[\hat p_T \sim \mathrm{Beta}(1, m)\,,\]More generally, if $X_1,\ldots, X_n\iid U(0, 1)$, then $X_{(j)}\sim\mathrm{Beta}(j,n-j+1)$ for $j=1,\ldots,n$.
and the critical value $c_T$ satisfies
\[\alpha = P_0(\hat p_T < c_T)\,,\]that is, $c_T$ is the $\alpha$-quantile of $\mathrm{Beta}(1, m)$.
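As a quick numerical check (a minimal sketch assuming the Distributions.jl package; $m = 5$ and $\alpha = 0.05$ are arbitrary illustrative values), the rejection rate of $\hat\psi_T$ under $H_0$ should be close to $\alpha$:
using Distributions, Random, Statistics

Random.seed!(42)
m, α, N = 5, 0.05, 10^5
cT = quantile(Beta(1, m), α)           # critical value c_T
pT = [minimum(rand(m)) for _ in 1:N]   # Tippett statistic under H0
mean(pT .< cT)                         # ≈ α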
Pearson (1933): $\hat p_P=-\sum_{j=1}^m\log(1-p_j)$
Under $H_0$, for each $j$, we have
\[P_0(-\log(1-p_j) < t) = P_0(p_j < 1-e^{-t}) = (1-e^{-t})1(t>0)\,,\]implying that $-\log(1-p_j)\sim\mathrm{Exp}(1)$. Hence, under $H_0$,
\[\hat p_P=-\sum_{j=1}^m\log(1-p_j)\sim \mathrm{Gamma}(m, 1)\,,\]so the critical value $c_P$ is the $\alpha$-quantile of $\mathrm{Gamma}(m, 1)$.
The Gamma distribution generalizes the exponential distribution. Let $\theta > 0$ and $n\in \bbN$; then
\[X_1,\ldots,X_n\iid \theta\mathrm{Exp}(1)\Rightarrow T=\sum_{i=1}^nX_i\sim \theta\mathrm{Gamma}(n)\,.\]If we also allow a “fractional sample size” $n\in\IR^+$, then $T\sim \theta\mathrm{Gamma}(n)$ is called the Gamma distribution with scale parameter $\theta$ and shape parameter $n$. This notation avoids the confusing traditional notations $\mathrm{Gamma}(n, \theta)$ and $\mathrm{Gamma}(n,1/\theta)$, in which the second parameter refers to either the scale or the rate parameter.
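Both claims are easy to check numerically (a minimal sketch assuming Distributions.jl, whose Gamma(a, b) takes the shape first and the scale second; $m = 5$ is an arbitrary choice):
using Distributions, Random, Statistics

mean(Gamma(2, 3))   # 6.0 = shape × scale (a rate convention would give 2/3)

Random.seed!(42)
m, N = 5, 10^5
pP = [-sum(log.(1 .- rand(m))) for _ in 1:N]    # Pearson statistic under H0
quantile(pP, 0.05), quantile(Gamma(m, 1), 0.05) # empirical ≈ theoretical quantile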
Fisher (1934): $\hat p_F=\sum_{j=1}^m\log p_j$
Under $H_0$, note that
\[\hat p_F = \sum_{j=1}^m\log p_j = -\left[ - \sum_{j=1}^m\log p_j \right] \triangleq -W\,,\]where $W\sim \mathrm{Gamma}(m, 1)$ by the same argument as for Pearson’s statistic, since $1-p_j\overset{d}{=}p_j$ under $H_0$. The critical value $c_F$ can then be found by solving
\[\alpha = P_0(\hat p_F < c_F) = P(-W < c_F) = P(W > -c_F)\,,\]that is,
\[1-\alpha = P(W < -c_F)\,,\]so $-c_F$ is the $(1-\alpha)$-quantile of $\mathrm{Gamma}(m, 1)$.
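In code, the sign flip looks as follows (again a minimal sketch assuming Distributions.jl, with arbitrary $m = 5$ and $\alpha = 0.05$):
using Distributions, Random, Statistics

Random.seed!(42)
m, α, N = 5, 0.05, 10^5
cF = -quantile(Gamma(m, 1), 1 - α)       # minus the (1-α)-quantile of W
pF = [sum(log.(rand(m))) for _ in 1:N]   # Fisher statistic under H0
mean(pF .< cF)                           # ≈ α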
Power Curves
Suppose that $H_0:\theta=0$ is tested against $H_1:\theta > 0$ such that
\[p_1,\ldots,p_m \iid \mathrm{Beta}(1,1+\theta), \qquad \theta\in \IR^+\,.\]Note that $\mathrm{Beta}(1,1)=\mathrm{Unif}(0, 1)$, so the family also covers the case of $H_0$ ($\theta = 0$). Since the density of $\mathrm{Beta}(1, 1+\theta)$ is $(1+\theta)(1-p)^{\theta}$, the $p$-values are stochastically smaller under $H_1$.
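As a quick illustration (a sketch; $\theta = 3$ is an arbitrary value), the alternative pushes the $p$-values toward zero:
using Distributions, Statistics

θ = 3
p = rand(Beta(1, 1 + θ), 10^6)
mean(p)           # ≈ 1/(2 + θ) = 0.2, below the uniform mean 0.5
mean(p .< 0.05)   # ≈ 1 - 0.95^(1 + θ) ≈ 0.19, above the nominal 0.05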
Estimate the power functions with the Monte Carlo method, implemented in Julia as follows:
using Distributions

function calc_power(n, m; α = 0.05, θs = 0:2.5:25)
    # critical values from the null distributions derived above
    cT = quantile(Beta(1, m), α)        # Tippett
    cP = quantile(Gamma(m, 1), α)       # Pearson
    cF = -quantile(Gamma(m, 1), 1 - α)  # Fisher
    power = zeros(length(θs), 3)
    for k = 1:length(θs)
        freqs = zeros(Int, 3)
        for i = 1:n
            # p-values drawn from the alternative model Beta(1, 1 + θ)
            p = rand(Beta(1, 1 + θs[k]), m)
            # Tippett
            freqs[1] += (minimum(p) < cT)
            # Pearson
            freqs[2] += (sum(-log.(1 .- p)) < cP)
            # Fisher
            freqs[3] += (sum(log.(p)) < cF)
        end
        # rejection frequencies estimate the power at θs[k]
        power[k, :] .= freqs ./ n
    end
    return power
end
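For example, the first panel (power against $\theta$ for a fixed $m$) can be reproduced as follows (a usage sketch; the simulation size $n = 10^4$, $m = 5$, and the Plots.jl styling are my own choices rather than the original settings):
using Plots

θs = 0:2.5:25
power = calc_power(10^4, 5; θs = θs)
plot(θs, power, label = ["Tippett" "Pearson" "Fisher"],
     xlabel = "θ", ylabel = "power", legend = :bottomright)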
The upper-left plot shows that, among the three combined tests, $\hat \psi_P$ and $\hat \psi_T$ are respectively the most and the least powerful for every $\theta > 0$.
The remaining three plots present the power curves for each test when $\theta = 1,2,3$. Visually, the power of $\hat \psi_T$ does not improve with $m$, which suggests that this combining method is a poor one. On the other hand, the tests $\hat \psi_F$ and $\hat \psi_P$ increase their power as $m$ increases, which is a desirable property.
However, this does not mean that $\hat\psi_P$ is universally the best combining method. When the $p$-values $p_{1:m}$ follow other distributions under $H_1$, other combining methods may become more appealing. Refer to Heard, N. A., & Rubin-Delanchy, P. (2018). Choosing between methods of combining $p$-values. Biometrika, 105(1), 239–246, for more details.