Combining p-values in Meta-Analysis
I came across the term meta-analysis in the previous post, and while reading that paper I had another question about nominal size, which reminded me of Keith's notes. By coincidence, I also found the topic of meta-analysis covered in the same notes. Hence, this post is mainly based on Keith's notes, and I reproduce the power curves myself.
Suppose testing $H_0: \theta = 0$ requires expensive experiments, so you cannot perform a large-scale experiment yourself. Luckily, many researchers have run the same test before. However, only the p-values are reported in their papers (without the actual datasets). We can then combine these p-values using the methods in this post. In general, this type of multi-source inference is called meta-analysis.
Setup
Let $p_1, \ldots, p_m$ be the p-values of $m$ independent size-$\alpha$ tests of the same simple null $H_0$ against the same alternative $H_1$, where $\alpha \in (0,1)$. We want to combine these p-values. The aggregated p-value and the aggregated test are defined to be
$$\hat{p} = \hat{p}(p_{1:m}) \quad \text{and} \quad \hat{\psi} = \mathbb{1}(\hat{p} < c),$$
respectively, for some $c \in \mathbb{R}$.
Tippett (1931): $\hat{p}_T = \min(p_1, \ldots, p_m)$
Under $H_0$, $p_i \overset{\text{i.i.d.}}{\sim} U(0,1)$, and so
$$\hat{p}_T \sim \mathrm{Beta}(1, m).$$
More generally, if $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} U(0,1)$, then $X_{(j)} \sim \mathrm{Beta}(j, n-j+1)$ for $j = 1, \ldots, n$.
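As a quick sanity check on this order-statistic fact (a stdlib-Python sketch of my own, not part of the original post), the CDF of $\mathrm{Beta}(1, m)$ has the closed form $F(x) = 1 - (1 - x)^m$, so the empirical CDF of $\min(p_1, \ldots, p_m)$ under $H_0$ should match it:

```python
import random

random.seed(0)
m, n_sim, x = 5, 100_000, 0.2

# Empirical P(min(p_1, ..., p_m) < x) from n_sim simulated draws of m uniforms
hits = sum(min(random.random() for _ in range(m)) < x for _ in range(n_sim))
empirical = hits / n_sim

# Beta(1, m) CDF evaluated at x: F(x) = 1 - (1 - x)^m
theoretical = 1 - (1 - x) ** m

print(empirical, theoretical)  # both close to 0.67232
```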
and the critical value $c_T$ satisfies
$$\alpha = P_0(\hat{p}_T < c_T) = 1 - (1 - c_T)^m,$$
that is, $c_T = 1 - (1 - \alpha)^{1/m}$.

Pearson (1933): $\hat{p}_P = -\sum_{j=1}^m \log(1 - p_j)$
Under $H_0$, for each $j$, we have
$$P_0(-\log(1 - p_j) < t) = P_0(p_j < 1 - e^{-t}) = (1 - e^{-t})\,\mathbb{1}(t > 0),$$
implying that $-\log(1 - p_j) \sim \mathrm{Exp}(1)$. Hence, under $H_0$,
$$\hat{p}_P = -\sum_{j=1}^m \log(1 - p_j) \sim \mathrm{Gamma}(m, 1).$$
The Gamma distribution generalizes the exponential distribution: let $\theta > 0$ and $n \in \mathbb{N}$; then
$$X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \theta\,\mathrm{Exp}(1) \;\Rightarrow\; T = \sum_{i=1}^n X_i \sim \theta\,\mathrm{Gamma}(n).$$
If we also allow a "fractional sample size" $n \in \mathbb{R}_+$, then $T \sim \theta\,\mathrm{Gamma}(n)$ is called the Gamma distribution with scale parameter $\theta$ and shape parameter $n$. This notation avoids the confusing traditional notations $\mathrm{Gamma}(n, \theta)$ and $\mathrm{Gamma}(n, 1/\theta)$, in which the second parameter refers to either the scale or the rate parameter.
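To illustrate this fact numerically (again a stdlib-Python sketch of my own, not from the post), the sum of $n$ i.i.d. $\mathrm{Exp}(1)$ variables should match $\mathrm{Gamma}(n, 1)$, whose CDF for integer shape $n$ (the Erlang distribution) is $P(T < t) = 1 - e^{-t} \sum_{k=0}^{n-1} t^k / k!$:

```python
import math
import random

def erlang_cdf(t, n):
    """CDF of Gamma(n, 1) for integer shape n (Erlang distribution)."""
    return 1 - math.exp(-t) * sum(t**k / math.factorial(k) for k in range(n))

random.seed(1)
n, n_sim, t = 4, 100_000, 3.5

# Empirical P(X_1 + ... + X_n < t) with X_i ~ Exp(1)
hits = sum(sum(random.expovariate(1.0) for _ in range(n)) < t
           for _ in range(n_sim))
empirical = hits / n_sim

print(empirical, erlang_cdf(t, n))  # both close to 0.4634
```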
Fisher (1934): $\hat{p}_F = \sum_{j=1}^m \log p_j$
Under $H_0$, note that
$$\hat{p}_F = \sum_{j=1}^m \log p_j = -\left[ -\sum_{j=1}^m \log p_j \right] \triangleq -W,$$
where $W \sim \mathrm{Gamma}(m, 1)$ since $1 - p_j \overset{d}{=} p_j$. Under $H_0$, $c_F$ can be found by solving
$$\alpha = P_0(\hat{p}_F < c_F) = P(-W < c_F) = P(W > -c_F),$$
that is,
$$1 - \alpha = P(W < -c_F).$$

Power Curves
Suppose that $H_0: \theta = 0$ is tested against $H_1: \theta > 0$ such that
$$p_1, \ldots, p_m \overset{\text{i.i.d.}}{\sim} \mathrm{Beta}(1, 1 + \theta), \quad \theta \in \mathbb{R}_+.$$
Note that $\mathrm{Beta}(1, 1) = \mathrm{Unif}(0, 1)$, so the case $\theta = 0$ recovers $H_0$.
Estimate the power functions by the Monte Carlo method, implemented in Julia as follows:
using Distributions

function calc_power(n, m; α = 0.05, θs = 0:2.5:25)
    # Critical values derived above
    cT = quantile(Beta(1, m), α)          # Tippett
    cP = quantile(Gamma(m, 1), α)         # Pearson
    cF = -quantile(Gamma(m, 1), 1 - α)    # Fisher
    power = zeros(length(θs), 3)
    for k = 1:length(θs)
        freqs = zeros(Int, 3)
        for i = 1:n
            p = rand(Beta(1, 1 + θs[k]), m)
            freqs[1] += (minimum(p) < cT)             # Tippett
            freqs[2] += (sum(-log.(1 .- p)) < cP)     # Pearson
            freqs[3] += (sum(log.(p)) < cF)           # Fisher
        end
        power[k, :] .= freqs ./ n
    end
    return power
end
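As an independent cross-check of the nominal size (a stdlib-Python sketch of my own that re-derives the critical values from the closed-form Beta(1, m) and Erlang CDFs, rather than using the post's Julia code), under $H_0$ (i.e. $\theta = 0$) each of the three tests should reject with probability close to $\alpha = 0.05$:

```python
import math
import random

def erlang_cdf(t, m):
    # CDF of Gamma(m, 1) for integer shape m (Erlang distribution)
    return 1 - math.exp(-t) * sum(t**k / math.factorial(k) for k in range(m))

def erlang_quantile(q, m, lo=0.0, hi=100.0):
    # Invert the monotone Erlang CDF by bisection
    for _ in range(200):
        mid = (lo + hi) / 2
        if erlang_cdf(mid, m) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

random.seed(2)
m, n_sim, alpha = 5, 100_000, 0.05

cT = 1 - (1 - alpha) ** (1 / m)       # Beta(1, m) quantile in closed form
cP = erlang_quantile(alpha, m)        # Gamma(m, 1) lower α-quantile
cF = -erlang_quantile(1 - alpha, m)   # Fisher: minus the (1-α)-quantile

freqs = [0, 0, 0]
for _ in range(n_sim):
    p = [random.random() for _ in range(m)]          # under H0, p_j ~ Unif(0, 1)
    freqs[0] += min(p) < cT                              # Tippett
    freqs[1] += -sum(math.log(1 - pj) for pj in p) < cP  # Pearson
    freqs[2] += sum(math.log(pj) for pj in p) < cF       # Fisher

print([f / n_sim for f in freqs])  # each entry close to 0.05
```

This mirrors the rejection rules in `calc_power` at $\theta = 0$, so it only checks size, not power; the Julia code above remains the source for the power curves.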
The upper-left plot shows that, among the three combined tests, $\hat{\psi}_P$ and $\hat{\psi}_T$ are the most and the least powerful, respectively, for any $\theta > 0$.
The remaining three plots present the power curves of each test for $\theta = 1, 2, 3$. Visually, the power of $\hat{\psi}_T$ does not improve with $m$, which suggests that this combining method is poor. On the other hand, the tests $\hat{\psi}_F$ and $\hat{\psi}_P$ increase in power as $m$ increases, which is a desirable property.
However, this does not mean that $\hat{\psi}_P$ is universally the best combining method. When the p-values $p_{1:m}$ follow a different distribution under $H_1$, other combining methods may become more appealing. See Heard, N. A., & Rubin-Delanchy, P. (2018). Choosing between methods of combining p-values. Biometrika, 105(1), 239–246, for more details.