# Combining $p$-values in Meta Analysis

##### Posted on

I came across the term meta-analysis in the previous post, and I had another question about nominal size while reading the paper of the previous post, which reminds me Keith’s notes. By coincidence, I also find the topic about meta-analysis in the same notes. Hence, this post is mainly based on Keith’s notes, and reproduce the power curves by myself.

Suppose testing $H_0:\theta=0$ requires expensive experiments, so you cannot perform a large scale experiment. Luckily, many researchers have done the same tests before. However, only $p$-values are reported in their research papers (without the actual datasets). Then we can combine these $p$-values by using the methods in this post. Generally, this type of multi-source inference is called **meta analysis**.

## Setup

Let $p_1,\ldots, p_m$ be the $p$-values of $m$ independent size $\alpha$ tests for the same simple null $H_0$ and same alternative $H_1$, where $\alpha\in(0,1)$. We want to combine these $p$-values. The aggregated $p$-value and the aggregated test are defined to be

respectively for some $c\in\IR$.

## Tippett (1931): $\hat p_T = \min(p_1,\ldots,p_m)$

Under $H_0$, $p_i\iid U(0,1)$, then

More generally, if $X_1,\ldots, X_n\iid U(0, 1)$, then $X_{(j)}\sim\mathrm{Beta}(j,n+j-1)$ for $j=1,\ldots,n$.

and the critical value $c_T$ satisfies

## Pearson (1933): $\hat p_P=-\sum_{j=1}^m\log(1-p_j)$

Under $H_0$, for each $j$, we have

implying that $-\log(1-p_j)\sim\mathrm{Exp}(1)$. Hence, under $H_0$,

Gamma distribution generalizes exponential distribution. Let $\theta > 0$ and $n\in \bbN$, then

If also allow “fractional sample size” $n\in\IR^+$, then $T\sim \theta\mathrm{Gamma}(n)$ is called the Gamma distribution with scale parameter $\theta$ and shape parameter $n$. This notation can avoid the confusing traditional notations $\mathrm{Gamma}(n, \theta)$ and $\mathrm{Gamma}(n,1/\theta)$ in which the second parameter refer to either the scale or rate parameter.

## Fisher (1934): $\hat p_F=\sum_{j=1}^m\log p_j$

Under $H_0$, note that

where $W\sim \mathrm{Gamma}(m, 1)$ since $1-p_j\overset{d}{=}p_j$. Under $H_0$, $c_F$ can be found by solving

that is,

## Power Curves

Suppose that $H_0:\theta_0=0$ is tested against $H_1:\theta > 0$ such that

and actually $\mathrm{Beta}(1,1)=\mathrm{Unif}(0, 1)$, which also includes the case for $H_0$.

Estimate the power functions with Monte Carlo method, and implement in Julia as follows:

```
function calc_power(n, m; α = 0.05, θs = 0:2.5:25)
cT = quantile(Beta(1, m), α)
cP = quantile(Gamma(m, 1), α)
cF = -quantile(Gamma(m, 1), 1-α)
power = zeros(length(θs), 3)
for k = 1:length(θs)
freqs = zeros(Int, 3)
for i = 1:n
p = rand(Beta(1, 1+θs[k]), m)
# Tippett
freqs[1] += (minimum(p) < cT)
# Pearson
freqs[2] += (sum(-log.(1 .- p)) < cP)
# Fisher
freqs[3] += (sum(log.(p)) < cF)
end
power[k, :] .= freqs ./ n
end
return power
end
```

The upper-left plot shows that the tests $\hat \psi_P$ and $\hat \psi_T$ are the most and the least powerful tests among the three combined tests for any $\theta >0$, respectively.

The remaining three plots present the power curves for each test when $\theta = 1,2,3$. Visually, the test $\hat \psi_T$ does not improve its power with $m$, which means that the combining method is bas. On the other hand, the test $\hat \psi_F$ and the test $\psi_P$ increases their powers when $m$ increases. It is a desirable property.

However, that it does not mean $\hat\psi_P$ is the universal best combing method. When the distribution of the $p$-values $p_{1:m}$ follow other distribution under $H_1$, other combining methods may become more appealing. Refer to Heard, N. A., & Rubin-Delanchy, P. (2018). Choosing between methods of combining $p$-values. Biometrika, 105(1), 239–246. for more details.