# Tweedie's Formula and Selection Bias

##### Posted on (Update: )

Prof. Inchi HU will give a talk on Large Scale Inference for Chi-squared Data tomorrow, which proposes the Tweedie’s formula in the Bayesian hierarchical model for chi-squared data, and he mentioned a thought-provoking paper, Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106(496), 1602–1614., which is the focus of this note.

## Introduction

Suppose that some large number $N$ of possibly correlated normal variates $z_i$ have been observed, each with its own unobserved mean parameter $\mu_i$,

where $N=5000$ $\mu_i$ values comprised 10 repetitions each of

and attention focuses on the extremes, say the 100 largest $z_i$’s.

Reproduce Figure 1 with the following Julia code:

```
using Distributions
using Plots
# figure 1
N = 5000
m = 100
σ = 1
z = ones(N)
μ = repeat(-log.( ( (1:500) .- 0.5) / 500 ), inner = 10)
for i = 1:N
z[i] = rand(Normal(μ[i], σ))
end
p = histogram(z, legend=false, size = (1080, 720))
for zi in sort(z, rev = true)[1:100]
plot!(p, [zi, zi], [-5, 0], color = :red)
end
vline!(p, [0])
xlabel!(p, "z values")
ylabel!(p, "Frequency")
savefig(p, "fig1.png")
```

Selection bias: the tendency of the corresponding 100 $\mu_i$’s to be less extreme.

## Tweedie’s formula

Consider

where $\eta$ is the natural or canonical parameter of the exponential family, and $f_0(z)$ is the density when $\eta=0$.

By Bayes rule,

where $f(z)$ is the marginal density

$\cZ$ is the sample space of the exponential family. Then

where $\lambda(z)=\log(f(z)/f_0(z))$.

Differentiating $\lambda(z)$ yields

Letting

we can express the posterior mean and variance of $\eta\mid z$ as

In the normal transformation family $z\sim N(\mu,\sigma^2)$, then

and since $\mu=\sigma^2\eta$,

The $N(\mu,\sigma^2)$ family has skewness equal zero. We can incorporate skewness into the application of Tweedie’s formula by taking $f_0(z)$ to be a standardized gamma variable with shape parameter $m$,

in which case $f_0(z)$ has mean 0, variance $1$, and

for all members of the exponential family.

Note that $Z=\frac{Y-m}{\sqrt m}\sim f_0(z)$, then $f_Y(y)=f_0(z)/\sqrt m$, thus

which implies that $\E Y = \frac{m\sqrt m}{\sqrt m-\eta}$, and hence

Since

then

thus we have

*There might be a typo in equation (2.13) of Efron (2011)*

## Empirical Bayes Estimation

The empirical Bayes formula

requires a smoothly differentiable estimate of $l(z)=\log f(z)$.

Assume that $l(z)$ is a $J$th-degree polynomial, that is,

which represents a $J$-parameter exponential family having canonical parameter vector $\bbeta=(\beta_1,\ldots,\beta_J)$.

Set the same number of bins $K=63$,

```
using StatsBase
fit(Histogram, z, nbins= 63)
# Histogram{Int64,1,Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}
# edges:
# -3.4:0.2:9.2
# weights: [1, 0, 2, 1, 8, 10, 12, 16, 30, 50 … 0, 3, 3, 0, 0, 0, 0, 0, 0, 1]
# closed: left
# isdensity: false
```

the width is also $d=0.2$, with centers $x_k$ ranging from -3.3 to 9.1. The bar heights are given by the `weights`

.

Do the empirical Bayes estimate $\hat\mu_i=z_i+\sigma^2\hat l’(z_i)$ cure selection bias?

It shows that the empirical Bayes bias correction $\hat l’_i$ was quite effective, the corrected differences being much more closely centered around zero.

The empirical Bayes implementation of Tweedie’s formula reduces, almost, to the James-Stein estimator.

where the MLE estimate of $1/V$ is $N/\sum z_j^2$, while the Jame-Stein rule gives

## Empirical Bayes Information

the amount of information per “other” observation $z_j$ for estimating $\mu_i$.

For a fixed value $z_0$, let

be the Bayes and empirical Bayes estimates of $\E[\mu\mid z_0]$. Having observed $z=z_0$, the conditional regret for estimating $\mu$ by $\hat\mu_\z(z_0)$ instead of $\mu^+(z_0)$ is

The empirical Bayes information at $z_0$ is defined as

so

## Prof. Hu’s Talk

The non-central chi-squared density is a Poisson mixture of central chi-squared densities. Consider the following model

- Draw the non-centrality parameter $\lambda$ from a prior density $g$.
- Generate a non-negative integer $J$ according to a Poisson distribution with mean $\lambda/2$
- Sample from the chi-squared density with $k+2J$ degrees of freedom
- The resulting $X$ has non-central chi-squared distribution with $k$ DF.