
Tweedie's Formula and Selection Bias

Tags: Tweedie, Empirical Bayes, James-Stein, Empirical Bayes Information, Non-central Chi-square

Prof. Inchi HU will give a talk on Large Scale Inference for Chi-squared Data tomorrow, which proposes Tweedie’s formula in a Bayesian hierarchical model for chi-squared data. He mentioned a thought-provoking paper, Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106(496), 1602–1614, which is the focus of this note.

Introduction

Suppose that some large number $N$ of possibly correlated normal variates $z_i$ have been observed, each with its own unobserved mean parameter $\mu_i$,

$$z_i \sim N(\mu_i, \sigma^2)\,,\quad i=1,2,\ldots,N\,,$$

where the $N=5000$ $\mu_i$ values comprise 10 repetitions each of

$$\mu_i = -\log\left(\frac{i-0.5}{500}\right)\,,\quad i=1,\ldots,500\,,$$

and attention focuses on the extremes, say the 100 largest $z_i$’s.

Reproduce Figure 1 with the following Julia code:

using Distributions
using Plots

# Figure 1: histogram of the N = 5000 z-values, marking the 100 largest
N = 5000
m = 100    # number of extreme observations to mark
σ = 1
# μ: 10 repetitions each of -log((i - 0.5)/500), i = 1, …, 500
μ = repeat(-log.(((1:500) .- 0.5) ./ 500), inner = 10)
z = rand.(Normal.(μ, σ))    # z_i ~ N(μ_i, σ²)
p = histogram(z, legend = false, size = (1080, 720))
# red ticks below the axis mark the m largest z-values
for zi in sort(z, rev = true)[1:m]
    plot!(p, [zi, zi], [-5, 0], color = :red)
end
vline!(p, [0])
xlabel!(p, "z values")
ylabel!(p, "Frequency")
savefig(p, "fig1.png")

Selection bias: the tendency of the corresponding 100 $\mu_i$’s to be less extreme than their $z_i$’s.
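
As a quick numerical illustration (my own check, continuing the simulation above), the selected $z_i$’s systematically overshoot their own $\mu_i$’s:

using Statistics

# average overshoot of the 100 largest z-values over their means
idx = sortperm(z, rev = true)[1:100]
mean(z[idx] .- μ[idx])    # clearly positive: the selected z_i exaggerate μ_i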

Tweedie’s formula

Consider the Bayesian hierarchical model

$$\eta\sim g(\eta)\quad\text{and}\quad z\mid\eta \sim f_\eta(z) = e^{\eta z - \psi(\eta)}f_0(z)\,,$$

where $\eta$ is the natural or canonical parameter of the exponential family, $\psi(\eta)$ is the cumulant generating function, and $f_0(z)$ is the density when $\eta=0$.

By Bayes rule,

$$g(\eta\mid z) = f_\eta(z)g(\eta)/f(z)\,,$$

where $f(z)$ is the marginal density

$$f(z) = \int f_\eta(z)g(\eta)\,d\eta\,,\quad z\in\cZ\,,$$

and $\cZ$ is the sample space of the exponential family. Then

$$g(\eta\mid z) = e^{\eta z - \lambda(z)}\left[g(\eta)e^{-\psi(\eta)}\right]\,,$$

where $\lambda(z)=\log(f(z)/f_0(z))$. That is, $g(\eta\mid z)$ is itself an exponential family with canonical parameter $z$ and cumulant generating function $\lambda(z)$.

Differentiating $\lambda(z)$ yields the posterior cumulants,

$$\E[\eta\mid z] = \lambda'(z)\,,\quad \mathrm{Var}[\eta\mid z] = \lambda''(z)\,.$$

Letting

$$l(z) = \log f(z)\quad\text{and}\quad l_0(z) = \log f_0(z)\,,$$

we can express the posterior mean and variance of $\eta\mid z$ as

$$\E[\eta\mid z] = l'(z) - l_0'(z)\,,\quad \mathrm{Var}[\eta\mid z] = l''(z) - l_0''(z)\,.$$

In the normal transformation family $z\sim N(\mu,\sigma^2)$, the natural parameter is $\eta=\mu/\sigma^2$ and $l_0'(z)=-z/\sigma^2$, then

$$\E[\eta\mid z] = l'(z) + z/\sigma^2\,,$$

and since $\mu=\sigma^2\eta$, Tweedie's formula follows:

$$\E[\mu\mid z] = z + \sigma^2 l'(z)\,,\quad \mathrm{Var}[\mu\mid z] = \sigma^2\left(1 + \sigma^2 l''(z)\right)\,.$$
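
As a sanity check (my own example, not from the paper), take a conjugate prior $\mu\sim N(0, A)$, so the marginal is $N(0, V)$ with $V=A+\sigma^2$ in closed form, and Tweedie's formula must reproduce the familiar Bayes posterior mean $zA/V$:

# marginal N(0, V) has l(z) = -z²/(2V) + const, so l'(z) = -z/V
A, s2 = 2.0, 1.0          # prior variance A and σ² (illustrative values)
V = A + s2
dl(zv) = -zv / V          # derivative of the log marginal density
z0 = 1.7
z0 + s2 * dl(z0) ≈ z0 * A / V    # true: Tweedie recovers zA/V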

The $N(\mu,\sigma^2)$ family has skewness equal to zero. We can incorporate skewness into the application of Tweedie’s formula by taking $f_0(z)$ to be a standardized gamma variable with shape parameter $m$,

$$f_0 \sim \frac{G_m - m}{\sqrt m}\,,\quad G_m\sim\mathrm{Gamma}(m)\,,$$

in which case $f_0(z)$ has mean 0, variance $1$, and skewness

$$\gamma = \frac{2}{\sqrt m}$$

for all members of the exponential family, since exponential tilting changes only the scale of a gamma, not its shape.

Note that if $Z=\frac{Y-m}{\sqrt m}\sim f_0(z)$ with $Y\sim\mathrm{Gamma}(m)$, then $f_Y(y)=f_0(z)/\sqrt m$, thus

$$f_\eta(y) \propto e^{\eta(y-m)/\sqrt m}\,y^{m-1}e^{-y} \propto y^{m-1}e^{-(1-\eta/\sqrt m)y}\,,$$

that is, $Y\mid\eta\sim\mathrm{Gamma}\left(m, \frac{\sqrt m}{\sqrt m-\eta}\right)$, which implies that $\E Y = \frac{m\sqrt m}{\sqrt m-\eta}$, and hence

$$\E[z\mid\eta] = \frac{\E Y - m}{\sqrt m} = \frac{\sqrt m\,\eta}{\sqrt m-\eta} = \frac{\eta}{1-\eta/\sqrt m}\,.$$

Since

$$\mathrm{Var}\, Y = m\left(\frac{\sqrt m}{\sqrt m-\eta}\right)^2\,,$$

then

$$\mathrm{Var}[z\mid\eta] = \frac{\mathrm{Var}\, Y}{m} = \frac{1}{(1-\eta/\sqrt m)^2}\,,$$

thus we have, in (mean, variance, skewness) notation,

$$z\mid\eta \sim \left(\frac{\eta}{1-\eta/\sqrt m},\ \frac{1}{(1-\eta/\sqrt m)^2},\ \frac{2}{\sqrt m}\right)\,.$$

There might be a typo in equation (2.13) of Efron (2011)
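
These moments are easy to confirm by simulation (my own check; `m_shape` and `η` below are arbitrary illustrative values):

using Distributions, Statistics, StatsBase

# Y | η ~ Gamma(m, √m/(√m − η)); Z = (Y − m)/√m should match the
# tilted mean, variance, and skewness derived above
m_shape, η = 10.0, 0.8
θ = sqrt(m_shape) / (sqrt(m_shape) - η)
Z = (rand(Gamma(m_shape, θ), 10^6) .- m_shape) ./ sqrt(m_shape)
mean(Z), η / (1 - η / sqrt(m_shape))     # ≈ equal
var(Z), 1 / (1 - η / sqrt(m_shape))^2    # ≈ equal
skewness(Z), 2 / sqrt(m_shape)           # ≈ equal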

Empirical Bayes Estimation

The empirical Bayes formula

$$\hat\mu(z) = z + \sigma^2\hat l'(z)$$

requires a smoothly differentiable estimate of $l(z)=\log f(z)$.

Assume that $l(z)$ is a $J$th-degree polynomial, that is,

$$l(z) = \beta_0 + \sum_{j=1}^J\beta_j z^j\,,$$

which represents a $J$-parameter exponential family having canonical parameter vector $\bbeta=(\beta_1,\ldots,\beta_J)$, with $\beta_0$ determined by the requirement that $f(z)=e^{l(z)}$ integrate to 1.

Set the same number of bins, $K=63$, as in the paper:

using StatsBase
fit(Histogram, z, nbins= 63)
# Histogram{Int64,1,Tuple{StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}}
# edges:
#   -3.4:0.2:9.2
# weights: [1, 0, 2, 1, 8, 10, 12, 16, 30, 50  …  0, 3, 3, 0, 0, 0, 0, 0, 0, 1]
# closed: left
# isdensity: false

The bin width is also $d=0.2$, with centers $x_k$ ranging from $-3.3$ to $9.1$; the bar heights are given by the weights.
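
Efron fits $l(z)$ by Lindsey's method: a Poisson regression of the bin counts $y_k$ on a polynomial basis of the bin centers $x_k$. Here is a minimal sketch using GLM.jl's matrix interface, with $J=5$ as an illustrative choice (the constant $\log(Nd)$ in $\log\E y_k = l(x_k) + \log(Nd)$ is absorbed into the intercept and does not affect $\hat l'$):

using StatsBase, GLM

# Lindsey's method: Poisson regression of bin counts on a polynomial
# basis of the bin centers
h = fit(Histogram, z, nbins = 63)
x = collect(h.edges[1])[1:end-1] .+ step(h.edges[1]) / 2   # bin centers x_k
y = h.weights                                              # bin counts y_k
J = 5
X = hcat(ones(length(x)), [x .^ j for j in 1:J]...)        # [1, x, …, x^J]
pois = glm(X, y, Poisson())     # log E[y_k] = l(x_k) + log(N d)
β = coef(pois)
# derivative of the fitted log-density: l'(z) = Σ_j j β_j z^(j-1)
dlhat(zv) = sum(j * β[j + 1] * zv^(j - 1) for j in 1:J)
μhat = z .+ σ^2 .* dlhat.(z)    # empirical Bayes estimates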

Does the empirical Bayes estimate $\hat\mu_i=z_i+\sigma^2\hat l'(z_i)$ cure selection bias?

The paper shows that the empirical Bayes bias correction $\sigma^2\hat l'(z_i)$ was quite effective, the corrected differences $\hat\mu_i-\mu_i$ being much more closely centered around zero than the raw differences $z_i-\mu_i$.

The empirical Bayes implementation of Tweedie's formula reduces, almost, to the James-Stein estimator. With $\sigma^2=1$, a quadratic $l(z)$ corresponds to a normal marginal $z\sim N(0,V)$, so that $l'(z)=-z/V$ and

$$\hat\mu_i = \left(1 - \frac{1}{\hat V}\right)z_i\,,$$

where the MLE estimate of $1/V$ is $N/\sum z_j^2$, while the James-Stein rule gives

$$\hat\mu_i^{\mathrm{JS}} = \left(1 - \frac{N-2}{\sum_j z_j^2}\right)z_i\,.$$
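
The two rules differ only in $N$ versus $N-2$ in the shrinkage factor, which is negligible for large $N$; a quick comparison on the simulated `z` (assuming $\sigma^2=1$, purely to illustrate the formulas):

# Tweedie with quadratic log-marginal vs the James-Stein rule
S = sum(z .^ 2)
μ_mle = (1 - N / S) .* z          # plug-in MLE of 1/V
μ_js  = (1 - (N - 2) / S) .* z    # James-Stein shrinkage
(1 - N / S, 1 - (N - 2) / S)      # nearly identical shrinkage factors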

Empirical Bayes Information

The empirical Bayes information quantifies the amount of information per “other” observation $z_j$ for estimating $\mu_i$.

For a fixed value $z_0$, let

$$\mu^+(z_0) = \E[\mu\mid z_0]\quad\text{and}\quad \hat\mu_\z(z_0) = z_0 + \sigma^2\hat l'(z_0)$$

be the Bayes and empirical Bayes estimates of $\E[\mu\mid z_0]$. Having observed $z=z_0$, the conditional regret for estimating $\mu$ by $\hat\mu_\z(z_0)$ instead of $\mu^+(z_0)$ is

$$\mathrm{Regret}(z_0) = \E\left\{(\hat\mu_\z(z_0)-\mu)^2\mid z_0\right\} - \E\left\{(\mu^+(z_0)-\mu)^2\mid z_0\right\} = \E\left(\hat\mu_\z(z_0)-\mu^+(z_0)\right)^2\,.$$

The empirical Bayes information at $z_0$ is defined as

$$\cI(z_0) = \frac{1/N}{\mathrm{Regret}(z_0)}\,,$$

so

$$\mathrm{Regret}(z_0) = \frac{1}{N\,\cI(z_0)}\,.$$

Prof. Hu’s Talk

The non-central chi-squared density is a Poisson mixture of central chi-squared densities. Consider the following model (checked numerically in the sketch after the list):

  1. Draw the non-centrality parameter $\lambda$ from a prior density $g$.
  2. Generate a non-negative integer $J$ according to a Poisson distribution with mean $\lambda/2$.
  3. Sample $X$ from the central chi-squared density with $k+2J$ degrees of freedom.
  4. The resulting $X$ has a non-central chi-squared distribution with $k$ degrees of freedom and non-centrality parameter $\lambda$.
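
Here is a minimal check of steps 2–4 with a fixed $\lambda$ rather than one drawn from $g$ ($k$ and $\lambda$ are arbitrary illustrative values):

using Distributions, Statistics

# Poisson mixture of central χ² densities vs NoncentralChisq(k, λ)
k, λ = 4, 3.0
n = 10^5
Js = rand(Poisson(λ / 2), n)                     # step 2
x_mix = [rand(Chisq(k + 2 * J)) for J in Js]     # step 3
x_ncs = rand(NoncentralChisq(k, λ), n)
mean(x_mix), mean(x_ncs)    # both ≈ k + λ = 7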

References

- Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106(496), 1602–1614.


Published in categories Note