WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Poisson Regression

Tags: Poisson Regression

Introduction

The Poisson distribution $P(\mu)$ is often used to model count data. If $Y$ is the number of occurrences, its probability distribution can be written as

\[f(y) = \frac{\mu^ye^{-\mu}}{y!},\; y=0,1,2,\ldots\]

where $\mu$ is the average number of occurrences. It follows that

\[E(Y)=\mu\qquad var(Y)=\mu\]
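
As a quick sanity check, the following R snippet (with $\mu = 4$ chosen arbitrarily for illustration) simulates Poisson counts and confirms that the sample mean and the sample variance are both close to $\mu$.

```r
# Simulate Poisson counts with mu = 4 (an arbitrary illustrative value)
# and check that the sample mean and variance are both close to mu.
set.seed(1)
y <- rpois(1e5, lambda = 4)
mean(y)  # ~ 4
var(y)   # ~ 4
```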

Poisson Regression

Let $Y_1,\ldots, Y_N$ be independent random variables with $Y_i$ denoting the number of events observed from exposure $n_i$ for the $i$-th covariate pattern. We have

\[E(Y_i)=\mu_i = n_i\theta_i\]

where $\theta_i$ is usually modeled by

\[\theta_i=\exp(\boldsymbol x_i^T\boldsymbol \beta)\]

Therefore, the generalized linear model is

\[E(Y_i)=\mu_i=n_i\exp(\boldsymbol x_i^T\boldsymbol \beta)\qquad Y_i\sim P(\mu_i)\]

The natural link function is the logarithmic function

\[\log \mu_i = \log n_i + \boldsymbol x_i^T\boldsymbol \beta\]

where $\log n_i$, called the offset, is a known constant.
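
As a sketch of how such a model is fitted in practice (this is not the gsl_lm code linked below), the R snippet simulates hypothetical exposures $n_i$, one covariate, and counts, then fits the Poisson GLM with `glm()`, passing $\log n_i$ as the offset; the variable names and the true coefficients are made up for illustration.

```r
# Minimal sketch: simulate hypothetical data and fit a Poisson GLM
# with log link and offset log(n_i).
set.seed(1)
N    <- 50
n    <- rpois(N, lambda = 100) + 1        # exposures n_i (kept positive)
x    <- rnorm(N)                          # a single covariate
beta <- c(0.5, 0.3)                       # made-up true coefficients
mu   <- n * exp(beta[1] + beta[2] * x)    # mu_i = n_i * exp(x_i' beta)
y    <- rpois(N, lambda = mu)             # observed counts Y_i
fit  <- glm(y ~ x, family = poisson(link = "log"), offset = log(n))
coef(fit)                                 # estimates b of beta
```

The later snippets below reuse `fit`, `y`, `n`, and `x` from this example.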

Measuring the goodness of fit

Hypotheses about the parameter $\beta_j$ can be tested using the Wald, score, or likelihood ratio statistics. Approximately, we have

\[\frac{b_j-\beta_j}{s.e.(b_j)}\sim N(0, 1)\]
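
Continuing the hypothetical example above, these Wald statistics for $H_0:\beta_j=0$ are what `summary()` reports in the "z value" column; the sketch below also recomputes the one for the slope by hand.

```r
# Wald statistics b_j / s.e.(b_j): reported in the "z value" column of
# summary(fit); recomputed by hand for the coefficient of x.
summary(fit)$coefficients
b  <- coef(fit)["x"]
se <- sqrt(diag(vcov(fit)))["x"]
unname(b / se)             # Wald z-statistic for H0: beta_x = 0
2 * pnorm(-abs(b / se))    # two-sided p-value
```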

The fitted values are given by

\[e_i=\hat Y_i=\hat \mu_i=n_i\exp(\boldsymbol x_i^T\boldsymbol b), \; i=1,\ldots, N\]

The Pearson residuals are

\[r_i = \frac{o_i-e_i}{\sqrt{e_i}}\]

where $o_i$ is the observed value of $Y_i$. These are related to the chi-squared goodness-of-fit statistic by

\[X^2 = \sum r_i^2 = \sum\frac{(o_i-e_i)^2}{e_i}\]
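
A minimal sketch, still using the hypothetical `fit` from above, of computing the fitted values, the Pearson residuals, and $X^2$ by hand and checking them against R's built-in extractors.

```r
# Fitted values e_i, Pearson residuals r_i and X^2, computed by hand
# and checked against residuals(fit, type = "pearson").
e  <- fitted(fit)                 # e_i = n_i * exp(x_i' b)
r  <- (y - e) / sqrt(e)           # Pearson residuals
X2 <- sum(r^2)                    # chi-squared statistic
all.equal(unname(r), unname(residuals(fit, type = "pearson")))
X2
```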

The deviance for a Poisson model can be written in the form

\[D=2\sum[o_i\log(o_i/e_i)-(o_i-e_i)]\]

For most models (in particular, whenever the model contains a constant term) we have $\sum o_i=\sum e_i$, so

\[D = 2\sum[o_i\log(o_i/e_i)]\]

The deviance residuals are the components of $D$,

\[d_i = \mathrm{sign}(o_i-e_i)\sqrt{2[o_i\log(o_i/e_i)-(o_i-e_i)]},\; i=1,\ldots, N\]

so that $D = \sum d_i^2$.
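
The deviance residuals and $D$ can likewise be computed directly from the formula and compared with `residuals(fit, type = "deviance")` and `deviance(fit)`, again continuing the hypothetical example.

```r
# Deviance residuals d_i and deviance D from the formula above,
# checked against R's "deviance" residuals; o_i * log(o_i / e_i) is
# taken as 0 when o_i = 0.
o <- y
d <- sign(o - e) * sqrt(2 * (ifelse(o > 0, o * log(o / e), 0) - (o - e)))
D <- sum(d^2)
all.equal(unname(d), unname(residuals(fit, type = "deviance")))
all.equal(D, deviance(fit))
```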

Using the Taylor series expansion $o_i\log(o_i/e_i)\approx (o_i-e_i)+\frac{1}{2}\frac{(o_i-e_i)^2}{e_i}$ about $o_i=e_i$,

\[\begin{array}{ll} D &\approx 2\sum\limits_{i=1}^N\left[(o_i-e_i)+\frac{1}{2}\frac{(o_i-e_i)^2}{e_i}-(o_i-e_i)\right]\\ &=\sum\limits_{i=1}^N\frac{(o_i-e_i)^2}{e_i}=X^2 \end{array}\]
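
With the same simulated data, one can check numerically that the deviance and the Pearson statistic are indeed close when the model fits well.

```r
# D and X^2 should be close here, as suggested by the Taylor expansion.
c(deviance = D, pearson = X2)
```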

Implementation

An example implemented in C++ and R can be found in gsl_lm.

You can see the code in my GitHub repository gsl_lm/poisson.

References

Dobson, A. J. and Barnett, A. G. An Introduction to Generalized Linear Models, Third Edition. Chapman & Hall/CRC Texts in Statistical Science.


Published in categories Report