Poisson Regression
Posted on 0 Comments
Introduction
The Poisson distribution $P(\mu)$ is often used to model count data. If $Y$ is the number of occurrences, its probability distribution can be written as
\[f(y) = \frac{\mu^ye^{-\mu}}{y!},\; y=0,1,2,\ldots\]where $\mu$ is the average number of occurrences. And we can obtain
\[E(Y)=\mu\qquad var(Y)=\mu\]Poisson Regression
Let $Y_1,\ldots, Y_N$ be independent random variables with $Y_i$ denoting the number of events observed from exposure $n_i$ for the $i$-th covariate pattern. We have
\[E(Y_i)=\mu_i = n_i\theta_i\]where $\theta_i$ is usually modeled by
\[\theta_i=\exp(\boldsymbol x_i^T\boldsymbol \beta)\]Therefore, the generalized linear model is
\[E(Y_i)=\mu_i=n_i\exp(\boldsymbol x_i^T\boldsymbol \beta)\qquad Y_i\sim P(\mu_i)\]The natural link function is the logarithmic function
\[\log \mu_i = \log n_i + \boldsymbol x_i^T\boldsymbol \beta\]where $\log(n_i)$ called the offset, it is a known constant.
Measure the goodness of fit
Hypotheses about the parameter $\beta_j$ can be tested using the wald, score or likelihood ration statistics. Approximately, we have
\[\frac{b_j-\beta_j}{s.e.(b_j)}\sim N(0, 1)\]The fitted values are given by
\[e_i=\hat Y_i=\hat \mu_i=n_i exp(\boldsymbol x_i^T\boldsymbol b), \; i=1,\ldots, N\]The pearson residuals are
\[r_i = \frac{o_i-e_i}{\sqrt{e_i}}\]where $o_i$ is the observed value of $Y_i$. The chi-squared goodness of fit statistics are related by
\[X^2 = \sum r_i^2 = \sum\frac{(o_i-e_i)^2}{e_i}\]The deviance for a Poisson model can be written in the form
\[D=2\sum[o_i\log(o_i/e_i)-(o_i-e_i)]\]For most models, we have $\sum o_i=\sum e_i$, so
\[D = 2\sum[o_i\log(o_i/e_i)]\]The deviance residuals are the components of $D$,
\[d_i = sign(o_i-e_i)\sqrt{2\sum[o_i\log(o_i/e_i)-(o_i-e_i)]},\; i=1,\ldots, N\]so that $D = \sum d_i^2$.
Using the Taylor series expansion,
\[\begin{array}{cc} D &= 2\sum\limits_{i=1}^N[(o_i-e_i)+\frac{1}{2}\frac{(o_i-e_i)^2}{e_i}-(o_i-e_i)]\\ &=\sum\limits_{i=1}^N\frac{(o_i-e_i)^2}{e_i}=X^2 \end{array}\]Implement
The example which implement in C++ and R can be found on gsl_lm
You can see the code in my github repository gsl_lm/poisson.