
Poisson Regression

Tags: Poisson Regression

Introduction

The Poisson distribution $P(\mu)$ is often used to model count data. If $Y$ is the number of occurrences, its probability distribution can be written as

\[f(y) = \frac{\mu^ye^{-\mu}}{y!},\; y=0,1,2,\ldots\]

where $\mu$ is the average number of occurrences. The mean and variance are both equal to $\mu$:

\[E(Y)=\mu\qquad var(Y)=\mu\]
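
As a quick numerical illustration (a minimal sketch in R; the rate $\mu=3$ and the sample size are arbitrary choices), dpois evaluates the probability function and a large simulated sample shows the mean and variance agreeing:

```r
# Poisson pmf and the mean-variance equality, checked on simulated data.
set.seed(1)
mu <- 3                                # arbitrary rate for illustration
dpois(0:5, lambda = mu)                # f(y) = mu^y * exp(-mu) / y!, y = 0,...,5
y <- rpois(1e5, lambda = mu)           # large simulated sample
c(mean = mean(y), variance = var(y))   # both close to mu = 3
```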

Poisson Regression

Let $Y_1,\ldots, Y_N$ be independent random variables with $Y_i$ denoting the number of events observed from exposure $n_i$ for the $i$-th covariate pattern. We have

\[E(Y_i)=\mu_i = n_i\theta_i\]

where $\theta_i$ is usually modeled by

\[\theta_i=\exp(\boldsymbol x_i^T\boldsymbol \beta)\]

Therefore, the generalized linear model is

\[E(Y_i)=\mu_i=n_i\exp(\boldsymbol x_i^T\boldsymbol \beta)\qquad Y_i\sim P(\mu_i)\]

The natural link function is the logarithmic function

\[\log \mu_i = \log n_i + \boldsymbol x_i^T\boldsymbol \beta\]

where $\log(n_i)$, called the offset, is a known constant.
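
As a sketch of how this model can be fitted in practice, the snippet below uses R's glm with a logarithmic offset on simulated data; the exposures, the single covariate, and the true coefficients (0.5, 1) are arbitrary choices for illustration, not taken from any particular example.

```r
# Fit log(mu_i) = log(n_i) + x_i^T beta on simulated data.
set.seed(1)
N   <- 100
n   <- sample(10:50, N, replace = TRUE)      # exposures n_i
x   <- runif(N)                              # a single covariate
y   <- rpois(N, n * exp(0.5 + 1 * x))        # true beta = (0.5, 1)
fit <- glm(y ~ x + offset(log(n)), family = poisson(link = "log"))
coef(fit)                                    # estimates b, close to (0.5, 1)
```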

Measuring the goodness of fit

Hypotheses about the parameters $\beta_j$ can be tested using the Wald, score, or likelihood ratio statistics. Approximately, we have

\[\frac{b_j-\beta_j}{s.e.(b_j)}\sim N(0, 1)\]
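
A small check of this Wald statistic against the output of summary() in R, re-creating the simulated fit from the sketch above so the snippet runs on its own:

```r
# Wald statistics b_j / s.e.(b_j) for H0: beta_j = 0, with N(0,1) p-values.
set.seed(1)
n   <- sample(10:50, 100, replace = TRUE)
x   <- runif(100)
y   <- rpois(100, n * exp(0.5 + x))
fit <- glm(y ~ x + offset(log(n)), family = poisson)
est <- summary(fit)$coefficients
z   <- est[, "Estimate"] / est[, "Std. Error"]
cbind(z, p = 2 * pnorm(-abs(z)))   # matches the z value / Pr(>|z|) columns
```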

The fitted values are given by

\[e_i=\hat Y_i=\hat \mu_i=n_i\exp(\boldsymbol x_i^T\boldsymbol b), \; i=1,\ldots, N\]

The Pearson residuals are

\[r_i = \frac{o_i-e_i}{\sqrt{e_i}}\]

where $o_i$ is the observed value of $Y_i$. The chi-squared goodness-of-fit statistic is the sum of their squares,

\[X^2 = \sum r_i^2 = \sum\frac{(o_i-e_i)^2}{e_i}\]
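
These quantities are easy to verify in R: fitted() returns the $e_i$, the Pearson residuals computed by hand match residuals(fit, type = "pearson"), and $X^2$ is the sum of their squares (simulated fit as in the earlier sketches):

```r
# Fitted values e_i, Pearson residuals r_i, and the chi-squared statistic X^2.
set.seed(1)
n   <- sample(10:50, 100, replace = TRUE)
x   <- runif(100)
y   <- rpois(100, n * exp(0.5 + x))
fit <- glm(y ~ x + offset(log(n)), family = poisson)
e   <- fitted(fit)                 # e_i = n_i * exp(x_i^T b)
r   <- (y - e) / sqrt(e)           # Pearson residuals
all.equal(r, residuals(fit, type = "pearson"), check.attributes = FALSE)
sum(r^2)                           # X^2
```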

The deviance for a Poisson model can be written in the form

\[D=2\sum[o_i\log(o_i/e_i)-(o_i-e_i)]\]

For most models (in particular, those containing a constant term), we have $\sum o_i=\sum e_i$, so

\[D = 2\sum[o_i\log(o_i/e_i)]\]

The deviance residuals are the components of $D$,

\[d_i = \operatorname{sign}(o_i-e_i)\sqrt{2[o_i\log(o_i/e_i)-(o_i-e_i)]},\; i=1,\ldots, N\]

so that $D = \sum d_i^2$.
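
The same kind of check works for the deviance residuals; in the sketch below the ifelse() guards the $o_i=0$ case, where $o_i\log(o_i/e_i)$ is taken to be zero:

```r
# Deviance residuals d_i and the deviance D = sum(d_i^2).
set.seed(1)
n   <- sample(10:50, 100, replace = TRUE)
x   <- runif(100)
y   <- rpois(100, n * exp(0.5 + x))
fit <- glm(y ~ x + offset(log(n)), family = poisson)
e   <- fitted(fit)
d   <- sign(y - e) * sqrt(2 * (ifelse(y == 0, 0, y * log(y / e)) - (y - e)))
all.equal(d, residuals(fit, type = "deviance"), check.attributes = FALSE)
c(D = sum(d^2), deviance = deviance(fit))   # the two agree
```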

Expanding $o_i\log(o_i/e_i)$ in a Taylor series about $o_i=e_i$, $o_i\log(o_i/e_i)\approx(o_i-e_i)+\frac{1}{2}\frac{(o_i-e_i)^2}{e_i}$, we obtain

\[\begin{array}{ll} D &\approx 2\sum\limits_{i=1}^N\left[(o_i-e_i)+\frac{1}{2}\frac{(o_i-e_i)^2}{e_i}-(o_i-e_i)\right]\\ &=\sum\limits_{i=1}^N\frac{(o_i-e_i)^2}{e_i}=X^2 \end{array}\]
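
Numerically, on the simulated fit used in the earlier sketches, the two statistics are close, as the approximation suggests:

```r
# Compare the deviance D with the Pearson chi-squared statistic X^2.
set.seed(1)
n   <- sample(10:50, 100, replace = TRUE)
x   <- runif(100)
y   <- rpois(100, n * exp(0.5 + x))
fit <- glm(y ~ x + offset(log(n)), family = poisson)
c(D = deviance(fit), X2 = sum(residuals(fit, type = "pearson")^2))
```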

Implementation

An example implemented in C++ and R can be found in gsl_lm.

The code is available in my GitHub repository gsl_lm/poisson.

References

Dobson, A. J. and Barnett, A. G. (2008). An Introduction to Generalized Linear Models, Third Edition. Chapman & Hall/CRC Texts in Statistical Science.

