WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Estimate Parameters in Logistic Regression

Posted on July 30, 2017

Background

Assume $\boldsymbol y$ is an $n\times 1$ vector of responses with $y_i\sim B(1, \pi_i)$, and let $\boldsymbol x_1,\ldots,\boldsymbol x_p$ be $p$ explanatory variables.

Then the likelihood of $\boldsymbol y$ is

$$L(\boldsymbol\pi;\boldsymbol y)=\prod_{i=1}^n\pi_i^{y_i}(1-\pi_i)^{1-y_i},$$

and the log-likelihood is

$$l(\boldsymbol\pi;\boldsymbol y)=\sum_{i=1}^n\left[y_i\log\frac{\pi_i}{1-\pi_i}+\log(1-\pi_i)\right].$$

The logistic regression model states that

$$\log\frac{\pi_i}{1-\pi_i}=\boldsymbol x_i^T\boldsymbol\beta,\qquad i=1,\ldots,n,$$

where

$$\boldsymbol x_i=(1, x_{i1},\ldots,x_{ip})^T,\qquad \boldsymbol\beta=(\beta_0,\beta_1,\ldots,\beta_p)^T.$$

Then we have

$$\pi_i=\frac{\exp(\boldsymbol x_i^T\boldsymbol\beta)}{1+\exp(\boldsymbol x_i^T\boldsymbol\beta)},\qquad 1-\pi_i=\frac{1}{1+\exp(\boldsymbol x_i^T\boldsymbol\beta)},$$

Thus, the log-likelihood can be written as

$$l(\boldsymbol\beta)=\sum_{i=1}^n\left[y_i\boldsymbol x_i^T\boldsymbol\beta-\log\left(1+\exp(\boldsymbol x_i^T\boldsymbol\beta)\right)\right].$$
Maximum Likelihood Estimate

We need to find $\boldsymbol \beta$ that maximizes $l(\boldsymbol \beta)$, which means that

$$\hat{\boldsymbol\beta}=\arg\max_{\boldsymbol\beta} l(\boldsymbol\beta),\qquad\text{i.e.,}\qquad \frac{\partial l(\boldsymbol\beta)}{\partial\boldsymbol\beta}\bigg|_{\boldsymbol\beta=\hat{\boldsymbol\beta}}=\boldsymbol 0.$$
We adopt the Newton-Raphson algorithm (multivariate version) to solve for $\boldsymbol \beta$ numerically.

Let

$$U(\boldsymbol\beta)=\frac{\partial l(\boldsymbol\beta)}{\partial\boldsymbol\beta},$$

then

$$U(\boldsymbol\beta)=\sum_{i=1}^n(y_i-\pi_i)\boldsymbol x_i=\boldsymbol X^T(\boldsymbol y-\boldsymbol\pi),$$

so

$$\mathbf H(\boldsymbol\beta)=\frac{\partial^2 l(\boldsymbol\beta)}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta^T}=-\sum_{i=1}^n\pi_i(1-\pi_i)\boldsymbol x_i\boldsymbol x_i^T=-\boldsymbol X^T\mathbf W\boldsymbol X,$$

where

$$\boldsymbol X=(\boldsymbol x_1,\ldots,\boldsymbol x_n)^T,\qquad \mathbf W=\mathrm{diag}\{\pi_1(1-\pi_1),\ldots,\pi_n(1-\pi_n)\}.$$

The Newton-Raphson formula for the multivariate problem is

$$\boldsymbol\beta^{(m+1)}=\boldsymbol\beta^{(m)}-\mathbf H^{-1}(\boldsymbol\beta^{(m)})\,U(\boldsymbol\beta^{(m)})=\boldsymbol\beta^{(m)}+(\boldsymbol X^T\mathbf W^{(m)}\boldsymbol X)^{-1}\boldsymbol X^T(\boldsymbol y-\boldsymbol\pi^{(m)}).$$
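A minimal R sketch of this iteration follows; the function name `newton_logit`, the zero starting value, and the tolerance and iteration defaults are my own choices for illustration:

```r
# Newton-Raphson for logistic regression.
# X: n x (p+1) design matrix (first column of ones), y: 0/1 response vector.
newton_logit <- function(X, y, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(X))                    # start from beta = 0
  H <- NULL
  for (it in seq_len(maxit)) {
    eta  <- as.vector(X %*% beta)
    prob <- 1 / (1 + exp(-eta))              # pi_i = exp(eta_i) / (1 + exp(eta_i))
    U    <- as.vector(t(X) %*% (y - prob))   # score: X^T (y - pi)
    H    <- -t(X) %*% (prob * (1 - prob) * X)  # Hessian: -X^T W X
    step <- as.vector(solve(H, U))           # H^{-1} U
    beta <- beta - step                      # beta <- beta - H^{-1} U
    if (max(abs(step)) < tol) break          # stop when the update is tiny
  }
  list(coefficients = beta, hessian = H)
}
```

Since $\mathbf H=-\boldsymbol X^T\mathbf W\boldsymbol X$ is negative definite, the update $-\mathbf H^{-1}U$ is an ascent direction, so the iteration climbs the log-likelihood.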

Wald Test for Parameters

In the univariate case, the Wald statistic is

$$W=\frac{(\hat\theta-\theta_0)^2}{\widehat{\mathrm{Var}}(\hat\theta)},$$

which is compared against a chi-squared distribution with one degree of freedom.

Alternatively, the difference can be compared to a standard normal distribution. In this case, the test statistic is

$$z=\frac{\hat\theta-\theta_0}{se(\hat\theta)},$$

where $se(\hat\theta)$ is the standard error of the maximum likelihood estimate. If $\mathbf H$ is the Hessian of the log-likelihood function $l$ evaluated at the maximum, then the vector $\sqrt{\mathrm{diag}((-\mathbf H)^{-1})}$ estimates the standard error of each parameter.
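Continuing the sketch above (with `newton_logit`, `X`, and `y` as before), the standard errors and Wald statistics for $\theta_0=0$ could be computed roughly as follows:

```r
# Wald test for each coefficient: z_j = beta_j / se(beta_j),
# with se = sqrt(diag((-H)^{-1})) from the Hessian at the MLE.
fit  <- newton_logit(X, y)
se   <- sqrt(diag(solve(-fit$hessian)))   # estimated standard errors
z    <- fit$coefficients / se             # Wald z-statistics (theta_0 = 0)
pval <- 2 * pnorm(-abs(z))                # two-sided p-values
```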

Implementation in C++ and R

I implement the above algorithm, apply it to an example, and compare the logistic regression results with glm(..., family = binomial()) in R.
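As an illustration of such a check, here is a sketch comparing the `newton_logit` function from above against glm; the simulated data and the true coefficients are made up for this example, not taken from the post:

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))  # true beta = (-0.5, 1.2)
X <- cbind(1, x)                           # design matrix with intercept

fit_nr  <- newton_logit(X, y)
fit_glm <- glm(y ~ x, family = binomial())
cbind(newton = fit_nr$coefficients, glm = coef(fit_glm))  # should agree closely
```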

For more details, you can visit my GitHub repository gsl_lm/logit.

References

  1. Dobson, A. J. and Barnett, A. G., An Introduction to Generalized Linear Models, Third Edition, Chapman & Hall/CRC Texts in Statistical Science
  2. Newton-Raphson Algorithm (Multivariate version)
  3. How to calculate p values in logistic regression with gradient descent algorithm

Published in categories Regression