WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Lagrange Multiplier Test

Posted on
Tags: Lagrange Multiplier

This post is based on Peter Bentler's talk, *S.-Y. Lee's Lagrange Multiplier Test in Structural Modeling: Still Useful?*, given at the International Statistical Conference in Memory of Professor Sik-Yum Lee.

In structured linear models of multivariate random data, $\Sigma = \Sigma(\btheta)$ is a $p\times p$ covariance matrix whose elements are assumed to be differentiable real-valued functions of a true though unknown $q\times 1$ vector of parameters $\btheta$, and the primary statistical problems involve

  • estimation of parameters of the model,
  • establishing properties of the estimators,
  • evaluating goodness of fit of competing models.

In the methods of maximum likelihood and generalized least squares, the parameter vector $\btheta$ is considered to represent a vector of free elements that are functionally independent of each other.

Simple equality constraints among parameters can be implemented by reparameterization, but models whose parameters are related by general functional constraints had not been studied at the time; the paper provides a statistical basis for constrained generalized least squares estimators.

  • $\Sigma_o=\Sigma(\btheta_o)$: a $p\times p$ population covariance matrix
  • $\sigma_{ij}(\btheta_o), i, j=1,\ldots,p$: elements of $\Sigma_o$, differentiable real-valued functions of a true though unknown $q\times 1$ vector of parameters $\btheta_o$
  • $\Omega\subset \IR^q$ with elements $\btheta$: a closed and bounded parameter set
  • $\omega$: a subset of $\Omega$ whose elements satisfy the functional relationship $\bfh(\btheta)=\0$
  • $\bfh(\btheta)$: an $r\times 1$ real vector-valued continuous function of $\btheta$.
  • $S$: the sample covariance matrix obtained from a random sample of size $N=n+1$ from a multivariate normal population with mean vector $\0$ and covariance matrix $\Sigma_o$.

Consider the generalized least squares function,

\[Q(\btheta) = \frac 12 \tr\{[(S-\Sigma)V]^2\}\]

which comes from the residual quadratic form $(\bfs - \bsigma)'[\Cov(\bfs, \bfs')]^{-1}(\bfs - \bsigma)$ (Browne, 1974); here $V$ is a $p\times p$ weight matrix (e.g., one may take $V = S^{-1}$).
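For concreteness, here is a minimal numerical sketch of $Q(\btheta)$, assuming a toy one-factor structure $\Sigma(\btheta) = \beta\beta' + \psi I$ with loading vector $\beta$ and the common weight choice $V = S^{-1}$; the structure, the weight, and the simulated data are my own illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Sketch of the GLS discrepancy Q(theta) = 1/2 * tr{[(S - Sigma(theta)) V]^2}.
# One-factor structure Sigma(theta) = beta beta' + psi I and weight V = S^{-1}
# are illustrative assumptions.

def sigma(theta, p):
    beta, psi = theta[:p], theta[p]
    return np.outer(beta, beta) + psi * np.eye(p)

def q_gls(theta, S, V, p):
    R = (S - sigma(theta, p)) @ V
    return 0.5 * np.trace(R @ R)

rng = np.random.default_rng(0)
p, n = 4, 500
beta_true, psi_true = np.array([0.8, 0.7, 0.6, 0.5]), 0.3
Sigma_true = np.outer(beta_true, beta_true) + psi_true * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=n)
S = np.cov(X, rowvar=False)
V = np.linalg.inv(S)                       # a common GLS weight choice
theta_true = np.concatenate([beta_true, [psi_true]])
print(q_gls(theta_true, S, V, p))          # small when the model fits the data
```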

Define the constrained generalized least squares estimator $\tilde\btheta$ of $\btheta_o$ as the vector which satisfies $\bfh(\tilde\btheta)=\0$ and minimizes $Q(\btheta)$. It follows from the first-order necessary condition that there exists a vector $\tilde \blambda' = (\tilde \lambda_1,\ldots, \tilde \lambda_r)$ of Lagrange multipliers such that

\[\begin{align} \dot Q(\tilde \btheta) + \tilde L'\tilde \blambda &= \0\\ \bfh(\tilde \btheta) &=\0\,, \end{align}\]

where $\dot Q= (\partial Q/\partial \theta_i)$ is the gradient vector of $Q(\btheta)$, and $L=(\partial h_i/\partial \theta_j)$ is an $r\times q$ matrix of partial derivatives with $\tilde L = L(\tilde\btheta)$.
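One way to see where these conditions come from (a standard sketch, not quoted from the paper): they are the stationarity conditions of the Lagrangian of the constrained minimization,

\[
\mathcal L(\btheta, \blambda) = Q(\btheta) + \blambda'\bfh(\btheta), \qquad
\frac{\partial \mathcal L}{\partial \btheta} = \dot Q(\btheta) + L'\blambda = \0, \qquad
\frac{\partial \mathcal L}{\partial \blambda} = \bfh(\btheta) = \0\,,
\]

evaluated at the solution $(\tilde\btheta, \tilde\blambda)$.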

The constrained maximum likelihood estimator $\hat\btheta$ of $\btheta_o$ is defined as the vector which satisfies $\bfh(\hat\btheta)=\0$ and minimizes the function

\[F(\btheta) = \log \vert \Sigma\vert +\tr(S\Sigma^{-1}) - \log \vert S\vert -p\,.\]

Similarly, we have

\[\begin{align} \dot F(\hat\btheta) + \hat L'\hat\blambda &= \0\\ \bfh(\hat\btheta) &= \0\,. \end{align}\]
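A minimal computational sketch of constrained ML estimation, assuming the same toy one-factor structure as above and an illustrative equality constraint $h(\btheta) = \theta_1 - \theta_2 = 0$ (the first two loadings equal); the generic SLSQP optimizer and this constraint are my own choices, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: minimize F(theta) = log|Sigma| + tr(S Sigma^{-1}) - log|S| - p
# subject to h(theta) = 0, via a generic constrained optimizer (SLSQP).
# One-factor structure and the constraint beta_1 = beta_2 are illustrative.

def sigma(theta, p):
    beta, psi = theta[:p], theta[p]
    return np.outer(beta, beta) + psi * np.eye(p)

def F(theta, S, p):
    Sig = sigma(theta, p)
    _, logdet = np.linalg.slogdet(Sig)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet + np.trace(S @ np.linalg.inv(Sig)) - logdet_S - p

rng = np.random.default_rng(0)
p, n = 4, 500
beta_true, psi_true = np.array([0.7, 0.7, 0.6, 0.5]), 0.3
X = rng.multivariate_normal(np.zeros(p),
                            np.outer(beta_true, beta_true) + psi_true * np.eye(p),
                            size=n)
S = np.cov(X, rowvar=False)

h = lambda theta: theta[0] - theta[1]      # constraint: first two loadings equal
res = minimize(F, x0=np.array([0.5] * p + [0.5]), args=(S, p),
               method="SLSQP", constraints=[{"type": "eq", "fun": h}],
               bounds=[(None, None)] * p + [(1e-3, None)])
print(res.x, F(res.x, S, p))               # constrained estimate and minimized F
```

This sketch only illustrates the constrained minimization of $F$; SEM software would also report the associated Lagrange multipliers, which the asymptotic results below concern.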
The paper establishes the following propositions (under suitable regularity conditions):

  1. The generalized least squares estimator $\tilde\btheta$ is consistent.
  2. The joint asymptotic distribution of the random variables $n^{1/2}(\tilde\btheta-\btheta_o)$ and $n^{1/2}\tilde\blambda$ is multivariate normal with zero mean vector and a covariance matrix given in the paper.
  3. The generalized least squares estimator $(\tilde \btheta, \tilde\blambda)$ is asymptotically equivalent to the maximum likelihood estimator $(\hat\btheta, \hat\blambda)$.
  4. The asymptotic distribution of $nQ(\tilde \btheta)$ is chi-square with degrees of freedom $p(p+1)/2-(q-r)$.
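A quick arithmetic check of the degrees of freedom in Proposition 4 (my own illustrative numbers): the count is the number of distinct elements of $\Sigma$ minus the number of effectively free parameters $q-r$. For instance, with $p = 4$ observed variables, $q = 5$ parameters, and $r = 2$ constraints,

\[
\frac{p(p+1)}{2} - (q - r) = \frac{4\cdot 5}{2} - (5 - 2) = 10 - 3 = 7\,.
\]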

Let $\tilde \btheta^*$ be the generalized least squares estimator subject only to $\bfh^*(\btheta)=\0$, where $\bfh^*(\btheta) = (h_1(\btheta), \ldots,h_j(\btheta))$ collects the first $j\le r$ constraint functions.

  5. The asymptotic distribution of $n[Q(\tilde \btheta) - Q(\tilde \btheta^*)]$ is chi-square with degrees of freedom $r-j$.

Proposition 4 provides an asymptotic test statistic for testing the null hypothesis

\[H_0: \Sigma_o = \Sigma(\btheta_o), \quad \btheta_o\in \omega\]

against the general alternative that $\Sigma_o$ is any symmetric positive definite $p\times p$ matrix. Another asymptotic test statistic for the null hypothesis against the specific alternative

\[H_1: \Sigma_o = \Sigma(\btheta_o), \quad \btheta_o\in \Omega\]

is given by the next proposition. (how & why??)

  6. The asymptotic distribution of $-2^{-1}n\tilde\blambda'\tilde R^{-}\tilde\blambda$ under $H_0$ is chi-square with degrees of freedom equal to the rank of $R_o$, where $\tilde R^{-}$ denotes a generalized inverse of the matrix $\tilde R$ defined in the paper.

It is well known that this LM test is asymptotically equivalent to Rao's (1948) score test. These tests are also asymptotically equivalent to the Wald and likelihood ratio (LR) chi-square difference tests.

The LM (Lagrange multiplier) test for several omitted parameters can be broken down into a series of 1-df tests. Bentler (1983, 1985) developed a forward stepwise LM procedure in which, at each step, the parameter is chosen that maximally increases the LM chi-square, contingent on the parameters already included.
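Below is a rough, hypothetical sketch of the forward stepwise idea. The function `lm_chisq` is a toy stand-in that scores a freed subset with a quadratic form in made-up score and information quantities; it is not the actual EQS/SEM computation, only a placeholder so the greedy loop runs.

```python
import numpy as np

# Hypothetical sketch of a forward stepwise LM procedure: at each step, free
# the candidate parameter that maximally increases the multiparameter LM
# chi-square, contingent on the parameters already freed.

rng = np.random.default_rng(1)
candidates = [f"theta_{i}" for i in range(6)]   # fixed-to-zero parameters under test
score = rng.normal(size=6)                      # toy "score vector" (stand-in)
A = np.eye(6) + 0.1 * rng.normal(size=(6, 6))
info = A @ A.T                                  # toy positive definite "information"

def lm_chisq(freed):
    """Toy multiparameter LM statistic for a freed subset (stand-in only)."""
    idx = [candidates.index(name) for name in freed]
    s, I = score[idx], info[np.ix_(idx, idx)]
    return float(s @ np.linalg.solve(I, s))

freed, current = [], 0.0
for step in range(3):                           # free up to three parameters
    best = max((c for c in candidates if c not in freed),
               key=lambda c: lm_chisq(freed + [c]))
    gain = lm_chisq(freed + [best]) - current   # 1-df increment, given freed set
    freed.append(best)
    current += gain
    print(f"step {step + 1}: free {best}, increment {gain:.3f}, cumulative LM {current:.3f}")
```

In a real application the increments would come from the fitted model's score vector and information matrix, and one would stop freeing parameters once the 1-df increment is no longer significant.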

It seems that the most frequent applications of LM tests in SEM are the following:

  • $\theta_i=0$: evaluate the necessity of an omitted parameter. This is often, perhaps almost always, post hoc.
  • $\theta_i-\theta_j=0$: evaluate the appropriateness of an equality restriction. This can be a priori.
  • in EQS (structural equation modeling software): evaluate constraints across multiple groups, such as $\theta_i^{(1)}=\theta_i^{(2)}=\cdots=\theta_i^{(g)}$ for a given parameter, i.e., the between-group differences are zero; this amounts to $g-1$ constraint functions, so the LM test has $g-1$ degrees of freedom. This is typically a fully a priori test, e.g., of equal factor loadings across groups.

Simple nonlinear constraints such as $\theta_1 = \theta_2^2$ can be handled with phantom variables and do not require constrained optimization.
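A sketch of the phantom-variable idea (my own illustration, in the usual Rindskopf-style construction): introduce a phantom latent variable $P$ with no disturbance, and replace the single path carrying $\theta_1$ by two consecutive paths each labeled $\theta_2$,

\[
F \xrightarrow{\;\theta_2\;} P \xrightarrow{\;\theta_2\;} Y
\quad\Longrightarrow\quad
\text{total effect of } F \text{ on } Y = \theta_2\cdot\theta_2 = \theta_2^2\,,
\]

so the model satisfies $\theta_1 = \theta_2^2$ by construction, using only a simple equality constraint between the two path coefficients.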


Published in categories Note