Lagrange Multiplier Test
This post is based on Peter BENTLER’s talk, S.-Y. Lee’s Lagrange Multiplier Test in Structural Modeling: Still Useful?, at the International Statistical Conference in Memory of Professor Sik-Yum Lee.
In structured linear models of multivariate random data, $\Sigma = \Sigma(\btheta)$ is a $p\times p$ covariance matrix whose elements are assumed to be differentiable real-valued functions of a true though unknown $q\times 1$ vector of parameters $\btheta$, and the primary statistical problems involve
- estimation of parameters of the model,
- establishing properties of the estimators,
- evaluating goodness of fit of competing models.
In the methods of maximum likelihood and generalized least squares, the parameter vector $\btheta$ is taken to be a vector of free elements that are functionally independent of each other.
Simple equality constraints among parameters can be implemented by a reparameterization, but models whose parameters are related by general functional constraints had not been studied at the time, and the paper provides a statistical basis for constrained generalized least squares estimators.
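As a generic illustration (not taken from the paper), an equality constraint such as $\theta_1=\theta_2$ can be absorbed by replacing both parameters with a single free parameter $\theta^*$,
\[\Sigma(\theta_1,\theta_2,\theta_3,\ldots,\theta_q)=\Sigma(\theta^*,\theta^*,\theta_3,\ldots,\theta_q)\,,\]so that the model is fitted with $q-1$ free parameters; a general functional constraint $\bfh(\btheta)=\0$ cannot be eliminated this way.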
- $\Sigma_o=\Sigma(\btheta_o)$: a $p\times p$ population covariance matrix
- $\sigma_{ij}(\btheta_o), i, j=1,\ldots,p$: elements of $\Sigma_o$, differentiable real-valued functions of a true though unknown $q\times 1$ vector of parameters $\btheta_o$
- $\Omega\subset \IR^q$ with elements $\btheta$: a closed and bounded parameter set
- $\omega$: a subset of $\Omega$ whose elements satisfy the functional relationship $\bfh(\btheta)=\0$
- $\bfh(\btheta)$: an $r\times 1$ real vector-valued continuous function of $\btheta$.
- $S$: the sample covariance matrix obtained from a random sample of size $N=n+1$ from a multivariate normal population with mean vector $\0$ and covariance matrix $\Sigma_o$.
Consider the generalized least squares function,
\[Q(\btheta) = \frac 12 \tr\{[(S-\Sigma)V]^2\}\]which comes from the residual quadratic form $(\bfs - \bsigma)'[\Cov(\bfs, \bfs')]^{-1}(\bfs - \bsigma)$ (Browne, 1974).
Define the constrained generalized least squares estimator $\tilde\btheta$ of $\btheta_o$ as the vector which satisfies $\bfh(\tilde\btheta)=\0$ and minimizes $Q(\btheta)$. It follows from the first-order necessary condition that there exists a vector $\tilde \blambda' = (\tilde \lambda_1,\ldots, \tilde \lambda_r)$ of Lagrange multipliers such that
\[\begin{align} \dot Q(\tilde \btheta) + \tilde L'\tilde \blambda &= \0\\ \bfh(\tilde \btheta) &=\0\,, \end{align}\]where $\dot Q= (\partial Q/\partial \theta_i)$ is the gradient vector of $Q(\btheta)$, and $L=(\partial h_i/\partial \theta_j)$ is an $r\times q$ matrix of partial derivatives with $\tilde L = L(\tilde\btheta)$.
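Below is a minimal numerical sketch of this constrained minimization, assuming user-supplied callables `sigma(theta)` returning $\Sigma(\btheta)$ and `h(theta)` returning $\bfh(\btheta)$, and a weight matrix `V` (e.g. $S^{-1}$); these names, and the use of SLSQP, are illustrative rather than from the paper. The Lagrange multipliers are recovered from the first-order condition above.

```python
import numpy as np
from scipy.optimize import minimize

def Q(theta, S, V, sigma):
    """GLS discrepancy Q(theta) = 1/2 * tr{[(S - Sigma(theta)) V]^2}."""
    R = (S - sigma(theta)) @ V
    return 0.5 * np.trace(R @ R)

def num_jac(f, x, eps=1e-6):
    """Forward-difference Jacobian of a (possibly vector-valued) function f at x."""
    f0 = np.atleast_1d(f(x))
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        xj = x.copy()
        xj[j] += eps
        J[:, j] = (np.atleast_1d(f(xj)) - f0) / eps
    return J

def constrained_gls(theta0, S, V, sigma, h):
    """Minimize Q subject to h(theta) = 0 and back out the Lagrange multipliers."""
    res = minimize(Q, np.asarray(theta0, float), args=(S, V, sigma),
                   constraints=[{"type": "eq", "fun": h}], method="SLSQP")
    theta_t = res.x
    grad = num_jac(lambda t: Q(t, S, V, sigma), theta_t).ravel()   # gradient of Q
    L = num_jac(h, theta_t)                                        # r x q Jacobian of h
    lam = np.linalg.lstsq(L.T, -grad, rcond=None)[0]               # solves L' lam = -grad
    return theta_t, lam, res
```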
The constrained maximum likelihood estimator $\hat\btheta$ of $\btheta_o$ is defined as the vector which satisfies $\bfh(\hat\btheta)=\0$ and minimizes the function
\[F(\btheta) = \log \vert \Sigma\vert +\tr(S\Sigma^{-1}) - \log \vert S\vert -p\,.\]Similarly, we have
\[\begin{align} \dot F(\hat\btheta) + \hat L'\hat\blambda &= \0\\ \bfh(\hat\btheta) &= \0\,. \end{align}\]
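For comparison, a sketch of the ML discrepancy $F(\btheta)$ under the same hypothetical `sigma(theta)`; the same constrained minimization as above can be applied to it.

```python
import numpy as np

def F(theta, S, sigma):
    """ML discrepancy F(theta) = log|Sigma| + tr(S Sigma^{-1}) - log|S| - p."""
    Sig = sigma(theta)
    p = S.shape[0]
    _, logdet_Sig = np.linalg.slogdet(Sig)   # numerically stable log-determinant
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sig + np.trace(S @ np.linalg.inv(Sig)) - logdet_S - p
```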
- The generalized least squares estimator $\tilde\btheta$ is consistent.
- The joint asymptotic distribution of $n^{1/2}(\tilde\btheta-\btheta_o)$ and $n^{1/2}\tilde\blambda$ is multivariate normal with zero mean vector and a covariance matrix given in the paper.
- The generalized least squares estimator $(\tilde \btheta, \tilde\blambda)$ is asymptotically equivalent to the maximum likelihood estimator $(\hat\btheta, \hat\blambda)$.
- The asymptotic distribution of $nQ(\tilde \btheta)$ is chi-square with degrees of freedom $p(p+1)/2-(q-r)$.
Let $\tilde \btheta^*$ be the generalized least squares estimator that is subject to $\bfh^*(\btheta)=\0$, where $\bfh^*(\btheta) = (h_1(\btheta), \ldots,h_j(\btheta))'$.
- The asymptotic distribution of $n[Q(\tilde \btheta) - Q(\tilde \btheta^*)]$ is chi-square with degrees of freedom $r-j$.
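A hedged sketch of both chi-square tests, with illustrative names: `Q_tilde` for $Q(\tilde\btheta)$, `Q_star` for $Q(\tilde\btheta^*)$, and $n=N-1$ as defined above.

```python
from scipy.stats import chi2

def overall_fit_test(Q_tilde, n, p, q, r):
    """n * Q(theta~) against chi-square with p(p+1)/2 - (q - r) degrees of freedom."""
    stat = n * Q_tilde
    df = p * (p + 1) // 2 - (q - r)
    return stat, df, chi2.sf(stat, df)

def difference_test(Q_tilde, Q_star, n, r, j):
    """n * [Q(theta~) - Q(theta~*)] against chi-square with r - j degrees of freedom."""
    stat = n * (Q_tilde - Q_star)
    return stat, r - j, chi2.sf(stat, r - j)
```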
Proposition 4 provides an asymptotic test statistic for testing the null hypothesis
\[H_0: \Sigma_o = \Sigma(\btheta_o),\ \btheta_o\in \omega\]against the general alternative that $\Sigma_o$ is any symmetric positive definite $p\times p$ matrix. Another asymptotic test statistic for the null hypothesis against the specific alternative
\[H_1: \Sigma_o = \Sigma(\btheta_o),\ \btheta_o\in \Omega\]is given by the next proposition. (how & why??)
- The asymptotic distribution of $-2^{-1}n\tilde\blambda'\tilde R^{-}\tilde\blambda$ under $H_0$ is chi-square with degrees of freedom equal to the rank of $R_o$, where $R$ is a matrix defined in the paper (with $\tilde R = R(\tilde\btheta)$ and $R_o = R(\btheta_o)$) and $R^{-}$ denotes a generalized inverse.
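A sketch of the LM statistic exactly as stated, assuming $\tilde\blambda$ and the matrix $\tilde R$ (whose definition from the paper is not reproduced in this post) are already available; `np.linalg.pinv` stands in for one choice of generalized inverse $R^{-}$.

```python
import numpy as np
from scipy.stats import chi2

def lm_test(lam_tilde, R_tilde, n):
    """LM statistic -(1/2) * n * lambda~' R~^- lambda~, df = rank(R~)."""
    stat = -0.5 * n * lam_tilde @ np.linalg.pinv(R_tilde) @ lam_tilde
    df = np.linalg.matrix_rank(R_tilde)
    return stat, df, chi2.sf(stat, df)
```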
It is well known that this test is asymptotically equivalent to Rao’s (1948) score test. These tests are also asymptotically equivalent to the Wald and likelihood ratio (LR) chi-square difference tests.
The LM (Lagrange Multiplier) test for several omitted parameters can be broken down into a series of 1-df tests. Bentler (1983, 1985) developed a forward stepwise LM procedure in which, at each step, the parameter that maximally increases the LM chi-square, given those already included, is added.
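A schematic sketch of such a forward stepwise search (not EQS’s actual implementation): `lm_chisq` is a hypothetical callable returning the joint LM chi-square for a set of candidate parameters, and 3.84 is the 5% critical value of $\chi^2_1$.

```python
def forward_stepwise_lm(candidates, lm_chisq, crit=3.84):
    """Greedily add the omitted parameter giving the largest 1-df LM increment."""
    included, current = [], 0.0
    remaining = list(candidates)
    while remaining:
        # increment in the joint LM chi-square from adding each remaining candidate
        gains = {c: lm_chisq(included + [c]) - current for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < crit:          # stop when no single addition is "significant"
            break
        included.append(best)
        remaining.remove(best)
        current += gains[best]
    return included, current
```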
It seems that the most frequent applications of LM tests in SEM are the following:
- $\theta_i=0$: evaluate the necessity of an omitted parameter. This is often, perhaps almost always, post hoc.
- $\theta_i-\theta_j=0$: evaluate the appropriateness of an equality restriction. This can be a priori.
- in EQS (Structural Equation Modeling Software): evaluate constraints across multiple groups such as, for a given parameter, $\theta_i^{(1)}=\theta_i^{(2)}=\ldots=\theta_i^{(g)}$, i.e., differences are zero. This is typically a fully a priori test, e.g., of equal factor loadings across groups.
Simple nonlinear constraints such as $\theta_1 = \theta_2^2$ can be handled with phantom variables and do not require constrained optimization.
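A schematic of the phantom-variable trick (a generic illustration, not from the talk): to impose $\theta_1=\theta_2^2$ on a path from $X$ to $Y$, route the effect through a phantom variable $P$ that has no residual variance and constrain the two path coefficients to be equal,
\[X \xrightarrow{\;\theta_2\;} P \xrightarrow{\;\theta_2\;} Y\,,\]so the implied effect of $X$ on $Y$ is $\theta_2\cdot\theta_2=\theta_2^2$; only a simple equality constraint between the two coefficients is required.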