# Canonical Variate Analysis


This note is based on Campbell, N. A. (1979). CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS. 243.

Consider $g$ groups of data, with $v$ variables measured on each of $n_k$ individuals for the $k$-th group.

- $x_{km}$: the vector of observations on the $m$-th individual in the $k$-th group

Define the sums of squares and products (SSQPR) matrix for the $k$-th group as

\[S_k = \sum_{m=1}^{n_k}(x_{km}-\bar x_k)(x_{km}-\bar x_k)^T\,,\]where

\[\bar x_k = n_k^{-1}\sum_{m=1}^{n_k}x_{km}\]and write

\[W=\sum_{k=1}^gS_k=S\]for the within-groups SSQPR matrix on

\[n_w=\sum_{k=1}^g(n_k-1)\]degrees of freedom.

Define the between-groups SSQPR matrix as

\[B = \sum_{k=1}^gn_k(\bar x_k-\bar x_T)(\bar x_k-\bar x_T)^T\]where

\[\bar x_T = n_T^{-1}\sum_{k=1}^gn_k\bar x_k\]and

\[n_T = \sum_{k=1}^gn_k = n\,.\]The simplest formulation of canonical variate analysis is the distribution-free one of finding that linear combination of the original variables which maximizes the variation between groups, relative to the variation within groups.

That is, find the canonical vector $c_1$ which maximizes the ratio $c_1^TBc_1/c_1^TWc_1$; the vector is usually scaled so that $c_1^TWc_1=n_w$. The maximized ratio gives the first canonical root $f_1$.

Use of Lagrange multipliers leads directly to the eigenanalysis

\[(B-fW)c=0\,.\]Let $h=\min(v, g-1)$,

- $C=[c_1,\ldots,c_h]$
- $F=\operatorname{diag}(f_1,\ldots,f_h)$

Then

\[BC=WCF\]with

\[C^TWC=n_wI\]and

\[C^TBC=n_wF\,,\]so that the canonical variates are uncorrelated both within and between groups, and have unit variance within groups.
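The eigenanalysis above can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data; the helper names `ssqpr_matrices` and `canonical_variates` are hypothetical and not from Campbell (1979). It reduces $(B-fW)c=0$ to an ordinary symmetric eigenproblem through the Cholesky factor of $W$, which is one common route and not necessarily the one used in the paper.

```python
import numpy as np

def ssqpr_matrices(groups):
    """Return (W, B, n_w) for a list of (n_k, v) arrays, one per group."""
    sizes = np.array([len(x) for x in groups])
    means = [x.mean(axis=0) for x in groups]
    # Within-groups SSQPR: sum of the per-group centred cross-products S_k.
    W = sum((x - m).T @ (x - m) for x, m in zip(groups, means))
    # Overall mean: size-weighted mean of the group means.
    grand = sum(n * m for n, m in zip(sizes, means)) / sizes.sum()
    B = sum(n * np.outer(m - grand, m - grand) for n, m in zip(sizes, means))
    n_w = int((sizes - 1).sum())
    return W, B, n_w

def canonical_variates(groups):
    """Solve (B - f W) c = 0, scaled so that C^T W C = n_w I."""
    W, B, n_w = ssqpr_matrices(groups)
    h = min(W.shape[0], len(groups) - 1)
    # With W = L L^T and u = L^T c, the problem becomes
    # (L^{-1} B L^{-T}) u = f u, an ordinary symmetric eigenproblem.
    L = np.linalg.cholesky(W)
    A = np.linalg.solve(L, np.linalg.solve(L, B).T)
    A = (A + A.T) / 2.0                      # remove rounding asymmetry
    f, U = np.linalg.eigh(A)                 # roots in ascending order
    order = np.argsort(f)[::-1][:h]          # keep the h largest roots
    f, U = f[order], U[:, order]
    C = np.linalg.solve(L.T, U) * np.sqrt(n_w)   # c = L^{-T} u, rescaled
    return f, C, W, B, n_w

# Simulated example (an assumption): g = 3 groups, v = 4 variables.
rng = np.random.default_rng(0)
shifts = np.array([[0, 0, 0, 0], [2, 0, 1, 0], [0, 3, 1, 1]], dtype=float)
groups = [rng.normal(size=(n, 4)) + s for n, s in zip((20, 25, 30), shifts)]

f, C, W, B, n_w = canonical_variates(groups)
F = np.diag(f)
print("canonical roots:", f)
print("C^T W C = n_w I:", np.allclose(C.T @ W @ C, n_w * np.eye(len(f))))
print("C^T B C = n_w F:", np.allclose(C.T @ B @ C, n_w * F))
```

With $g=3$ and $v=4$ only $h=2$ roots are retained, since $B$ has rank at most $g-1$.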

Write

\[T = B+W\,,\]then an equivalent formulation is to maximize the ratio $c_1^TBc_1/c_1^TTc_1$, leading to the eigenanalysis

\[(B-r^2T)c=0\,.\]The ratio $r_1^2$ is the square of the first sample canonical correlation coefficient. The vector $c_1$ is scaled so that $c_1^TTc_1=n_w(1-r_1^2)^{-1}=n_w(1+f_1)$, so that again $c_1^TBc_1=n_wr_1^2(1-r_1^2)^{-1}=n_wf_1$ and $c_1^TWc_1=n_w$.
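The link between the two formulations can be spelled out in one step. This short derivation uses only the definitions above: starting from $Bc=fWc$ and adding $fBc$ to both sides gives

\[(1+f)Bc = f(B+W)c = fTc \quad\Longrightarrow\quad \Bigl(B-\frac{f}{1+f}\,T\Bigr)c=0\,,\]

so both eigenanalyses share the same vectors $c$, with

\[r^2=\frac{f}{1+f}\,,\qquad f=\frac{r^2}{1-r^2}\,,\qquad 1+f=(1-r^2)^{-1}\,.\]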

Now assume that $x_{km}\sim N_v(\mu_k,\Sigma)$. The maximized likelihood when the $\mu_k$ are unrestricted is

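\[(2\pi)^{-nv/2}\vert n^{-1}W\vert^{-n/2} e^{-nv/2}\]with $v(v+1)/2+gv$ estimated parameters. The factor $e^{-nv/2}$ arises because the maximum likelihood estimate of $\Sigma$ is $\hat\Sigma=n^{-1}W$, so the exponent evaluates to $-\tfrac12\operatorname{tr}(\hat\Sigma^{-1}W)=-\tfrac12\operatorname{tr}(nI_v)=-nv/2$.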

The maximized likelihood for the hypothesis specifying equality of the $\mu_k$ is

\[(2\pi)^{-nv/2}\vert n^{-1}(W+B)\vert^{-n/2} e^{-nv/2}\]with $v(v+1)/2 + v$ estimated parameters. The ratio of the two maximized likelihoods is $(\vert W\vert/\vert W+B\vert)^{n/2}$, which leads to the well-known likelihood ratio statistic $\vert W\vert/\vert W+B\vert$, commonly referred to as Wilks' $\Lambda$. The statistic $\Lambda$ may be written as

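\[\Lambda = \vert W\vert / \vert W+B\vert = \vert W\vert / \vert T\vert =\vert I+W^{-1}B\vert^{-1} = \prod_{i=1}^h(1+f_i)^{-1} = \prod_{i=1}^h(1-r_i^2)\,,\]where the product form holds because the determinant of $I+W^{-1}B$ is the product of its eigenvalues $1+f_i$, and $W^{-1}B$ has at most $h$ nonzero eigenvalues.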
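The three expressions for $\Lambda$ can be compared numerically. The sketch below assumes NumPy and simulated data (the group sizes and mean shifts are made up for illustration); it computes $\Lambda$ from the determinants, from the canonical roots $f_i$, and from the squared canonical correlations $r_i^2=f_i/(1+f_i)$.

```python
import numpy as np

# Simulated example (an assumption): g = 3 groups, v = 4 variables.
rng = np.random.default_rng(1)
sizes = (20, 25, 30)
shifts = np.array([[0, 0, 0, 0], [2, 0, 1, 0], [0, 3, 1, 1]], dtype=float)
groups = [rng.normal(size=(n, 4)) + s for n, s in zip(sizes, shifts)]

means = [x.mean(axis=0) for x in groups]
grand = np.vstack(groups).mean(axis=0)   # equals the weighted mean of group means

# Within-groups, between-groups and total SSQPR matrices.
W = sum((x - m).T @ (x - m) for x, m in zip(groups, means))
B = sum(len(x) * np.outer(m - grand, m - grand) for x, m in zip(groups, means))
T = W + B

h = min(W.shape[0], len(groups) - 1)
# The roots of (B - f W) c = 0 are the eigenvalues of W^{-1} B;
# keep the h largest, the remaining ones are zero up to rounding.
eigvals = np.linalg.eigvals(np.linalg.solve(W, B)).real
f = np.sort(eigvals)[::-1][:h]
r2 = f / (1.0 + f)

lam_det = np.linalg.det(W) / np.linalg.det(T)   # |W| / |T|
lam_f = np.prod(1.0 / (1.0 + f))                # prod (1 + f_i)^{-1}
lam_r = np.prod(1.0 - r2)                       # prod (1 - r_i^2)
print(lam_det, lam_f, lam_r)
```

The three printed values should agree up to floating-point rounding; a value of $\Lambda$ close to zero indicates strong separation between the group means.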