# MLE for MTP2


This post is based on Lauritzen, S., Uhler, C., & Zwiernik, P. (2019). Maximum likelihood estimation in Gaussian models under total positivity. The Annals of Statistics, 47(4), 1835–1863.

A $p$-variate real-valued distribution with density $f$ w.r.t. a product measure $\mu$ is **Multivariate totally positive of order 2 (MTP2)** if the density satisfies

$$f(x)f(y)\le f(x\wedge y)f(x\vee y)\quad\text{for all }x,y\in\bbR^p,$$

where $\wedge$ and $\vee$ denote the elementwise minimum and maximum.

A Gaussian distribution $N(\mu, \Sigma)$ is MTP2 if and only if $K:=\Sigma^{-1}$ is a symmetric $M$-matrix, i.e., $K_{ij}\le 0$ for all $i\neq j$, or equivalently, if all partial correlations are nonnegative.
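This characterization is easy to check numerically: invert $\Sigma$ and test the signs of the off-diagonal entries. A minimal numpy sketch (the function name and tolerance are illustrative choices, not from the paper):

```python
import numpy as np

def is_mtp2(Sigma, tol=1e-10):
    """Check whether N(mu, Sigma) is MTP2, i.e. whether the precision
    matrix K = Sigma^{-1} is a symmetric M-matrix (K_ij <= 0, i != j)."""
    K = np.linalg.inv(Sigma)
    off_diag = K[~np.eye(K.shape[0], dtype=bool)]
    return bool(np.all(off_diag <= tol))

# A covariance whose inverse has only nonpositive off-diagonal entries:
Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 1.0],
                  [0.5, 1.0, 2.0]])
print(is_mtp2(Sigma))  # True
```

Note that in the bivariate case a positive correlation already implies MTP2, while a negative correlation rules it out.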

### Partial Correlation Graphs

Let $X, Y\in\bbR$ and let $Z$ be a random vector. The partial correlation between $X$ and $Y$ given $Z$ is a measure of association between $X$ and $Y$ after removing the effect of $Z$.

Specifically, $\rho(X,Y\mid Z)$ is the correlation between $\epsilon_X$ and $\epsilon_Y$, where

$$\epsilon_X = X - \Pi_Z X,\qquad \epsilon_Y = Y - \Pi_Z Y.$$

Here, $\Pi_ZX$ is the projection of $X$ onto the linear space spanned by $Z$; for jointly Gaussian variables this coincides with the conditional expectation, $\Pi_ZX=\bbE[X\mid Z]$.

For a random vector $X=(X(1),\ldots,X(d))$, let $\rho_{jk}$ denote the partial correlation between $X(j)$ and $X(k)$ given all the other variables, and let $R=(\rho_{jk})$ be the $d\times d$ matrix of partial correlations.
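In the Gaussian case, $R$ can be read off the precision matrix $K=\Sigma^{-1}$ via the standard identity $\rho_{jk}=-K_{jk}/\sqrt{K_{jj}K_{kk}}$. A short numpy sketch (function name is my own):

```python
import numpy as np

def partial_correlations(Sigma):
    """Matrix R of partial correlations rho_{jk} between X(j) and X(k)
    given all remaining variables, using rho_{jk} = -K_{jk}/sqrt(K_jj K_kk)
    with K = Sigma^{-1}."""
    K = np.linalg.inv(Sigma)
    d = np.sqrt(np.diag(K))
    R = -K / np.outer(d, d)   # flips the sign of the scaled precision matrix
    np.fill_diagonal(R, 1.0)
    return R

Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 2.0, 1.0],
                  [0.5, 1.0, 2.0]])
R = partial_correlations(Sigma)
print(np.all(R >= 0))  # True: Sigma is MTP2, so all partial correlations are nonnegative
```

This makes the equivalence in the lemma above concrete: nonpositive off-diagonal entries of $K$ are exactly nonnegative partial correlations.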

### Literature on MTP2 Gaussian graphical models

- MTP2 Gaussian graphical models form a sub-class of **non-frustrated** Gaussian graphical models, which admit efficient structure estimation algorithms based on thresholding covariances after conditioning on subsets of variables of limited size.
- Slawski and Hein: efficient learning procedures based on convex optimization. **The paper is closely related to this approach.**

### Task

Given $n$ i.i.d. samples from $N(\mu,\Sigma)$, where $\Sigma$ is an unknown $p\times p$ covariance matrix. W.l.o.g., assume $\mu=0$ and focus on the estimation of $\Sigma$.

Let $S$ be the sample covariance matrix based on the $n$ samples. Then the log-likelihood function is, up to additive and multiplicative constants, given by

$$\ell(K) = \log\det K - \mathrm{tr}(KS),\qquad K=\Sigma^{-1}.$$

Notations:

- $\bbS^p$: the vector space of real symmetric matrices of size $p\times p$
- $\bbS^p_{\succ 0}$: the positive definite elements
- $\bbS^p_{\succeq 0}$: the positive semidefinite elements

Since the symmetric $M$-matrices form a convex subset of $\bbS^p_{\succ 0}$ and $\ell(K)$ is concave, computing the MLE for MTP2 Gaussian models is a convex optimization problem.
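To make the convexity concrete, here is a minimal, illustrative sketch that maximizes $\ell(K)=\log\det K-\mathrm{tr}(KS)$ by projected gradient ascent, clipping the off-diagonal entries of $K$ at zero after each step. This is *not* the algorithm used in the paper or by Slawski and Hein; the starting point, step size, and iteration count are ad hoc choices:

```python
import numpy as np

def mtp2_mle(S, step=0.1, n_iter=500):
    """Sketch: maximize log det K - tr(K S) subject to K_ij <= 0 (i != j)
    by projected gradient ascent with backtracking (illustrative only)."""
    p = S.shape[0]
    off = ~np.eye(p, dtype=bool)

    def objective(K):
        sign, logdet = np.linalg.slogdet(K)
        return -np.inf if sign <= 0 else logdet - np.trace(K @ S)

    K = np.linalg.inv(np.diag(np.diag(S)))   # feasible diagonal start
    for _ in range(n_iter):
        grad = np.linalg.inv(K) - S          # gradient of log det K - tr(KS)
        t = step
        while True:
            K_new = K + t * grad
            K_new[off] = np.minimum(K_new[off], 0.0)  # project onto K_ij <= 0
            if objective(K_new) > objective(K) - 1e-12:
                break                        # accept: still PD, no decrease
            t /= 2                           # backtrack otherwise
            if t < 1e-12:
                K_new = K
                break
        K = K_new
    return K

S = np.array([[2.0, 1.0, 0.5],
              [1.0, 2.0, 1.0],
              [0.5, 1.0, 2.0]])
K_hat = mtp2_mle(S)
```

For this particular $S$, $S^{-1}$ is already an $M$-matrix, so the sign constraints are not binding and the constrained MLE coincides with the unconstrained one, $\hat K = S^{-1}$.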

Slawski and Hein:

- show that the MLE exists (with probability one) when $n\ge 2$
- provide empirical evidence that the MTP2 constraint serves as an implicit regularizer and leads to sparsity in $K$.

The paper:

- analyzes the sparsity pattern of the MLE $\hat K$ under the MTP2 constraint.
- obtains a simple upper bound for the ML graph $G(\hat K)$.