
Clustering of High-dim GMMs with EM

Tags: Gaussian Mixture Model, Expectation-Maximization, High-Dimensional

This post is for Cai, T. T., Ma, J., & Zhang, L. (2019). CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality. The Annals of Statistics, 47(3), 1234–1267. https://doi.org/10.1214/18-AOS1711

The paper studies clustering of high-dimensional Gaussian mixtures and proposes a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector.

It establishes the optimal rate of convergence for the excess misclustering error and shows that CHIME is minimax rate optimal.

Introduction

  • k-means and k-medians are centroid-based
  • Hierarchical clustering builds a hierarchy of clusters based on empirical measures of dissimilarity among sets of observations

The paper considers clustering of data generated from Gaussian mixtures, with a focus on the high-dimensional setting:

\[Y \sim \begin{cases} 1 & \text{w.p. } 1-w^\star\\ 2 & \text{w.p. } w^\star \end{cases}\,,\qquad Z\mid Y = k \sim N_p(\mu_k^\star, \Sigma^\star)\,.\]

Suppose there are $n$ unlabeled observations,

\[z^{(1)}, z^{(2)},\ldots, z^{(n)} \sim (1-w^\star)N_p(\mu_1^\star, \Sigma^\star) + w^\star N_p(\mu_2^\star, \Sigma^\star)\]
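As a quick illustration, here is a minimal NumPy sketch of sampling from this two-component mixture; the dimensions and parameter values are hypothetical choices made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameters, for illustration only.
n, p = 200, 50
w_star = 0.4                                  # mixing weight of class 2
mu1, mu2 = np.zeros(p), np.full(p, 0.5)
Sigma = np.eye(p)                             # shared covariance

# Draw latent labels Y, then Z | Y = k ~ N_p(mu_k, Sigma).
y = rng.binomial(1, w_star, size=n) + 1       # 1 w.p. 1 - w*, 2 w.p. w*
L = np.linalg.cholesky(Sigma)
z = np.where((y == 2)[:, None], mu2, mu1) + rng.standard_normal((n, p)) @ L.T
```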

In the ideal case where the parameter $\theta^\star = \{w^\star, \mu_1^\star, \mu_2^\star, \Sigma^\star\}$ is known, the optimal procedure is Fisher's linear discriminant rule,

\[G^\star_{\theta^\star}(z) = \begin{cases} 2 & \text{if } \left(z - \frac{\mu_1^\star+\mu_2^\star}{2}\right)^\top \beta^\star + \log\frac{w^\star}{1-w^\star} \ge 0\\ 1 & \text{otherwise,} \end{cases}\]

where $\beta^\star = (\Sigma^\star)^{-1}(\mu_2^\star - \mu_1^\star)$ is the discriminant direction.
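A minimal sketch of this rule, assuming the parameters above are known:

```python
import numpy as np

def fisher_rule(z, w, mu1, mu2, Sigma):
    """Fisher's linear discriminant rule with known parameters.

    z: (n, p) array of observations; returns predicted labels in {1, 2}.
    """
    beta = np.linalg.solve(Sigma, mu2 - mu1)          # discriminant direction
    score = (z - (mu1 + mu2) / 2) @ beta + np.log(w / (1 - w))
    return np.where(score >= 0, 2, 1)
```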

In practice, the parameters are unknown and a data-driven method is needed.

A common approach in the low-dimensional setting is to simply plug in the sample estimates of the parameters.

In the high-dimensional setting, where $p$ can be much larger than $n$, the sample covariance matrix may not even be invertible, and it is difficult to estimate the precision matrix.

  • one way around this is to directly estimate the discriminant direction $\beta^\star$ rather than the precision matrix; see the sketch below.
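One generic way to do this, shown below, is to estimate $\beta^\star$ by an $\ell_1$-penalized quadratic program solved with proximal gradient descent (ISTA). This is a sketch in the spirit of such direct estimators, not necessarily the exact optimization in the paper; `lam` and the iteration count are hypothetical tuning choices:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def direct_beta(Sigma_hat, delta_hat, lam=0.1, n_iter=500):
    """Sparse estimate of the discriminant direction beta by solving
        minimize  0.5 * b' Sigma_hat b - delta_hat' b + lam * ||b||_1
    with proximal gradient descent (ISTA).
    delta_hat is the difference of the sample means, mu2_hat - mu1_hat.
    """
    step = 1.0 / max(np.linalg.eigvalsh(Sigma_hat).max(), 1e-8)
    beta = np.zeros_like(delta_hat)
    for _ in range(n_iter):
        grad = Sigma_hat @ beta - delta_hat           # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```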

For unsupervised learning, the class labels are not observed. To handle this, the paper introduces CHIME:

  • use the posterior probability that $z^{(i)}$ belongs to class $k$ as a soft label for $z^{(i)}$ (a minimal EM sketch along these lines follows).
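Below is a minimal EM sketch for a two-component Gaussian mixture with a shared covariance, carrying these posterior probabilities as soft labels. It uses the plain weighted M-step; CHIME instead replaces the update of the discriminant direction with a penalized direct estimate in high dimensions. The initialization here is a hypothetical choice:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(z, n_iter=50, seed=1):
    """EM for a two-component Gaussian mixture with a shared covariance,
    using gamma_i = P(Y_i = 2 | z_i) as a soft label for observation i."""
    n, p = z.shape
    rng = np.random.default_rng(seed)
    gamma = rng.uniform(0.3, 0.7, size=n)             # hypothetical initialization
    for _ in range(n_iter):
        # M-step: weighted estimates of (w, mu_1, mu_2, Sigma)
        w = gamma.mean()
        mu1 = ((1 - gamma)[:, None] * z).sum(0) / (1 - gamma).sum()
        mu2 = (gamma[:, None] * z).sum(0) / gamma.sum()
        r1, r2 = z - mu1, z - mu2
        Sigma = (((1 - gamma)[:, None] * r1).T @ r1
                 + (gamma[:, None] * r2).T @ r2) / n
        # E-step: posterior probability that z^(i) came from class 2
        f1 = multivariate_normal.pdf(z, mean=mu1, cov=Sigma)
        f2 = multivariate_normal.pdf(z, mean=mu2, cov=Sigma)
        gamma = w * f2 / ((1 - w) * f1 + w * f2)
    return w, mu1, mu2, Sigma, gamma

# Usage with the simulated data from the earlier sketch:
# w_hat, mu1_hat, mu2_hat, Sigma_hat, gamma = em_gmm(z)
```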


