UMAP: Uniform Manifold Approximation and Projection
This note is for McInnes, L., Healy, J., & Melville, J. (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426.
UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology.
dimension reduction algorithms tend to fall into two categories:
- preserve the pairwise distance structure amongst all the data samples: PCA, MDS, Sammon mapping
- favor the preservation of local distances over global distances: t-SNE, Isomap, LargeVis, Laplacian eigenmaps, diffusion maps
2 Theoretical Foundations for UMAP
2.1 Uniform distribution of data on a manifold and geodesic approximation
the first step is to approximate the manifold on which the paper assumes the data (approximately) lies
3 A Computational View of UMAP
UMAP can ultimately be described in terms of the construction of, and operations on, weighted graphs.
this situates UMAP in the class of k-neighbor based graph learning algorithms such as Laplacian Eigenmaps, Isomap and t-SNE
as with other k-neighbor graph based algorithms, UMAP can be described in two phases.
in the first phase a particular weighted k-neighbor graph is constructed
in the second phase a low dimensional layout of this graph is computed
UMAP assumes the following axioms:
- there exists a manifold on which the data would be uniformly distributed
- the underlying manifold of interest is locally connected
- preserving the topological structure of this manifold is the primary goal
any algorithm that attempts to use a mathematical structure akin to a k-neighbor graph to approximate a manifold must follow a similar basic structure
- Graph Construction
- Construct a weighted k-neighbor graph
- apply some transform to the edges based on the ambient local distances
- deal with the inherent asymmetry of the k-neighbor graph
- Graph Layout
- define an objective function that preserves desired characteristics of this k-neighbor graph
- find a low dimensional representation which optimizes this objective function
3.1 Graph Construction
- $X = \{x_1,\ldots, x_N\}$: the input dataset
- metric (dissimilarity measure) $d: X\times X\rightarrow \mathbb{R}_{\ge 0}$
- $\{x_{i_1},\ldots, x_{i_k}\}$: the $k$ nearest neighbors of $x_i$ under the metric $d$
for each $x_i$, define $\rho_i$ and $\sigma_i$: let
\[\rho_i = \min\{d(x_i, x_{i_j})\mid 1\le j\le k,\ d(x_i, x_{i_j}) > 0\}\]
and set $\sigma_i$ to be the value such that
\[\sum_{j=1}^k \exp\left(\frac{-\max(0, d(x_i, x_{i_j}) - \rho_i )}{\sigma_i}\right) = \log_2(k)\]
define a weighted directed graph $\bar G = (V, E, w)$, where the weight function is
\[w((x_i, x_{i_j})) = \exp\left(\frac{-\max(0, d(x_i, x_{i_j}) - \rho_i )}{\sigma_i}\right)\]
let $A$ be the weighted adjacency matrix of $\bar G$, and consider the symmetric matrix
\[B = A + A^\top - A \circ A^\top\]
where $\circ$ is the Hadamard (or pointwise) product
if one interprets the value of $A_{ij}$ as the probability that the directed edge from $x_i$ to $x_j$ exists, then $B_{ij}$ is the probability that at least one of the two directed edges (from $x_i$ to $x_j$ and from $x_j$ to $x_i$) exists.
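if the two directed edges are treated as independent events, this is just inclusion-exclusion:
\[B_{ij} = A_{ij} + A_{ji} - A_{ij}A_{ji} = P(e_{ij}) + P(e_{ji}) - P(e_{ij})P(e_{ji})\]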
The UMAP graph $G$ is then an undirected weighted graph whose adjacency matrix is given by $B$
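a minimal NumPy sketch of this construction, assuming an exact kNN query via scikit-learn's NearestNeighbors (the reference implementation uses approximate nearest neighbor descent); the function name, bisection scheme, and iteration count here are illustrative, not from the paper:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fuzzy_knn_graph(X, k=15, n_iter=64):
    """Sketch of the graph construction in Section 3.1: returns B."""
    n = X.shape[0]
    # k nearest neighbors of each point (drop the point itself)
    dists, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    dists, idx = dists[:, 1:], idx[:, 1:]

    A = np.zeros((n, n))
    for i in range(n):
        d = dists[i]
        rho = d[d > 0].min()  # distance to nearest neighbor at positive distance
        # bisection for sigma_i: smoothed weights must sum to log2(k)
        target, lo, hi, sigma = np.log2(k), 0.0, np.inf, 1.0
        for _ in range(n_iter):
            s = np.exp(-np.maximum(0.0, d - rho) / sigma).sum()
            if s > target:
                hi = sigma
                sigma = (lo + hi) / 2.0
            else:
                lo = sigma
                sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
        A[i, idx[i]] = np.exp(-np.maximum(0.0, d - rho) / sigma)

    # fuzzy set union: B_ij = probability that at least one directed edge exists
    return A + A.T - A * A.T
```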
3.2 Graph Layout
UMAP uses a force directed graph layout algorithm in low dimensional space
any force directed layout algorithm requires a description of both attractive and repulsive forces. the algorithm proceeds by iteratively applying attractive and repulsive forces at each edge or vertex; a sketch of one such update follows the input list below
the inputs are:
- $X$: the dataset to have its dimension reduced
- $n$: the neighborhood size to use for local metric approximation
- $d$: the dimension of the target reduced space
- min-dist: an algorithmic parameter controlling the layout
- n-epochs: controls the amount of optimization work to perform
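a minimal sketch of one possible SGD loop for this phase, using the attractive and repulsive force expressions from the paper under the low dimensional similarity $\Phi(\lVert y_i - y_j\rVert) = (1 + a\lVert y_i - y_j\rVert^{2b})^{-1}$; sampling each edge with probability equal to its weight is a simplification of the paper's epochs-per-sample bookkeeping, and the values of $a$ and $b$ (normally fit from min-dist) are illustrative:

```python
import numpy as np

def optimize_layout(Y, edges, weights, n_epochs=200,
                    a=1.58, b=0.9, lr=1.0, n_neg=5, eps=1e-3, seed=0):
    """Sketch of the layout phase: attraction along graph edges,
    repulsion against randomly sampled (negative) vertices."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    for epoch in range(n_epochs):
        alpha = lr * (1.0 - epoch / n_epochs)  # linearly decaying step size
        for (i, j), w in zip(edges, weights):
            if rng.random() > w:  # simplification: sample edge w.p. = weight
                continue
            diff = Y[i] - Y[j]
            d2 = diff @ diff
            if d2 > 0.0:
                # attractive force from the cross-entropy objective
                att = (-2.0 * a * b * d2 ** (b - 1.0)) / (1.0 + a * d2 ** b)
                Y[i] += alpha * att * diff
                Y[j] -= alpha * att * diff
            # negative sampling: push y_i away from random vertices
            for l in rng.integers(0, n, n_neg):
                diff = Y[i] - Y[l]
                d2 = diff @ diff
                rep = (2.0 * b) / ((eps + d2) * (1.0 + a * d2 ** b))
                Y[i] += alpha * rep * diff
    return Y
```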
5 Practical Efficacy
- Pen digits
- COIL 20
- COIL 100
- Mouse scRNA-seq
- Statlog (Shuttle)
- MNIST
- F-MNIST
- Flow cytometry
- GoogleNews word vectors
5.1 Qualitative Comparison of Multiple Algorithms
5.2 Quantitative Comparison of Multiple Algorithms
the paper compares the accuracy of a kNN classifier trained on the embedding space for a variety of datasets
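a minimal sketch of this kind of evaluation on a small dataset (the paper's exact protocol, splits, and choice of k are not recorded in this note; the umap-learn package is assumed):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import umap  # the umap-learn package

# embed the whole dataset, then score a kNN classifier on the embedding
X, y = load_digits(return_X_y=True)
emb = umap.UMAP(n_neighbors=15, n_components=2).fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(emb, y, random_state=0)
acc = KNeighborsClassifier(n_neighbors=10).fit(X_tr, y_tr).score(X_te, y_te)
print(f"kNN accuracy on the embedding: {acc:.3f}")
```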
5.3 Embedding Stability
since UMAP makes use of both stochastic approximate nearest neighbor search and stochastic gradient descent with negative sampling, the resulting embedding necessarily differs from run to run, and under sub-sampling of the data.
to measure the stability of an embedding, the paper uses the normalized Procrustes distance between two (potentially comparable) embeddings, after aligning them by translation, rotation, and scaling.
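a sketch of such a stability check using scipy's Procrustes analysis (which performs exactly that alignment; its disparity is normalized slightly differently from the paper's distance, and the random data here is purely illustrative):

```python
import numpy as np
from scipy.spatial import procrustes
import umap

# two runs with different seeds on the same (illustrative) data
X = np.random.default_rng(0).normal(size=(500, 20))
emb1 = umap.UMAP(random_state=1).fit_transform(X)
emb2 = umap.UMAP(random_state=2).fit_transform(X)

# procrustes removes translation, scaling, and rotation, then reports a
# normalized sum-of-squares disparity between the aligned embeddings
_, _, disparity = procrustes(emb1, emb2)
print(f"Procrustes disparity between runs: {disparity:.4f}")
```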
5.4 Computational Performance Comparisons
- scaling with embedding dimension
- scaling with ambient dimension
- scaling with the number of samples
6 Weaknesses
- like most non-linear dimension reduction techniques (including t-SNE and Isomap), UMAP lacks the strong interpretability of PCA and related techniques such as NMF
- one of the core assumptions of UMAP is that there exists manifold structure in the data; because of this it may find manifold structure within noise, so care must be taken with smaller sample sizes of noisy data
- UMAP is derived from the axiom that local distance is of more importance than long range distances (similar to techniques like t-SNE and LargeVis); while the paper believes UMAP can capture more global structure than these techniques, if global structure is of primary interest then UMAP may not be the best choice