WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Hierarchical Multi-label Contrastive Learning

Posted on
Tags: Hierarchical Multi-label Classification, Contrastive Learning

This post is based on Zhang, Shu, Ran Xu, Caiming Xiong, and Chetan Ramaiah. "Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16660–69, 2022.

The authors present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationships between classes.

In representation learning frameworks, where a single embedding function can be used across a variety of downstream tasks, utilizing all of the available supervisory signal is vital.

Two losses are proposed:

  • HiMulCon: the hierarchical multi-label contrastive loss, which applies a level-dependent penalty $\lambda_l$ so that every level of the label hierarchy contributes to the learned representation.
  • HiConE: prevents hierarchy violations. It ensures that the loss from pairs farther apart in the label space is never smaller than the loss from pairs that are closer.

The supervised contrastive learning approach forms positive pairs by sampling different instances of the same class, as opposed to augmenting different views of the same image in the unsupervised setting.

Contrastive Learning

A set of $N$ randomly sampled labeled pairs is defined as $\{x_k, y_k\}$, where $x$ and $y$ denote the samples and labels respectively, and $k = 1, \ldots, N$.

Two augmentations are applied to each sample.

Let $i$ be the index of one augmented sample and $j$ the index of the other, where $i \in A = \{1, \ldots, 2N\}$ and $j \neq i$.

$i$ is the anchor and $j$ is the positive sample.

The contrastive loss is defined as

\[L^{self} = -\sum_{i\in A}\log \frac{\exp(f_i\cdot f_j/\tau)}{\sum_{k\in A\backslash i}\exp(f_i\cdot f_k/\tau)}\]
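This self-supervised loss can be sketched directly in NumPy. This is a minimal illustration of the formula above, not the paper's implementation; the interleaved layout (rows $2k$ and $2k+1$ are the two augmentations of sample $k$) is an assumption made here for simplicity.

```python
import numpy as np

def simclr_loss(features, tau=0.1):
    """Self-supervised contrastive loss L^{self} over 2N augmented views.

    `features` has shape (2N, d), L2-normalized, arranged so that rows
    2k and 2k+1 are the two augmentations of sample k (an assumption of
    this sketch).
    """
    n2 = features.shape[0]
    sim = features @ features.T / tau            # pairwise similarities f_i . f_k / tau
    # denominator: sum over all k in A \ {i}
    mask = ~np.eye(n2, dtype=bool)
    log_den = np.log(np.exp(sim[mask]).reshape(n2, n2 - 1).sum(axis=1))
    # numerator: the partner augmentation j of each anchor i (0<->1, 2<->3, ...)
    pos = np.arange(n2) ^ 1
    log_num = sim[np.arange(n2), pos]
    return -(log_num - log_den).sum()
```

Each summand is non-negative because the denominator contains the positive's term plus all negatives, so the fraction inside the log is at most 1.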

Given the presence of labels, positive pairings for the anchor go from one positive among many negatives (as in SimCLR) to many-to-many samples. The loss is defined as

\[L^{sup} = -\sum_{i\in I}\frac{1}{\vert P(i)\vert}\sum_{p\in P(i)}\log \frac{\exp(f_i\cdot f_p/\tau)}{\sum_{a\in A\backslash i}\exp(f_i\cdot f_a/\tau)}\]


  • $P(i)$ represents the indices of all positives for anchor $i$ in the multi-viewed batch, excluding $i$ itself
  • $A$ represents the indices of all images in the batch
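The supervised variant can be sketched the same way, looping over each anchor's label-matched positives. Again a minimal NumPy illustration of $L^{sup}$, not the paper's code; skipping anchors with no positives is a convention assumed here.

```python
import numpy as np

def supcon_loss(features, labels, tau=0.1):
    """Supervised contrastive loss L^{sup}: every other sample in the
    batch with the same label is a positive for the anchor."""
    n = features.shape[0]
    sim = features @ features.T / tau
    loss = 0.0
    for i in range(n):
        # P(i): indices of all positives for anchor i, excluding i itself
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue                              # anchor has no positives
        den = np.exp(np.delete(sim[i], i)).sum()  # sum over a in A \ {i}
        log_probs = sim[i, pos] - np.log(den)
        loss += -log_probs.mean()                 # average over |P(i)|
    return loss
```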

Hierarchical Multi-label Contrastive Loss

Define $L$ as the set of all label levels

The loss for a pairing of the anchor image, indexed by $i$, and a positive image at level $l$ is defined as

\[L^{pair}(i, p_l^i) = \log \frac{\exp(f_i\cdot f_{p_l^i}/\tau)}{\sum_{a\in A\backslash i}\exp(f_i\cdot f_a/\tau)}\]

The HiMulCon loss is

\[L^{HMC} = \sum_{l\in L}\frac{1}{\vert L\vert} \sum_{i\in I}\frac{-\lambda_l}{\vert P_l(i)\vert} \sum_{p_l^i\in P_l(i)}L^{pair}(i, p_l^i)\]
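Putting the pieces together, $L^{HMC}$ averages the per-level supervised losses with a level penalty $\lambda_l$. A minimal NumPy sketch of the formula above, assuming labels are given as one column per hierarchy level and that `lambdas` is supplied by the caller (the paper's specific choice of $\lambda_l$ is not reproduced here):

```python
import numpy as np

def himulcon_loss(features, labels, lambdas, tau=0.1):
    """HiMulCon sketch: hierarchical multi-label contrastive loss.

    `labels` has shape (N, |L|); column l holds each sample's label at
    hierarchy level l. `lambdas[l]` is the level-dependent penalty.
    """
    n, n_levels = labels.shape
    sim = features @ features.T / tau
    total = 0.0
    for l in range(n_levels):
        level_loss = 0.0
        for i in range(n):
            # P_l(i): positives for anchor i at level l
            pos = [p for p in range(n)
                   if p != i and labels[p, l] == labels[i, l]]
            if not pos:
                continue
            den = np.exp(np.delete(sim[i], i)).sum()
            # -lambda_l / |P_l(i)| * sum of pair losses
            level_loss += -lambdas[l] * np.mean(sim[i, pos] - np.log(den))
        total += level_loss / n_levels            # 1 / |L| factor
    return total
```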

Hierarchical Constraint Enforcing Loss

The loss between image pairs matched deeper in the hierarchy (sharing fine-grained labels) will never be higher than the loss between pairs matched only at a shallower level.

HiConE is computed sequentially in decreasing order of $l$, from the deepest level down to 0, which ensures that the pair loss at level $l-1$ can never be less than the maximum pair loss at level $l$.
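The sequential constraint above can be sketched as a clamping pass over per-level pair losses. This is one possible reading of the mechanism, assuming larger $l$ means deeper (finer) levels and that enforcement is done by lower-bounding each coarser level with the running maximum of the finer ones; the paper's actual enforcement may differ in detail.

```python
import numpy as np

def hicone_pair_losses(pair_losses_by_level):
    """HiConE sketch: process levels in decreasing order of l and clamp
    each coarser level's pair losses from below by the max pair loss
    seen at finer levels, so loss at l-1 is never less than max at l.

    `pair_losses_by_level` maps level l -> array of per-pair losses,
    with larger l meaning deeper (finer) hierarchy levels.
    """
    constrained = {}
    floor = -np.inf
    for l in sorted(pair_losses_by_level, reverse=True):
        losses = np.asarray(pair_losses_by_level[l], dtype=float)
        constrained[l] = np.maximum(losses, floor)   # hierarchy constraint
        floor = max(floor, constrained[l].max())
    return constrained
```

For example, if the deepest level has a max pair loss of 2.0, every pair loss at the next coarser level is raised to at least 2.0.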

Published in categories Note