# Contrastive Learning: A Simple Framework and A Theoretical Analysis


This note is based on

- Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A Simple Framework for Contrastive Learning of Visual Representations (arXiv:2002.05709). arXiv.
- Ji, W., Deng, Z., Nakada, R., Zou, J., & Zhang, L. (2021). The Power of Contrast for Feature Learning: A Theoretical Analysis (arXiv:2110.02473). arXiv.

## Simple Framework for Contrastive Learning

- composition of data augmentations plays a critical role in defining effective predictive tasks
- introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations
- contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning

The two most mainstream classes of approaches for learning effective visual representations without human supervision:

- generative: learn to generate or otherwise model pixels in the input space
  - pixel-level generation is computationally expensive and may not be necessary for representation learning
- discriminative: learn representations using objective functions similar to those used for supervised learning, but with inputs and labels derived from an unlabeled dataset

Four major components:

- stochastic data augmentation
  - random cropping followed by resize back to the original size
  - random color distortions
  - random Gaussian blur
- a neural network base encoder $f(\cdot)$ that extracts representation vectors from augmented data examples; the paper uses ResNet, $h_i = f(\tilde x_i) = \text{ResNet}(\tilde x_i)$
- a small neural network projection head $g(\cdot)$ that maps representations to the space where the contrastive loss is applied; the paper uses an MLP with one hidden layer, $z_i = g(h_i) = W^{(2)}\sigma(W^{(1)} h_i)$. It is beneficial to define the contrastive loss on $z_i$ rather than $h_i$
- a contrastive loss function defined for a contrastive prediction task
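The contrastive loss SimCLR uses is the normalized temperature-scaled cross-entropy (NT-Xent). A minimal NumPy sketch, assuming the $2N$ projected embeddings are stacked so that rows $2k$ and $2k+1$ are the two augmented views of example $k$; the temperature value and function name here are illustrative, not from the paper:

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss over 2N projections z (one row per augmented view).

    Rows 2k and 2k+1 are assumed to be the positive pair for example k.
    tau is the temperature; 0.5 is just an illustrative default.
    """
    # Normalize rows so that dot products equal cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n2 = z.shape[0]
    # Exclude each row's self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    # Index of each row's positive partner: 0<->1, 2<->3, ...
    pos = np.arange(n2) ^ 1
    log_denom = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(n2), pos] - log_denom)
    return loss.mean()
```

The loss is small when positive pairs are aligned and negatives are spread apart, which is exactly the behavior the projection head $g(\cdot)$ is trained to produce in the space of $z_i$.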

## A Theoretical Analysis of Contrastive Learning

This section is based on Prof. Linjun Zhang's talk on Ji, W., Deng, Z., Nakada, R., Zou, J., & Zhang, L. (2021). The Power of Contrast for Feature Learning: A Theoretical Analysis (arXiv:2110.02473).

- Linear Representation
- Random Masking Augmentation
- Data Generating Process: spiked covariance model
- Self-Supervised Contrastive Learning vs Autoencoders/GANs
  - both autoencoders and GANs are related to PCA
  - focus on comparisons between self-supervised CL and autoencoders
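To make the setup concrete, here is a minimal NumPy sketch of a spiked covariance data-generating process together with a random masking augmentation. The dimensions, spike strengths, and masking probability below are illustrative assumptions for the sketch, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Spiked covariance model (sketch): x = U z + xi, where the columns of
# U in R^{d x r} are the signal directions ("spikes") and xi is isotropic noise.
d, r, n = 50, 3, 200
U, _ = np.linalg.qr(rng.standard_normal((d, r)))   # orthonormal spike directions
z = rng.standard_normal((n, r)) * np.array([5.0, 4.0, 3.0])  # assumed spike strengths
xi = rng.standard_normal((n, d))                   # isotropic noise
X = z @ U.T + xi                                   # n samples in R^d

def random_mask(X, p=0.5, rng=rng):
    """Random masking augmentation: zero out each coordinate independently
    with probability p, producing one stochastic view of the data."""
    return X * (rng.random(X.shape) > p)

# Two independently masked views of the same samples, as used to form
# positive pairs in self-supervised contrastive learning.
view1, view2 = random_mask(X), random_mask(X)
```

Under this linear-representation setting, the comparison in the talk is between what a contrastive objective recovers from such paired views versus what an autoencoder (which, like GANs here, is related to PCA) recovers from the raw samples.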

it is a constant lower bound, so as $n$, $d$, $r$ vary, it is worse than

- Performance on Downstream Tasks

- Impact of Labeled Data in Supervised Contrastive Learning