Likelihood Annealing
- deep learning approaches that allow uncertainty estimation for regression problems often converge slowly and yield poorly calibrated uncertainty estimates that cannot be used effectively for quantification
- the work presents Likelihood Annealing (LIKA), a fast calibrated uncertainty estimation method for regression tasks that consistently improves the convergence of deep regression models and yields calibrated uncertainty without any post hoc calibration phase
Introduction
various formulations provide uncertainty estimates alongside accurate predictions for deep neural networks:
- Bayesian approaches
- pseudo-ensembles
- quantile regression
revisit deep regression models trained via MLE, which assume a Gaussian distribution over the regression output and optimize the negative log-likelihood to estimate both the target and the uncertainty.
- they often converge slowly at the beginning of training due to a flat gradient landscape
- they may even risk gradient explosion caused by a steep gradient landscape when approaching the optimum
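for the heteroscedastic Gaussian likelihood (introduced formally below), the per-sample NLL and its gradients make both failure modes explicit (a standard derivation, not specific to this paper):

\[\mathcal{L}_i = \frac{1}{2}\log(2\pi\hat\sigma_i^2) + \frac{(y_i - \hat y_i)^2}{2\hat\sigma_i^2}, \quad \frac{\partial \mathcal{L}_i}{\partial \hat y_i} = \frac{\hat y_i - y_i}{\hat\sigma_i^2}, \quad \frac{\partial \mathcal{L}_i}{\partial \hat\sigma_i^2} = \frac{1}{2\hat\sigma_i^2} - \frac{(y_i - \hat y_i)^2}{2\hat\sigma_i^4}\]

early in training the network can inflate $\hat\sigma_i^2$ to explain large residuals, which attenuates the mean gradient (slow convergence); near the optimum, small $\hat\sigma_i^2$ makes the $1/\hat\sigma_i^2$ factor large (risk of gradient explosion)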
to reshape the aforementioned ill-posed gradient landscape, which causes slow convergence and poorly calibrated uncertainty, the paper proposes a novel Likelihood Annealing (LIKA) scheme for deep regression models; it alters the original gradients by formulating a temperature-dependent improper likelihood that is optimized during the learning phase
the proposed temperature-dependent likelihood brings crucial properties to regression uncertainty estimation:
- the multimodal distribution over the regression target ensures that at high residuals (between output and ground truth, typical of the initial learning phase), the gradients are much larger than under the standard unimodal Gaussian distribution, leading to faster convergence at the beginning of the learning phase
- annealing the learning rate over the course of training, along with the temperature, avoids gradient explosion towards the end of the learning phase, a problem with the standard heteroscedastic Gaussian likelihood, whose gradients are sharp at low errors (a possible schedule is sketched after this list)
- the temperature-dependent likelihood is constructed such that the predicted uncertainty is encouraged to be calibrated at every step, i.e., to stay close to the error between the prediction and the ground truth
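the paper's exact temperature schedule is not reproduced in these notes; as a minimal sketch, assuming an exponential decay from a high initial temperature to a small final one:

```python
def temperature(step: int, total_steps: int,
                tau_start: float = 10.0, tau_end: float = 1e-3) -> float:
    """Exponential annealing from tau_start to tau_end.
    The schedule and endpoint values are assumptions of this sketch,
    not necessarily those used in the paper."""
    frac = min(step / max(total_steps, 1), 1.0)
    return tau_start * (tau_end / tau_start) ** frac
```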
Related Work
- DNNs typically estimate inaccurate uncertainty due to their deterministic form, which is insufficient for characterizing confidence accurately; remedies include:
- Bayesian inference
- approximate inference
- model two terms, the predictive mean and variance, as outputs of the DNN to estimate uncertainty directly from the network's output
- estimate different quantiles for a given input
- conformal predictions
two types of uncertainties in deep learning
- Aleatoric: the uncertainty that arises from the inherent randomness in the data
- Epistemic: the uncertainty that arises due to a lack of knowledge or information about the data
calibrating inaccurate uncertainty post hoc is another way to obtain accurate uncertainty estimates
the estimated credible interval with confidence level $\alpha$ is calibrated if $\alpha\%$ of the ground-truth targets are covered by that interval.
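as an illustration of this definition, assuming Gaussian predictive distributions (`empirical_coverage` is a hypothetical helper, not from the paper):

```python
import numpy as np
from scipy.stats import norm

def empirical_coverage(y, y_hat, sigma, alpha=0.9):
    """Fraction of targets inside the central alpha-credible interval of
    the predicted Gaussian N(y_hat, sigma^2); calibrated if ~= alpha."""
    z = norm.ppf(0.5 + alpha / 2)  # interval half-width in standard deviations
    return float(np.mean(np.abs(y - y_hat) <= z * sigma))
```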
post-processing methods for regression calibration:
- introduce an auxiliary model to adjust the output of the pre-trained model based on Platt scaling; other methods use Gaussian processes or maximum mean discrepancy
- an auxiliary model with enough capacity will always be able to recalibrate, even if the predicted uncertainty is completely uncorrelated with the real uncertainty
Methodology: Likelihood Annealing
Likelihood Annealing (LIKA) belongs to the family of models designed to predict a distribution over the outputs; the model is trained via a loss function derived from MLE
Kendall & Gal (2017): relax the i.i.d. assumption and learn to model the heteroscedasticity as well
assume the residuals $\epsilon_i\sim \mathcal{N}(0, \hat\sigma_i^2)$; the likelihood is then a factored Gaussian distribution
\[P(\mathcal{D}\mid \theta) = \prod_{i=1}^N \frac{1}{\sqrt{2\pi \hat \sigma_i^2}}\exp\left(-\frac{\vert y_i - \hat y_i\vert^2}{2\hat\sigma_i^2}\right)\]
the DNN is modified to output both the prediction (i.e., the mean of the Gaussian) and the uncertainty estimate (i.e., the variance of the Gaussian), learned using the above equation, i.e., $\Psi(x_i, \theta) = \{\hat y_i, \hat\sigma_i^2\}$
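a minimal PyTorch sketch of this setup; the two-headed architecture is an assumption of the sketch, only the loss follows the equation above:

```python
import torch
import torch.nn as nn

class HeteroscedasticMLP(nn.Module):
    """Outputs both the predictive mean and variance: Psi(x, theta) = {y_hat, var_hat}."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, 1)    # \hat{y}_i
        self.logvar_head = nn.Linear(hidden, 1)  # log \hat{sigma}_i^2, for positivity

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h).exp()

def gaussian_nll(y, y_hat, var):
    """Per-sample negative log of the factored Gaussian likelihood above."""
    return 0.5 * (torch.log(2 * torch.pi * var) + (y - y_hat) ** 2 / var)
```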
constructing a temperature-dependent improper likelihood
take the negative log of the improper likelihood
which can be rewritten as
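the improper-likelihood equations themselves are not reproduced in these notes; purely as a hypothetical stand-in (not the paper's formulation), one way to realize the annealing idea is to add temperature-weighted terms to the Gaussian NLL and decay $\tau$ with a schedule like the one sketched earlier:

```python
import torch

def annealed_loss(y, y_hat, var, tau: float):
    """Illustrative stand-in, NOT the paper's improper likelihood:
    (i)   the standard Gaussian NLL;
    (ii)  an MSE term with strong, well-behaved gradients early in
          training, weighted by the annealed temperature tau -> 0;
    (iii) a penalty pulling the predicted variance towards the squared
          error, encouraging calibration at every step."""
    sq_err = (y - y_hat) ** 2
    nll = 0.5 * (torch.log(2 * torch.pi * var) + sq_err / var)
    calib = (var - sq_err.detach()).abs()
    return (nll + tau * sq_err + calib).mean()
```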
Experiments
Evaluation Metrics
to measure the quality of uncertainty estimates $(\hat\sigma^2)$, compute
- the correlation coefficient (Corr. Coeff.) between uncertainty estimates $(\hat\sigma^2)$ and the error $(\vert \hat y - y\vert^2)$
- uncertainty calibration error (UCE) for regression tasks: the uncertainty output $\hat \sigma^2$ of a deep model is partitioned into $M$ bins
- a weighted average of the difference between the predictive error and uncertainty is used: $UCE = \sum_{m=1}^M \frac{\vert B_m\vert}{N}\vert err(B_m) - uncer(B_m)\vert$, where $err(B_m) = \frac{1}{\vert B_m\vert}\sum_{i\in B_m}\Vert \hat y_i - y_i\Vert^2$ and $uncer(B_m) = \frac{1}{\vert B_m\vert}\sum_{i\in B_m}\hat\sigma_i^2$ (a sketch of this computation appears after this list)
- UCE for the re-calibrated estimates
- expected calibration error
- sharpness
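a numpy sketch of the UCE computation above; equal-width binning over $\hat\sigma^2$ is an assumption of this sketch:

```python
import numpy as np

def uce(y, y_hat, var, n_bins: int = 10):
    """Uncertainty calibration error: weighted average over M = n_bins bins
    of |err(B_m) - uncer(B_m)|, with bins taken over predicted variance."""
    sq_err = (y - y_hat) ** 2
    edges = np.linspace(var.min(), var.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(var, edges[1:-1]), 0, n_bins - 1)
    total = 0.0
    for m in range(n_bins):
        mask = bin_ids == m
        if mask.any():
            # mask.mean() == |B_m| / N
            total += mask.mean() * abs(sq_err[mask].mean() - var[mask].mean())
    return float(total)
```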