Calibrating Regression Uncertainty via σ Scaling
- applies estimation of predictive uncertainty via variational Bayesian inference with Monte Carlo dropout to regression tasks and shows why predictive uncertainty is systematically underestimated
- suggests recalibrating with $\sigma$ scaling using a single scalar value
Introduction
- aim to estimate a continuous target value $y\in \IR^d$ given an input image $x$
- Bayesian neural networks (BNNs) and their approximations provide mathematical tools for reasoning about uncertainty
- in general, predictive uncertainty can be split into two types:
  - aleatoric uncertainty (noise inherent in the data)
  - epistemic uncertainty (uncertainty about the model parameters, reducible with more data)
- a well-accepted approach to quantify epistemic uncertainty is variational inference with Monte Carlo dropout, where dropout is used at test time to sample from the approximate posterior
- however, uncertainty obtained from deep BNNs tends to be miscalibrated, i.e. it does not match the model's actual error
instead of the exact model error, one might only care about the ranking of uncertainties; is there a calibration method for ranking?
Platt scaling: calibration of uncertainty in regression
- given a pre-trained, miscalibrated model $H$, an auxiliary model $R: [0, 1]^d \rightarrow [0, 1]^d$ is trained such that $R\circ H$ is a calibrated regressor
- this was applied to bounding box regression
- an auxiliary model with enough capacity will always be able to recalibrate, even if the predicted uncertainty is completely uncorrelated with the real uncertainty
- calibration via $R$ is only possible if enough i.i.d. data is available
- in medical imaging, large data sets are usually hard to obtain, which can cause $R$ to overfit the calibration set
the main contributions of the paper:
- analyze and provide theoretical background why deep models for regression are miscalibrated with regard to predictive uncertainty
- suggest using $\sigma$ scaling in a separate calibration phase to tackle the underestimation of uncertainty
- perform extensive experiments on four different datasets
Methods
2.1 Conditional Log-likelihood for Regression
revisit regression under the MAP framework to derive direct estimation of heteroscedastic aleatoric uncertainty
the goal of the regression model is to predict a target value $y$ given some new input $x$ and a training set $\cD$ of $m$ inputs $\{x_1,\ldots, x_m\}$ and their corresponding (observed) target values $\{y_1,\ldots, y_m\}$
assume that $y$ has a Gaussian distribution $N(y; \hat y(x), \hat\sigma^2(x))$ with mean $\hat y(x)$ and variance $\hat\sigma^2(x)$
a neural network with parameters $\theta$ outputs these values for a given input:
\[f_\theta(x) = [\hat y(x), \hat\sigma^2(x)], \quad \hat y\in \IR^d,\ \hat\sigma^2 \ge 0\]
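a minimal sketch (my own illustration, not the paper's code) of such a two-headed network: the variance head goes through a softplus to keep $\hat\sigma^2 \ge 0$, and the dropout layer is what later enables MC dropout

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanVarianceNet(nn.Module):
    """Backbone with two heads: predictive mean y_hat and variance sigma^2 (illustrative only)."""
    def __init__(self, in_dim: int, hidden: int = 128, out_dim: int = 1):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Dropout(p=0.2),  # kept active at test time for MC dropout later
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, out_dim)
        self.var_head = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = self.backbone(x)
        y_hat = self.mean_head(h)
        var = F.softplus(self.var_head(h)) + 1e-6  # enforce sigma^2 > 0
        return y_hat, var
```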
is the bias between $y$ and $\hat y(x)$ not considered?
with $m$ i.i.d. random samples, the conditional log-likelihood is given by
\[\log p(\cD \mid \theta) = -\sum_{i=1}^{m}\left(\frac{(y_i - \hat y_\theta(x_i))^2}{2\hat\sigma^2_\theta(x_i)} + \frac{1}{2}\log \hat\sigma^2_\theta(x_i)\right) + \text{const} \quad (\text{written here for } d=1)\]
maximizing it is equivalent to minimizing the negative log-likelihood
in this case, $\hat\sigma_\theta$ captures the uncertainty that is inherent in the data (aleatoric uncertainty)
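written out as a batch loss (up to an additive constant), the negative log-likelihood could look like the sketch below; the function name and the small eps are my own choices, and recent PyTorch versions also ship `torch.nn.GaussianNLLLoss` for the same objective

```python
import torch

def gaussian_nll(y_hat, var, y, eps=1e-6):
    """Heteroscedastic Gaussian NLL: 0.5 * ((y - y_hat)^2 / sigma^2 + log sigma^2), averaged."""
    var = var.clamp_min(eps)  # guard against numerically zero variance
    return 0.5 * ((y - y_hat) ** 2 / var + torch.log(var)).mean()

# usage (with the two-headed network from above):
# y_hat, var = model(x); loss = gaussian_nll(y_hat, var, y); loss.backward()
```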
2.2 Biased estimation of $\sigma$
ignoring the dependence through $\theta$, the solution decouples the estimation of $\hat y$ and $\hat\sigma$; the resulting $\hat\sigma^2$ is fitted to the training residuals and is therefore biased, ending up systematically too small on unseen data (see the short derivation below)
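to make this explicit (my own short derivation from the NLL above): holding $\hat y$ fixed and setting the derivative of the per-sample negative log-likelihood with respect to $\hat\sigma^2$ to zero gives
\[\frac{\partial}{\partial \hat\sigma^2}\left(\frac{(y_i - \hat y(x_i))^2}{2\hat\sigma^2} + \frac{1}{2}\log \hat\sigma^2\right) = 0 \;\Rightarrow\; \hat\sigma^2(x_i) = (y_i - \hat y(x_i))^2,\]
so the optimal $\hat\sigma^2$ is exactly the squared training residual, which a flexible network overfits; hence the systematic underestimation on unseen data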
2.3 $\sigma$ Scaling for Aleatoric Uncertainty
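no notes here, so a hedged sketch of what $\sigma$ scaling amounts to: a single scalar $s$ rescales the predicted standard deviation, $\hat\sigma \rightarrow s\,\hat\sigma$, with $s$ chosen on a held-out calibration set by minimizing the Gaussian NLL while $\hat y$ and $\hat\sigma$ stay fixed; for this objective $s$ even has a closed form (my derivation, names below are mine and not the paper's code)

```python
import torch

def fit_sigma_scale(y_hat, var, y):
    """NLL-optimal scalar s for sigma scaling.

    Minimizing 0.5 * ((y - y_hat)^2 / (s^2 var) + log(s^2 var)) over s on a
    calibration set gives the stationary point s^2 = mean((y - y_hat)^2 / var).
    """
    with torch.no_grad():
        s2 = ((y - y_hat) ** 2 / var).mean()
    return s2.sqrt()

# usage on a held-out calibration split (hypothetical variable names):
# s = fit_sigma_scale(y_hat_cal, var_cal, y_cal)
# calibrated_var_test = (s ** 2) * var_test
```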
2.4 Well-Calibrated Estimation of Predictive Uncertainty
so far we only have the MAP point estimate for $\theta$, which does not consider uncertainty in the parameters
to quantify both aleatoric and epistemic uncertainty, extend $f_\theta$ into a fully Bayesian model under the variational inference framework with Monte Carlo dropout.
in MC dropout, the model $f_{\tilde\theta}$ is trained with dropout, and dropout is kept active at test time; $N$ stochastic forward passes sample from the approximate posterior $\tilde\theta\sim q(\theta)$ (see the sketch below)
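a hedged sketch of the sampling step (assuming the two-headed model from above): keep dropout active at test time, run $N$ forward passes, and combine the samples into a predictive mean and a total variance $\hat\Sigma^2$, i.e. the mean aleatoric variance plus the variance of the sampled means (the epistemic part)

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples: int = 25):
    """N stochastic forward passes with dropout enabled."""
    model.train()  # keeps dropout on; in practice only dropout layers should be in train mode
    means, variances = [], []
    for _ in range(n_samples):
        y_hat, var = model(x)
        means.append(y_hat)
        variances.append(var)
    means = torch.stack(means)          # [N, batch, d]
    variances = torch.stack(variances)  # [N, batch, d]
    pred_mean = means.mean(dim=0)
    # total predictive variance: aleatoric (mean of sigma^2) + epistemic (variance of means)
    pred_var = variances.mean(dim=0) + means.var(dim=0, unbiased=False)
    return pred_mean, pred_var
```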
use GBS instead for bootstrap?
apply $\sigma$ scaling to recalibrate the predictive uncertainty $\hat\Sigma^2$
this still allows a low squared error while reducing the underestimation of uncertainty
2.5 Expected Uncertainty Calibration Error for Regression
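the notes don't spell the metric out, so here is my reading of a binned uncertainty calibration error for regression: bin predictions by their predicted variance, then compare the mean squared error to the mean predicted variance per bin, weighted by bin size; the exact binning scheme and error/uncertainty measures below are assumptions

```python
import torch

def expected_uncertainty_calibration_error(y_hat, var, y, n_bins: int = 10):
    """UCE ~= sum_b (|B_b| / m) * | MSE(B_b) - mean predicted variance(B_b) | over variance bins."""
    err = ((y - y_hat) ** 2).flatten()
    var = var.flatten()
    edges = torch.linspace(var.min().item(), var.max().item(), n_bins + 1)
    bin_idx = torch.bucketize(var, edges[1:-1])  # assign each sample to one of n_bins bins
    uce, m = 0.0, var.numel()
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            uce += mask.sum().item() / m * abs(err[mask].mean().item() - var[mask].mean().item())
    return uce
```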
Experiments & Results
Discussion & Conclusion
- well-calibrated uncertainty from MC dropout is able to reliably detect a shift in the data distribution
- $\sigma$ scaling is simple to implement, does not change the predictive mean $\hat y$, and does not affect the model accuracy
- many factors (e.g., network capacity, weight decay, dropout configuration) influence the uncertainty and have not been discussed here