Benchopt: Benchmarks for ML Optimizations
attempts to address the “reproducibility crisis”:
- open datasets (OpenML)
- standardized code sharing
- benchmarks (MLPerf)
- the NeurIPS and ICLR reproducibility challenges
- new journals
optimization algorithms pervade almost every area of ML, from empirical risk minimization and variational inference to reinforcement learning; it is thus crucial to know which methods to use depending on the task and setting
- while some papers in optimization for ML provide extensive validations, many others fall short in this regard due to a lack of time and resources, and in turn feature results that are hard for other researchers to reproduce
- in addition, both performance and hardware evolve over time, which eventually makes static benchmarks obsolete
build a living, reproducible, and standardized state of the art that can serve as a foundation for future research
benchmarks are meant to evolve over time, so Benchopt offers a modular structure through which a benchmark can be easily extended with new objective functions, datasets, and solvers by adding a single file of code
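As a rough illustration of this structure, a benchmark is a folder gathering one `objective.py` file plus one file per dataset and per solver, so extending it amounts to dropping a new file into the right subfolder. The folder and file names below are hypothetical:

```
benchmark_logreg_l2/      # hypothetical benchmark folder
├── objective.py          # the Objective class: the function to minimize and report
├── datasets/
│   ├── simulated.py      # one Dataset class per file
│   └── real_data.py
└── solvers/
    ├── lbfgs.py          # one Solver class per file
    └── sgd.py
```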
three canonical problems:
- $\ell_2$-regularized logistic regression
- the Lasso
- training of a ResNet18 architecture for image classification
The Benchopt library
the considered problems are of the form
\[\theta^\star \in \argmin_{\theta\in\Theta} f(\theta; \cD, \Lambda)\]
three types of object classes:
- objective
- dataset
- solver
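As a minimal sketch (not taken from the paper) of what a solver file can look like, here is a plain gradient descent solver for the $\ell_2$-regularized logistic regression objective of the first example below. The parameter names passed to `set_objective` (here `X`, `y`, `lmbd`) and the exact return convention of `get_result` depend on the benchmark's objective and on the Benchopt version.

```python
from benchopt import BaseSolver, safe_import_context

with safe_import_context() as import_ctx:
    import numpy as np


class Solver(BaseSolver):
    """Plain gradient descent on the l2-regularized logistic loss (sketch)."""
    name = 'Python-GD'

    def set_objective(self, X, y, lmbd):
        # Quantities sent by the Objective; names are benchmark-specific.
        self.X, self.y, self.lmbd = X, y, lmbd

    def run(self, n_iter):
        X, y, lmbd = self.X, self.y, self.lmbd
        p = X.shape[1]
        # Step size 1 / L, with L an upper bound on the gradient's Lipschitz constant.
        L = np.linalg.norm(X, ord=2) ** 2 / 4 + lmbd
        theta = np.zeros(p)
        for _ in range(n_iter):
            z = y * (X @ theta)
            grad = -X.T @ (y / (1 + np.exp(z))) + lmbd * theta
            theta -= grad / L
        self.theta = theta

    def get_result(self):
        # Recent Benchopt versions expect a dict; older ones returned the array directly.
        return dict(beta=self.theta)
```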
First example: $\ell_2$-regularized logistic regression
- $X\in \IR^{n\times p}$
- $y\in \{-1, 1\}^n$
$\ell_2$-regularized logistic regression provides a GLM indexed by its parameter $\theta\in \IR^p$ to discriminate the classes by solving
\[\theta^\star =\argmin_{\theta\in \IR^p} \sum_{i=1}^n \log(1+\exp(-y_iX_i^\top\theta)) + \frac{\lambda}{2}\Vert\theta\Vert_2^2\]
where $\lambda > 0$. The objective is strongly convex with a Lipschitz-continuous gradient
- most classical method: Newton’s method
- quasi-Newton: L-BFGS (see the sketch after this list)
- truncated Newton
- methods based on stochastic estimates of the gradient: stochastic gradient descent (SGD)
  - core idea: use cheap and noisy estimates of the gradient
  - SGD generally converges either slowly due to decreasing step sizes, or to a neighborhood of the solution for constant step sizes
  - variance-reduced adaptations, such as SAG, SAGA, and SVRG
- methods based on coordinate descent
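To make the L-BFGS bullet concrete, here is a self-contained sketch that solves the problem above with `scipy.optimize.minimize(method='L-BFGS-B')` on simulated data; this is a generic illustration, not Benchopt’s own solver, and the data sizes are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, lmbd = 200, 50, 1.0
X = rng.standard_normal((n, p))
y = np.sign(rng.standard_normal(n))      # labels in {-1, +1}

def f_and_grad(theta):
    # l2-regularized logistic loss and its gradient
    z = y * (X @ theta)
    loss = np.logaddexp(0, -z).sum() + 0.5 * lmbd * theta @ theta
    grad = -X.T @ (y / (1 + np.exp(z))) + lmbd * theta
    return loss, grad

res = minimize(f_and_grad, np.zeros(p), jac=True, method='L-BFGS-B')
theta_star = res.x                        # minimizer of the regularized loss
```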
Second example: the Lasso
given $X\in \IR^{n\times p}$ and $y\in \IR^n$, the Lasso estimate solves
\[\theta^\star \in \argmin_{\theta\in \IR^p} \frac{1}{2}\Vert y - X\theta\Vert_2^2 + \lambda\Vert\theta\Vert_1\]
with $\lambda > 0$ controlling the sparsity of the solution
- iterative reweighted least-squares (IRLS)
- iterative soft-thresholding algorithm (ISTA) and its accelerated variant FISTA (sketched below)
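A minimal NumPy sketch of ISTA, i.e. proximal gradient descent where each gradient step on the quadratic term is followed by soft-thresholding (the proximal operator of $\lambda\Vert\cdot\Vert_1$); the step size and iteration count below are illustrative choices.

```python
import numpy as np

def soft_threshold(u, tau):
    # Proximal operator of tau * ||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def ista(X, y, lmbd, n_iter=500):
    # Proximal gradient descent on 0.5 * ||y - X @ theta||^2 + lmbd * ||theta||_1
    L = np.linalg.norm(X, ord=2) ** 2    # Lipschitz constant of the smooth part
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y)     # gradient of the quadratic term
        theta = soft_threshold(theta - grad / L, lmbd / L)
    return theta
```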
Third example: how standard is a benchmark on ResNet18?
image classification has become a de facto standard to validate novel methods in the field.
Among the different network architectures, ResNets are extensively used in the community as they provide strong and versatile baselines
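For completeness, a stripped-down training loop for a torchvision ResNet18 might look as follows; the dataset (CIFAR-10), optimizer (SGD with momentum), and hyperparameters are assumptions made for illustration, not necessarily those used in the Benchopt benchmark.

```python
import torch
from torch import nn
from torchvision import datasets, transforms, models

# Hypothetical setup: CIFAR-10 with SGD + momentum, no data augmentation.
device = "cuda" if torch.cuda.is_available() else "cpu"
transform = transforms.Compose([transforms.ToTensor()])
train_set = datasets.CIFAR10("data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

model.train()
for epoch in range(1):  # single epoch for illustration
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```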