WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Multiple Descent of Minimum-Norm Interpolants

Posted on 0 Comments
Tags: Interpolation, Kernel Ridge Regression

This note is for Liang, T., Rakhlin, A., & Zhai, X. (2020). On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels. ArXiv:1908.10292 [Cs, Math, Stat].

The paper investigate the generalization and consistency of minimum-norm interpolants

\[\hat f\in \argmin_{f\in \cH} \Vert f\Vert_{\cal H}\quad \st \quad f(x_i) = y_i,i=1,\ldots,n\]

of the data $(x_1,y_1),\ldots, (x_n,y_n)\in \IR^d\times \IR$ w.r.t. a norm in a RKHS $\cH$.

The interpolant is also called Kernel Ridgeless Regression, can be viewed as a limiting solution of

\[\argmin_{f\in\cH} \frac 1n \sum_{i=1}^n (f(x_i)-y_i)^2 + \lambda \Vert f\Vert_\cH^2\]

as $\lambda\rightarrow 0$.

Recent literature has focused on understanding risk of estimators that interpolate data, including

  • Belkin et al. (2018): nonparametric local rules
  • Belkin et al. (2019) & Hastie et al. (2019): high-dimensional linear regression
  • Ghorbani et al. (2019): random features model
  • Feldman (2019): classification with rare instances
  • Belkin et al. (2018), Liang and Rakhlin (2018), Rakhlin and Zhai (2018): kernel (ridgeless) regression
    • Rakhlin and Zhai (2018): the minimum-norm interpolant w.r.t. Laplace kernel is not consistent if dimensionality $d$ of the data is constant w.r.t. $n$
    • Liang and Rakhlin (2018): when $n \asymp d$, risk can be upper bounded by quantity that can be small under favorable spectral properties of the data and the kernel matrix.

The paper studies the performance of the minimum-norm interpolants in a general scaling regime $d\asymp n^\alpha, \alpha\in (0, 1)$.

Non-monotone behavior of the upper bound on the risk of the minimum-norm interpolant,

the interpolating solution provably has a diminishing (in sample size) out-of-sample error for most of the scalings of $d$ and $n$

image


Published in categories Note