
Monotone B-spline Smoothing

Tags: B-spline, Monotone Smoothing, Item Response Theory, Isotonic Regression, Quantile Curves

This note is based on He, X., & Shi, P. (1998). Monotone B-Spline Smoothing. Journal of the American Statistical Association, 93(442), 643–650. The reproduced simulations use the updated algorithm of Ng, P., & Maechler, M. (2007). A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7(4), 315–328.

\[\newcommand\RMSE{\mathrm{RMSE}}\]

Introduction

There are various smoothing techniques:

  • kernel smoothing
  • nearest neighbors
  • smoothing splines
  • local polynomials
  • B-spline approximation
  • neural networks

Focus: estimate regression curves that are known or required to be monotone.

Examples:

  • growth curves, such as weight or height of growing objects over time
  • item characteristic curve in item response theory, which gives the probability that an examinee with a given latent ability answers an item correctly

Item-response theory (IRT) model

Isotonic regression

The paper points out that isotonic regression undersmooths the data and is very sensitive to outlying observations at the endpoints of the design space.
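As a quick illustration (mine, not from the paper), base R's isoreg() makes the step-function character of the fit visible:

```r
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- pnorm(x, 0.5, 0.15) + rnorm(n, sd = 0.1)  # a monotone truth plus noise

## Isotonic regression: a nondecreasing step function of local means,
## which illustrates the undersmoothing noted above.
iso <- isoreg(x, y)
plot(iso)  # step-like, unsmoothed fit
```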

A natural idea is to combine smoothing with isotonic regression, and Mammen (1991) investigates the asymptotic performance of two different strategies:

  • $m_{SI}$: first smoothing then isotonic
  • $m_{IS}$: first isotonic then smoothing

$I$ splines

$I$ splines are obtained by integrating $B$ splines; restricting their coefficients to be nonnegative ensures monotonicity of the fit.

But the class of $I$ splines is relatively small compared to the class of monotone splines, so there is always a possibility that the fit to the data could be improved by allowing more general monotone splines.
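As a hedged sketch of the I-spline approach (my own, reusing the simulated x and y above; splines2::iSpline for the basis and nnls for the solver are my choices, not the paper's):

```r
library(splines2)  # iSpline(): integrated B-spline (I-spline) basis
library(nnls)      # nonnegative least squares

Ib <- iSpline(x, df = 6, degree = 2, intercept = TRUE)
## nnls() forces every coefficient to be >= 0, so the free intercept is
## written as the difference of two nonnegative columns (+1 and -1).
A  <- cbind(1, -1, Ib)
co <- nnls(A, y)$x
ghat_I <- drop(A %*% co)  # nondecreasing by construction
```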

Proposal

The authors propose a method based on the constrained least absolute deviation (LAD) principle in the space of B-spline functions:

  • LAD: the constrained problem can be solved by linear programming
  • $L_1$ loss function: the resulting fit approximates the conditional median function rather than the conditional mean

Method

Suppose there are $n$ pairs of observations $\{(x_i, y_i), i=1,\ldots,n\}$ from the model

\[y_i = g(x_i) + u_i, i=1,\ldots,n\]

where $g$ is monotone. Without loss of generality, restrict $x\in [0, 1]$.

Let $0 = t_0 < t_1 < \cdots < t_{k_n}=1$ be a partition of $[0, 1]$, and let $\pi_1(x),\ldots, \pi_N(x)$ be the B-spline basis functions of order $p+1$ on this partition, where $N=k_n+p=(k_n-1)+(p+1)$. Choose quadratic splines ($p=2$) and write

\[\pi(x) = (\pi_1(x), \pi_2(x), \ldots, \pi_N(x))\]

Estimate $g$ by $\hat g_n(x)=\pi(x)^T\hat\alpha$, where $\hat\alpha$ minimizes

\[\sum_{i=1}^n\vert y_i-\pi(x_i)^T\alpha\vert\]

subject to monotonicity of $\hat g_n$. Since the derivative of a quadratic spline is piecewise linear, monotonicity is equivalent to nonnegativity of the derivative at the knots,

\[\pi'(t_j)^T\alpha\ge 0, j=0,\ldots,k_n\]

Rewrite it as a linear program: minimize

\[\sum_{i=1}^n (r_i^+ + r_i^-)\]

subject to

\[r_i^+\ge 0, r_i^-\ge 0, r_i^+ - r_i^- = y_i-\pi(x_i)^T\alpha, i=1,\ldots,n\]

and

\[\pi'(t_j)^T\alpha\ge 0, j=0,\ldots,k_n\]
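To make the linear program concrete, here is a minimal sketch (my own, reusing the simulated x and y from above; equally spaced knots and the lpSolve solver are simplifying choices, not the paper's or COBS's algorithm):

```r
library(splines)  # splineDesign()
library(lpSolve)  # lp()

tk <- seq(0, 1, length.out = 7)          # partition t_0 < ... < t_{k_n}, k_n = 6
p  <- 2                                  # quadratic splines
knots <- sort(c(rep(range(tk), each = p), tk))  # boundary knots repeated
B  <- splineDesign(knots, x, ord = p + 1)               # pi(x_i), an n x N matrix
D  <- splineDesign(knots, tk, ord = p + 1, derivs = 1)  # pi'(t_j), (k_n+1) x N
N  <- ncol(B)                            # N = k_n + p = 8 here
n  <- length(y)

## All lpSolve variables are nonnegative, so write alpha = a_plus - a_minus;
## the decision vector is (a_plus, a_minus, r_plus, r_minus).
obj   <- c(rep(0, 2 * N), rep(1, 2 * n))
A_eq  <- cbind(B, -B, diag(n), -diag(n))          # B alpha + r+ - r- = y
A_mon <- cbind(D, -D, matrix(0, nrow(D), 2 * n))  # pi'(t_j)^T alpha >= 0
sol <- lp("min", obj,
          rbind(A_eq, A_mon),
          c(rep("=", n), rep(">=", nrow(D))),
          c(y, rep(0, nrow(D))))
alpha <- sol$solution[1:N] - sol$solution[(N + 1):(2 * N)]
ghat  <- drop(B %*% alpha)  # monotone nondecreasing LAD fit
```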

The authors propose a set of knots equally spaced in percentile ranks, i.e., placed at sample quantiles of the $x_i$.

View the problem of choosing the number of knots $k$ as model selection: minimize the information criterion

\[IC(k) = \log\left(\sum_i\vert y_i-\hat g_n(x_i)\vert\right) + 2(k+2)/n\]

where the factor $2$ is selected mainly based on the authors' experience.

A possible shortcoming is that the degrees of freedom should depend on the number of ties among the components of $\hat\alpha$: coefficients tied together by active monotonicity constraints do not act as separate parameters.
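In code the criterion is a one-liner; here fit_lad(k) is a hypothetical wrapper (e.g., the LP above refitted with $k+1$ knots) returning fitted values:

```r
## IC(k) from the paper; fit_lad() is a hypothetical fitting routine that
## refits the monotone LAD spline with k + 1 knots and returns fitted values.
IC <- function(yhat, k, y) log(sum(abs(y - yhat))) + 2 * (k + 2) / length(y)

ks   <- 2:10
fits <- lapply(ks, fit_lad)  # hypothetical fitting routine
best <- ks[which.min(mapply(IC, fits, ks, MoreArgs = list(y = y)))]
```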

The monotone B-spline smoothing generalizes directly to the estimation of monotone quantile functions by replacing $\vert \cdot \vert$ with the check function

\[\rho_\tau(r) = r(\tau - I(r < 0))\]
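In the linear program, this only reweights the objective: minimize $\sum_i \{\tau r_i^+ + (1-\tau) r_i^-\}$ under the same constraints. A quick sketch in R:

```r
## Check function rho_tau; tau = 0.5 gives 0.5 * |r|, recovering the LAD fit.
rho <- function(r, tau) r * (tau - (r < 0))

## In the LP sketch above, the corresponding change is only in the weights:
## obj <- c(rep(0, 2 * N), rep(tau, n), rep(1 - tau, n))
```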

In some applications, the fit should satisfy certain boundary conditions, such as $0\le g(0)\le g(1)\le 1$ for a probability curve; these are again linear constraints on $\alpha$ and fit into the same linear program.

Asymptotic results

Assuming i.i.d. errors, the paper proves uniform convergence of the estimator and gives the convergence rates.

Simulations

Compare the root mean squared error (RMSE) with that of competitors:

\[\RMSE(q) = \left\{(n-2q)^{-1}\sum_{i=q+1}^{n-q}(\hat g_n(x_i)-g(x_i))^2\right\}^{1/2}\]

where trimming $q$ points at each end assesses performance in the interior of the design space, and $\RMSE(0)$ is the classical $\RMSE$.

Note that the error is computed against the true curve, $\hat g_n(x_i) - g(x_i)$, rather than against the observed response $y_i = g(x_i) + u_i$.
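In R, with the fits evaluated at the sorted design points (continuing the simulated example above, where the true curve is pnorm(x, 0.5, 0.15)):

```r
## RMSE(q): drop q points at each end of the sorted design, comparing the
## fit to the true g rather than to the noisy observations y.
rmse_q <- function(ghat, g, q = 0) {
  i <- (q + 1):(length(g) - q)
  sqrt(mean((ghat[i] - g[i])^2))
}
rmse_q(ghat, pnorm(x, 0.5, 0.15), q = 5)  # e.g., trim 5 points per end
```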

The original competitors are

  • Kernel Smoothing with the WARPing algorithm, which can determine the bandwidth automatically
  • Monotone Kernel ($m_{IS}$), which first performs isotonic regression and then smooths

I did not find an available implementation of the WARPing algorithm, so I use Local Polynomial Regression Fitting (loess in R) instead, and I also consider $m_{SI}$ in addition to $m_{IS}$; a sketch of both hybrids follows.
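A minimal sketch of the two hybrid estimators with base R tools (loess standing in for the kernel smoother):

```r
## m_IS: isotonize first, then smooth the isotonic fitted values.
iso  <- isoreg(x, y)
m_IS <- fitted(loess(iso$yf ~ x))

## m_SI: smooth first, then isotonize the smoothed values.
sm   <- fitted(loess(y ~ x))
m_SI <- isoreg(x, sm)$yf
```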

The implementation of the paper's method calls cobs() from Ng & Maechler (2007)'s COBS package.
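For reference, a hedged version of the call (the argument names constraint, degree, and tau follow the COBS package; defaults such as the knot-selection rule may differ across versions):

```r
library(cobs)
## Monotone (increasing) quadratic B-spline median fit.
fit  <- cobs(x, y, constraint = "increase", degree = 2, tau = 0.5)
yhat <- predict(fit, x)[, "fit"]
```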

Note that the reproduced numeric results for the proposed method are quite close to those in the original paper.

