# Monotone B-spline Smoothing


This note is based on He, X., & Shi, P. (1998), Monotone B-Spline Smoothing, Journal of the American Statistical Association, 93(442), 643–650; the reproduced simulations are based on the updated algorithm of Ng, P., & Maechler, M. (2007), A fast and efficient implementation of qualitatively constrained quantile smoothing splines, Statistical Modelling, 7(4), 315–328.

\[\newcommand\RMSE{\mathrm{RMSE}}\]

## Introduction

There are various smoothing techniques:

- kernel smoothing
- nearest neighbors
- smoothing splines
- local polynomials
- B-spline approximation
- neural networks

**Focus:** estimate regression curves that are known or required to be monotone.

Examples:

- growth curves, such as weight or height of growing objects over time
- item characteristic curve in item response theory, which gives the probability that an examinee with a given latent ability answers an item correctly.

### Item-response theory (IRT) model

### Isotonic regression

The paper points out that isotonic regression undersmooths the data and is very sensitive to outlying observations at the endpoints of the design space.

A natural idea is to combine smoothing with isotonic regression, and Mammen (1991) investigates the asymptotic performance of two strategies:

- $m_{SI}$: smoothing first, then isotonization
- $m_{IS}$: isotonization first, then smoothing
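The two strategies can be sketched in a few lines of numpy. This is a minimal illustration assuming a box-kernel (moving-window) smoother; `pava` and `smooth` are my own helper names, not from the paper:

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: isotonic (nondecreasing) least-squares fit."""
    val, wt, cnt = [], [], []
    for yi in np.asarray(y, dtype=float):
        val.append(yi); wt.append(1.0); cnt.append(1)
        # merge adjacent blocks while monotonicity is violated
        while len(val) > 1 and val[-2] > val[-1]:
            tot = wt[-2] + wt[-1]
            val[-2] = (val[-2] * wt[-2] + val[-1] * wt[-1]) / tot
            wt[-2] = tot
            cnt[-2] += cnt[-1]
            val.pop(); wt.pop(); cnt.pop()
    return np.repeat(val, cnt)

def smooth(x, y, h=0.15):
    """Box-kernel smoother evaluated at the design points."""
    return np.array([y[np.abs(x - xi) <= h].mean() for xi in x])

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = x**2 + rng.normal(scale=0.1, size=100)

m_si = pava(smooth(x, y))   # smooth first, then isotonize
m_is = smooth(x, pava(y))   # isotonize first, then smooth
```

Since isotonization is the last step of $m_{SI}$, that estimate is monotone by construction.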

### $I$ splines

$I$ splines are obtained by integrating $B$ splines with positive coefficients to ensure monotonicity.
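A quick scipy check of this construction (a sketch, not the paper's code): a B-spline basis element is nonnegative, so its antiderivative, the corresponding I-spline, is nondecreasing, and a nonnegative combination of such terms is monotone.

```python
import numpy as np
from scipy.interpolate import BSpline

# one quadratic B-spline basis element on the knots [0, 0.25, 0.5, 0.75]
b = BSpline.basis_element([0, 0.25, 0.5, 0.75])
i_spline = b.antiderivative()      # integrate once: an I-spline element

xs = np.linspace(0.0, 0.75, 200)
vals = i_spline(xs)
monotone = bool(np.all(np.diff(vals) >= -1e-12))
print(monotone)   # the integrated basis element is nondecreasing
```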

But the class of $I$ splines is relatively small compared to the class of monotone splines, so there is always a possibility that the fit to the data could be improved by allowing more general monotone splines.

### Proposal

The authors propose a method based on the constrained least absolute deviation (LAD) principle in the space of B-spline functions:

- LAD: the optimization can be solved by linear programming
- $L_1$ loss function: the resulting fit approximates the conditional median function rather than the conditional mean
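Here is the LAD-as-LP trick in isolation, a minimal sketch with `scipy.optimize.linprog` (`lad_fit` is an illustrative name): write each residual as $r_i^+ - r_i^-$ with $r_i^\pm \ge 0$ and minimize $\sum_i (r_i^+ + r_i^-)$.

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """LAD regression min_a sum|y - X a| as an LP in (a, r+, r-)."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])   # X a + r+ - r- = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# simple check: median regression through a noisy line
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.laplace(scale=0.1, size=50)
a = lad_fit(X, y)
print(a)   # close to (1, 2)
```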

## Method

Suppose we observe $n$ pairs $\{(x_i, y_i), i=1,\ldots,n\}$ with

\[y_i = g(x_i) + u_i, \quad i=1,\ldots,n.\]

W.l.o.g., restrict $x\in [0, 1]$.

Let $0 = t_0 < t_1 < \cdots < t_{k_n}=1$ be a partition of $[0, 1]$, let $N=k_n+p=(k_n-1)+(p+1)$, and let $\pi_1(x),\ldots, \pi_N(x)$ be the corresponding B-spline basis functions. Choose quadratic splines ($p=2$) and write

\[\pi(x) = (\pi_1(x), \pi_2(x), \ldots, \pi_N(x))^T.\]

Estimate $g$ by $\hat g_n(x)=\pi(x)^T\hat\alpha$, where $\hat\alpha$ minimizes

\[\sum_{i=1}^n\vert y_i-\pi(x_i)^T\alpha\vert\]

subject to monotonicity of $\hat g_n$, or equivalently

\[\pi'(t_j)^T\alpha\ge 0, \quad j=0,\ldots,k_n.\]

Rewrite the objective as

\[\sum_{i=1}^n (r_i^+ + r_i^-)\]

subject to

\[r_i^+\ge 0, \quad r_i^-\ge 0, \quad r_i^+ - r_i^- = y_i-\pi(x_i)^T\alpha, \quad i=1,\ldots,n\]

and

\[\pi'(t_j)^T\alpha\ge 0, \quad j=0,\ldots,k_n.\]

The authors propose a set of knots equally spaced in percentile ranks.
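Putting the pieces together, here is a sketch of the whole fit in Python with scipy (the paper and my reproduction use the R package `cobs`; `monotone_bspline_lad` and its defaults are my own illustrative choices). For quadratic splines on simple knots, the derivative constraints $\pi'(t_j)^T\alpha\ge 0$ are equivalent to nondecreasing coefficients $\alpha_1\le\cdots\le\alpha_N$, which keeps the LP small:

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import linprog

def monotone_bspline_lad(x, y, n_interior=5, k=2):
    """Monotone quadratic B-spline LAD fit via linear programming."""
    x = np.asarray(x, dtype=float); y = np.asarray(y, dtype=float)
    n = len(x)
    # interior knots equally spaced in percentile ranks of the x_i
    probs = np.linspace(0, 1, n_interior + 2)[1:-1]
    t = np.r_[[x.min()] * (k + 1), np.quantile(x, probs), [x.max()] * (k + 1)]
    N = len(t) - k - 1
    # design matrix with Pi[i, j] = pi_j(x_i)
    Pi = np.column_stack([BSpline(t, np.eye(N)[j], k)(x) for j in range(N)])

    # variables (alpha, r+, r-); minimize sum(r+ + r-)
    c = np.concatenate([np.zeros(N), np.ones(2 * n)])
    A_eq = np.hstack([Pi, np.eye(n), -np.eye(n)])   # Pi a + r+ - r- = y
    # monotonicity: alpha_j - alpha_{j+1} <= 0
    D = np.zeros((N - 1, N + 2 * n))
    idx = np.arange(N - 1)
    D[idx, idx], D[idx, idx + 1] = 1.0, -1.0
    bounds = [(None, None)] * N + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=D, b_ub=np.zeros(N - 1), A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return BSpline(t, res.x[:N], k)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 60))
y = x**2 + rng.laplace(scale=0.05, size=60)
ghat = monotone_bspline_lad(x, y)
grid = np.linspace(x.min(), x.max(), 101)
```

Nondecreasing coefficients are always sufficient for monotonicity of a B-spline; the equivalence with the knot constraints here relies on a quadratic spline's derivative being piecewise linear, so nonnegativity at the knots implies nonnegativity everywhere.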

The choice of the number of knots $k$ is viewed as a model selection problem: minimize

\[IC(k) = \log\left(\sum_i\vert y_i-\hat g_n(x_i)\vert\right) + 2(k+2)/n,\]

where the factor $2$ is chosen mainly based on the authors' experience.
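Given the summed absolute residuals of a few candidate fits, selection by $IC(k)$ is a one-liner. A toy illustration; the residual sums below are made-up numbers:

```python
import numpy as np

def ic(abs_resid_sum, k, n):
    """IC(k) = log(sum |y_i - ghat(x_i)|) + 2 (k + 2) / n."""
    return np.log(abs_resid_sum) + 2 * (k + 2) / n

# hypothetical summed |residuals| for candidate knot counts, n = 100
sums = {2: 4.1, 4: 3.0, 8: 2.9}
best = min(sums, key=lambda k: ic(sums[k], k, n=100))
print(best)   # -> 4: the extra knots of k=8 do not pay for themselves
```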

A possible shortcoming is that the degrees of freedom should depend on the number of ties in $\alpha$.

The monotone B-spline smoothing generalizes directly to the estimation of monotone quantile functions by replacing $\vert \cdot \vert$ with the check function

\[\rho_\tau(r) = r(\tau - I(r < 0)).\]

Some applications require the fit to satisfy certain boundary conditions, such as $0\le g(0)\le g(1)\le 1$ for a probability curve.
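The check function is easy to verify numerically; a small sketch (`rho` is my own name for it):

```python
import numpy as np

def rho(r, tau):
    """Check function rho_tau(r) = r * (tau - I(r < 0))."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0))

r = np.array([-2.0, -1.0, 1.0, 2.0])
print(rho(r, 0.5))   # 0.5*|r|: tau = 0.5 recovers the (scaled) L1 loss
print(rho(r, 0.9))   # positive residuals weighted 0.9, negative only 0.1
```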

## Asymptotic results

The authors assume i.i.d. errors and prove uniform convergence of the estimator, together with the corresponding rates.

## Simulations

Compare the root mean squared error (RMSE) against the competitors,

\[\RMSE(q) = \left\{(n-2q)^{-1}\sum_{i=q+1}^{n-q}(\hat g_n(x_i)-g(x_i))^2\right\}^{1/2},\]

where $q>0$ restricts the comparison to the interior of the design space, and $\RMSE(0)$ is the classical $\RMSE$.

Note that the error term is calculated as $\hat g_n(x_i) - g(x_i)$ instead of $\hat g_n(x_i) - y_i$, where $y_i = g(x_i) + u_i$.
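The trimmed criterion is straightforward to compute; a small helper (`rmse` is my name for it, and note it takes the true $g(x_i)$, not the noisy $y_i$):

```python
import numpy as np

def rmse(ghat_vals, g_vals, q=0):
    """RMSE(q): RMSE of ghat - g after dropping q points at each end."""
    d = np.asarray(ghat_vals, dtype=float) - np.asarray(g_vals, dtype=float)
    n = len(d)
    core = d[q:n - q] if q > 0 else d
    return float(np.sqrt(np.mean(core ** 2)))

print(rmse([0.0, 3.0, 4.0, 10.0], [0.0, 2.0, 3.0, 10.0], q=1))  # -> 1.0
```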

The original competitors are

- Kernel Smoothing with the WARPing algorithm, which can automatically determine the bandwidth
- Monotone Kernel, which first performs isotonic regression and then smoothing

I did not find an available implementation of the WARPing algorithm, so I picked Local Polynomial Regression Fitting (`loess` in R) instead, and I also consider $m_{SI}$ in addition to $m_{IS}$.

The paper's method is implemented by calling `cobs` from Ng & Maechler (2007)'s `COBS` package.

The results are:

Note that the numeric results for the proposed method are quite close to those in the original paper.