# Quantile Regression Forests


This post is based on Meinshausen, N. (2006). Quantile Regression Forests. *Journal of Machine Learning Research*, 7, 983–999, since an upcoming seminar is related to this topic.

## Introduction

In standard regression analysis, the conditional mean $E(Y\mid X=x)$ minimizes the expected squared error loss,

\begin{equation} E(Y\mid X=x) = \argmin_q E\left\{(Y-q)^2\mid X=x\right\}\,. \end{equation}

### Beyond the Conditional Mean

The conditional mean illuminates just one aspect of the conditional distribution of a response variable $Y$, yet neglects all other features of possible interest. The $\alpha$-quantile $Q_\alpha(x)$ is defined as

\begin{equation}\label{eq:q_def} Q_\alpha(x) = \inf\left\{y : F(y\mid X=x)\ge \alpha\right\}\,. \end{equation}
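As a concrete illustration of the generalized-inverse definition above, here is a small Python sketch (not part of the paper) that evaluates $Q_\alpha$ for the empirical CDF of a finite sample:

```python
# Illustrative sketch: the alpha-quantile as the generalized inverse
# inf{y : F(y) >= alpha} of an empirical CDF built from a finite sample.
def empirical_cdf(sample):
    """Return F(y) = (1/n) * #{i : sample[i] <= y}."""
    n = len(sample)
    return lambda y: sum(v <= y for v in sample) / n

def quantile(sample, alpha):
    """Q_alpha = inf{y : F(y) >= alpha}; the infimum is attained at a sample point."""
    F = empirical_cdf(sample)
    for y in sorted(sample):
        if F(y) >= alpha:
            return y

sample = [3, 1, 4, 1, 5, 9, 2, 6]
print(quantile(sample, 0.5))  # an empirical median of the sample
```

For a finite sample the infimum is always attained at one of the observed values, so a scan over the sorted sample suffices.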

### Prediction Intervals

Quantile regression can be used to build prediction intervals. A $95\%$ prediction interval for the value of $Y$ is given by

\begin{equation} I(x) = \left[\,Q_{0.025}(x),\, Q_{0.975}(x)\,\right]\,. \end{equation}

### Outlier Detection

There is no generally applicable rule for what precisely constitutes an “extreme” observation. One could flag an observation as an outlier if the distance between $Y$ and the median of the conditional distribution is large, where “large” is measured relative to some robust measure of dispersion, such as the conditional median absolute deviation or the conditional interquartile range.
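A hypothetical version of such a rule can be sketched in a few lines of Python; the cutoff `factor` is an arbitrary choice, not something prescribed by the paper:

```python
# Hypothetical outlier rule sketching the idea in the text: flag Y as an
# outlier when its distance to the conditional median exceeds a multiple
# of a robust dispersion measure, here the conditional interquartile range.
def is_outlier(y, q25, q50, q75, factor=1.5):
    """q25/q50/q75 are conditional quantiles at x; factor is an arbitrary choice."""
    iqr = q75 - q25
    return abs(y - q50) > factor * iqr

print(is_outlier(y=120, q25=20, q50=30, q75=50))  # far above the conditional median
print(is_outlier(y=35,  q25=20, q50=30, q75=50))  # well within the bulk
```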

### Estimating Quantiles from Data

Let the loss function $L_\alpha$ be defined for $0 < \alpha < 1$ by the weighted absolute deviations

\begin{equation} L_\alpha(y, q) = \begin{cases} \alpha\,|y-q| & y > q\,, \\ (1-\alpha)\,|y-q| & y \le q\,. \end{cases} \end{equation}

The conditional quantiles minimize the expected loss $E(L_\alpha)$,

\begin{equation}\label{eq:q_sol} Q_\alpha(x) = \argmin_q E\left\{L_\alpha(Y, q)\mid X=x\right\}\,. \end{equation}
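A quick numerical check of this characterization (an illustrative sketch, not the paper's code): brute-force minimization of the empirical pinball loss over candidate values recovers the empirical quantile.

```python
# Sketch: minimizing the empirical weighted-absolute-deviation ("pinball")
# loss over candidate values q recovers the empirical alpha-quantile.
def pinball_loss(y, q, alpha):
    return alpha * (y - q) if y > q else (1 - alpha) * (q - y)

def argmin_quantile(sample, alpha):
    """Brute-force argmin of the total loss; a sample point always attains it."""
    return min(sample, key=lambda q: sum(pinball_loss(y, q, alpha) for y in sample))

sample = [3, 1, 4, 1, 5, 9, 2, 6]
print(argmin_quantile(sample, 0.5))  # coincides with an empirical median
```

For $\alpha = 0.5$ the loss reduces to half the absolute deviation, so the minimizer is a median, as expected.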

## Random Forests

Random forests grow an ensemble of trees, using $n$ independent observations $(X_i, Y_i)$, $i=1,\ldots,n$, and introduce randomness in two ways:

- for each tree, a bagged version of the training data is used *(bootstrap samples)*;
- at each node, only a random subset of predictor variables is considered for split-point selection. The size of this random subset (`mtry`) is the single tuning parameter.

The prediction of a single tree $T(\theta)$ for a new data point $X=x$ is obtained by averaging over the observed values in leaf $\ell(x,\theta)$, where $\theta$ determines how the tree is grown. Let $R_{\ell(x,\theta)}$ denote the rectangular subspace corresponding to leaf $\ell(x,\theta)$, and define the weight vector

\begin{equation} w_i(x, \theta) = \frac{1_{\{X_i \in R_{\ell(x,\theta)}\}}}{\#\{j : X_j \in R_{\ell(x,\theta)}\}}\,. \end{equation}

The prediction of a single tree, given covariate $X=x$, is then the weighted average of the original observations $Y_i$, $i=1,\ldots,n$,

\begin{equation} \hat{\mu}(x) = \sum_{i=1}^n w_i(x, \theta)\, Y_i\,. \end{equation}

Using random forests, the conditional mean $E(Y\mid X=x)$ is approximated by the averaged prediction of $k$ single trees, each constructed with an i.i.d. vector $\theta_t$, $t=1,\ldots,k$. Let $w_i(x)$ be the average of $w_i(x, \theta_t)$ over this collection of trees,

\begin{equation} w_i(x) = \frac{1}{k}\sum_{t=1}^k w_i(x, \theta_t)\,. \end{equation}

The prediction of random forests is then

\begin{equation} \hat{\mu}(x) = \sum_{i=1}^n w_i(x)\, Y_i\,. \end{equation}
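The weight construction can be made concrete with a toy Python sketch (hypothetical leaf assignments, not a real fitted forest): each "tree" is reduced to the leaf index it assigns to every training point and to the query point $x$.

```python
# Sketch of the weight construction with toy data: w_i(x, theta) puts equal
# mass on the training points sharing x's leaf, and w_i(x) averages over trees.
def tree_weights(train_leaves, x_leaf):
    """w_i(x, theta): 1/#(leaf of x) for points in x's leaf, else 0."""
    in_leaf = [leaf == x_leaf for leaf in train_leaves]
    size = sum(in_leaf)
    return [int(b) / size for b in in_leaf]

def forest_weights(leaves_per_tree, x_leaf_per_tree):
    """w_i(x): average of the single-tree weights over the k trees."""
    k = len(leaves_per_tree)
    per_tree = [tree_weights(tl, xl)
                for tl, xl in zip(leaves_per_tree, x_leaf_per_tree)]
    return [sum(w[i] for w in per_tree) / k for i in range(len(per_tree[0]))]

Y = [10.0, 12.0, 30.0, 33.0]
leaves_per_tree = [[0, 0, 1, 1], [0, 1, 1, 1]]  # leaf ids of the 4 training points
x_leaf_per_tree = [1, 1]                        # leaf containing x, per tree
w = forest_weights(leaves_per_tree, x_leaf_per_tree)
prediction = sum(wi * yi for wi, yi in zip(w, Y))  # random-forest mean prediction
```

Note that the weights are nonnegative and sum to one, so the prediction is a convex combination of the training responses.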

## Quantile Regression Forests

Note that $F(y\mid X=x) = P(Y\le y\mid X=x) = E\left(1_{\{Y\le y\}}\mid X=x\right)$;

then define an approximation to $E(1_{\{Y\le y\}}\mid X=x)$ by the weighted mean over the observations of $1_{\{Y\le y\}}$,

\begin{equation} \hat{F}(y\mid X=x) = \sum_{i=1}^n w_i(x)\, 1_{\{Y_i\le y\}}\,, \end{equation}

using the same weights $w_i(x)$ as for random forests *(?? a little confusing: are the trees built exactly as in classical random forests, with no change to the loss function used for splitting? If it were changed, would the classical building algorithm still work?)*.

Estimates $\hat{Q}_\alpha(x)$ of the conditional quantiles $Q_\alpha(x)$ are obtained by plugging $\hat{F}(y\mid X=x)$ into \eqref{eq:q_def}. *(This answers my question above: the estimate comes from \eqref{eq:q_def}, not from \eqref{eq:q_sol}.)*

The key difference between quantile regression forests and random forests is the following: for each node in each tree, random forests keep only the mean of the observations that fall into the node and neglect all other information. In contrast, quantile regression forests keep the values of all observations in the node, not just their mean, and assess the conditional distribution based on this information.
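The estimation step can be sketched in Python with hypothetical forest weights (the weights here are made up for illustration; in practice they come from a fitted forest): the weighted empirical CDF is built and then inverted at the desired levels.

```python
# Sketch of the quantile-regression-forest step with toy weights: estimate
# the conditional CDF as a weighted empirical CDF, then invert it.
def weighted_cdf(Y, w):
    """F_hat(y | x) = sum_i w_i(x) * 1{Y_i <= y}."""
    return lambda y: sum(wi for yi, wi in zip(Y, w) if yi <= y)

def weighted_quantile(Y, w, alpha):
    """Q_hat_alpha(x) = inf{y : F_hat(y | x) >= alpha}; scan the sample points."""
    F = weighted_cdf(Y, w)
    for y in sorted(Y):
        if F(y) >= alpha:
            return y

Y = [10.0, 12.0, 30.0, 33.0, 50.0]
w = [0.1, 0.2, 0.3, 0.3, 0.1]      # hypothetical forest weights, summing to 1
med = weighted_quantile(Y, w, 0.5)
lo = weighted_quantile(Y, w, 0.05)  # lower end of a 90% prediction interval
hi = weighted_quantile(Y, w, 0.95)  # upper end of a 90% prediction interval
```

The only change relative to the mean prediction of random forests is what is done with the weights: instead of a weighted average of the $Y_i$, a full weighted distribution over them is retained.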

### Example: Air Quality

Consider the prediction of ozone levels based on the dataset `airquality`, which consists of daily air quality measurements in New York, May to September 1973. Least-squares regression estimates the conditional mean of ozone levels, but gives little information about the fluctuations of ozone levels around this predicted conditional mean. It might, for example, be of interest to find an ozone level that is, with high probability, not surpassed. This can be achieved with quantile regression forests.

```r
library(quantregForest)
################################################
## Load air-quality data (and preprocessing)  ##
################################################
data(airquality)
set.seed(1)
## remove observations with missing values
airquality <- airquality[!apply(is.na(airquality), 1, any), ]
## number of remaining samples
n <- nrow(airquality)
## divide into training and test data
indextrain <- sample(1:n, round(0.6 * n), replace = FALSE)
Xtrain <- airquality[ indextrain, 2:6]
Xtest  <- airquality[-indextrain, 2:6]
Ytrain <- airquality[ indextrain, 1]
Ytest  <- airquality[-indextrain, 1]
################################################
## Compute quantile regression forests        ##
################################################
qrf <- quantregForest(x = Xtrain, y = Ytrain)
## or with explicit node and sample sizes:
qrf <- quantregForest(x = Xtrain, y = Ytrain, nodesize = 10, sampsize = 30)
## predict 0.1, 0.5 and 0.9 quantiles for test data
conditionalQuantiles <- predict(qrf, Xtest)
print(conditionalQuantiles[1:4, ])
## predict 0.1, 0.2, ..., 0.9 quantiles for test data
conditionalQuantiles <- predict(qrf, Xtest, what = 0.1 * (1:9))
print(conditionalQuantiles[1:4, ])
```

## Consistency

Under specific assumptions, it holds pointwise for every $x\in \calB$ that

\begin{equation} \sup_{y\in\mathbb{R}} \left| \hat{F}(y\mid X=x) - F(y\mid X=x) \right| \rightarrow_p 0\,, \qquad n\rightarrow\infty\,. \end{equation}

In other words, the error of the approximation to the conditional distribution converges uniformly in probability to zero as $n\rightarrow \infty$. Quantile regression forests are thus a consistent way of estimating conditional distributions and quantile functions.