WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

XBART: Accelerated Bayesian Additive Regression Trees

Tags: Regression Trees, Bayesian Additive Regression Trees

This post is based on He, J., Yalov, S., & Hahn, P. R. (2019). XBART: Accelerated Bayesian Additive Regression Trees. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, 1130–1138. https://proceedings.mlr.press/v89/he19a.html and He, J., & Hahn, P. R. (2023). Stochastic Tree Ensembles for Regularized Nonlinear Regression. Journal of the American Statistical Association, 118(541), 551–570. https://doi.org/10.1080/01621459.2021.1942012

  • BART is not merely a version of RF. The Bayesian perspective leads to a fundamentally new tree growing criterion and algorithm.

The BART model is an additive-error mean regression model

\[y_i = f(x_i) + \epsilon_i\]

The BART prior represents the unknown function $f(x)$ as a sum of many piecewise constant binary regression trees

\[f(x) = \sum_{l=1}^L g_l(x, T_l, \mu_l)\]

where

  • $T_l$: a regression tree
  • $\mu_l$: a vector of scalar means associated with the leaf nodes of $T_l$
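As a concrete picture of the sum-of-trees representation, here is a minimal sketch (the dict-based tree layout and function names are my own, not from the papers):

```python
import numpy as np

# A tree is either a leaf {"mu": value} or an internal node
# {"var": j, "cut": c, "left": subtree, "right": subtree};
# a point goes left when x[var] <= cut.

def eval_tree(tree, x):
    """Evaluate one piecewise-constant tree g(x; T, mu) at a point x."""
    while "mu" not in tree:
        tree = tree["left"] if x[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["mu"]

def eval_forest(trees, x):
    """f(x) = sum_l g_l(x; T_l, mu_l) -- the fits add, unlike a random forest's average."""
    return sum(eval_tree(t, x) for t in trees)

# Two tiny trees: a one-split tree plus a stump contributing a constant shift.
t1 = {"var": 0, "cut": 0.5, "left": {"mu": -1.0}, "right": {"mu": 1.0}}
t2 = {"mu": 0.25}
print(eval_forest([t1, t2], np.array([0.2])))   # -1.0 + 0.25 = -0.75
print(eval_forest([t1, t2], np.array([0.9])))   #  1.0 + 0.25 =  1.25
```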

the tree prior $p(T_l)$ is specified by three components

  • the probability of a node having children at depth $d$: $\alpha(1+d)^{-\beta}, \alpha\in (0, 1), \beta\in [0, +\infty)$
  • the uniform distribution over available predictors for splitting rule assignment at each interior node
  • the uniform distribution on the discrete set of available splitting values for the assigned predictor at each interior node
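The three components above can be sketched as a recursive prior draw. This is a simplification (names mine): cut-points are drawn from the full sample's observed values rather than only the values still available within the node.

```python
import numpy as np

def draw_tree(X, rng, alpha=0.95, beta=2.0, tau=0.3, depth=0):
    """Draw a tree skeleton from the BART prior.

    A node at depth d becomes internal with probability alpha * (1 + d)**(-beta);
    the split variable is uniform over predictors, and the cut-point is uniform
    over that predictor's observed values (simplified: full sample, not node-local).
    """
    if rng.random() < alpha * (1 + depth) ** (-beta):
        var = rng.integers(X.shape[1])              # uniform over predictors
        cut = rng.choice(X[:, var])                 # uniform over observed values
        return {"var": int(var), "cut": float(cut),
                "left": draw_tree(X, rng, alpha, beta, tau, depth + 1),
                "right": draw_tree(X, rng, alpha, beta, tau, depth + 1)}
    return {"mu": rng.normal(0.0, np.sqrt(tau))}    # leaf mean ~ N(0, tau)

rng = np.random.default_rng(0)
X = rng.random((50, 3))
tree = draw_tree(X, rng)
```

Because the split probability decays polynomially in depth, draws from this prior are typically shallow trees, which is the source of BART's regularization.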

BART splitting criterion

the prior predictive distribution is simply a mean-zero multivariate normal distribution with covariance matrix

\[V = \tau JJ^\top + \sigma^2\bI\]

where $J$ is the binary matrix whose $(i, b)$ entry indicates that observation $i$ falls in leaf $b$ (for a single leaf, a column vector of ones)
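For one leaf with prior $\mu \sim N(0, \tau)$, this integrated likelihood has a closed form depending only on the node's count, sum, and sum of squares, via Sherman–Morrison. A sketch (function name mine), checked against the dense multivariate normal density:

```python
import numpy as np

def leaf_log_marginal(y, tau, sigma2):
    """log N(y; 0, tau*J*J^T + sigma^2*I) for one leaf, J a vector of ones.

    Sherman-Morrison reduces it to the sufficient statistics
    n, s = sum(y), ss = sum(y^2).
    """
    n, s, ss = len(y), y.sum(), (y ** 2).sum()
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * (n - 1) * np.log(sigma2)
            - 0.5 * np.log(sigma2 + n * tau)
            - 0.5 * ss / sigma2
            + 0.5 * tau * s ** 2 / (sigma2 * (sigma2 + n * tau)))

# Check against the explicit multivariate-normal log density.
rng = np.random.default_rng(1)
y = rng.normal(size=6)
tau, sigma2 = 0.4, 1.3
V = tau * np.ones((6, 6)) + sigma2 * np.eye(6)
sign, logdet = np.linalg.slogdet(V)
dense = -0.5 * (6 * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(V, y))
print(np.isclose(leaf_log_marginal(y, tau, sigma2), dense))  # True
```

Working with sufficient statistics instead of the $n \times n$ covariance is what makes evaluating many candidate splits cheap.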

The BART MCMC

the sequences of Gibbs updates are

  1. $T_l, \mu_l \mid r_l, \sigma^2$
     a. $T_l \mid r_l, \sigma^2$
     b. $\mu_l \mid T_l, r_l, \sigma^2$
  2. $\sigma^2 \mid r$

where $r_l = y - \sum_{k \neq l} g_k(x; T_k, \mu_k)$ is the partial residual with tree $l$'s fit removed
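Steps 1(b) and 2 are conjugate draws. A sketch under standard conjugate choices ($\mu \sim N(0, \tau)$ per leaf, and an inverse-gamma prior on $\sigma^2$, which is my assumption here; function names mine):

```python
import numpy as np

def draw_leaf_mean(r, sigma2, tau, rng):
    """Step 1(b): draw mu | T, r, sigma^2 for a single leaf with residuals r.

    With prior mu ~ N(0, tau), conjugacy gives the posterior
    N(tau*s / (sigma2 + n*tau), sigma2*tau / (sigma2 + n*tau)), s = sum(r).
    """
    n, s = len(r), r.sum()
    scale = sigma2 + n * tau
    return rng.normal(tau * s / scale, np.sqrt(sigma2 * tau / scale))

def draw_sigma2(resid, a, b, rng):
    """Step 2: draw sigma^2 | r under an inverse-gamma(a, b) prior (an assumption)."""
    a_post = a + len(resid) / 2.0
    b_post = b + (resid ** 2).sum() / 2.0
    return b_post / rng.gamma(a_post)   # 1 / Gamma(a, 1) scaled = inverse-gamma draw

rng = np.random.default_rng(0)
r = rng.normal(2.0, 1.0, size=5000)
mu_draws = [draw_leaf_mean(r, 1.0, 10.0, rng) for _ in range(200)]
# with n large the posterior concentrates at the residual mean (about 2.0 here)
```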

Step 1(a) is handled with a random walk: given a current tree $T$, modifications are proposed and either accepted or rejected according to a likelihood ratio.

Chipman et al. (1998) describe proposals comprising a birth/death pair: a birth proposal splits a leaf into two children, and a death proposal collapses a pair of leaves back into their parent.
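The accept/reject step can be sketched as a generic Metropolis–Hastings skeleton (names mine; the exact prior and proposal ratio terms from Chipman et al. are left as inputs rather than reproduced):

```python
import numpy as np

def mh_accept(log_lik_new, log_lik_old, log_prior_ratio, log_proposal_ratio, rng):
    """Accept/reject a proposed tree modification.

    For a birth move, log_prior_ratio is the change in log p(T) from adding
    two children, and log_proposal_ratio corrects for the birth/death
    asymmetry; both are supplied by the caller.
    """
    log_alpha = (log_lik_new - log_lik_old) + log_prior_ratio + log_proposal_ratio
    return bool(np.log(rng.random()) < log_alpha)
```

In the BART sampler the likelihood terms are the *integrated* (marginal) likelihoods of the two trees, so the leaf means $\mu$ never enter the ratio.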

XBART

Grow-from-root backfitting

rather than making small moves to a given tree at iteration $k+1$, ignore the current tree and grow an entirely new tree from scratch

  • consider the no-split option, corresponding to a cut-point outside of the range of the available data
  • with $C$ available active cut-points per variable and $V$ total variables, perform $C\times V + 1$ likelihood evaluations, the extra one corresponding to the no-split option
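A sketch of one grow-from-root step: score the no-split option and every (variable, cut-point) pair by integrated likelihood, then sample among them. Simplified (names mine): the options are weighted only by likelihood, without the prior weighting XBART places on the no-split option, and terms constant across options are dropped.

```python
import numpy as np

def node_log_marginal(n, s, tau, sigma2):
    """Per-node part of the integrated likelihood that varies across
    cut-points (terms constant over all options on the same data are dropped)."""
    return (0.5 * np.log(sigma2 / (sigma2 + n * tau))
            + 0.5 * tau * s ** 2 / (sigma2 * (sigma2 + n * tau)))

def sample_split(X, r, cutpoints, tau, sigma2, rng):
    """Score 'no split' plus every (variable, cut-point) pair; sample one."""
    options = [(None, None)]  # the no-split option
    scores = [node_log_marginal(len(r), r.sum(), tau, sigma2)]
    for v in range(X.shape[1]):
        for c in cutpoints[v]:
            left, right = r[X[:, v] <= c], r[X[:, v] > c]
            options.append((v, c))
            scores.append(node_log_marginal(len(left), left.sum(), tau, sigma2)
                          + node_log_marginal(len(right), right.sum(), tau, sigma2))
    scores = np.asarray(scores)
    p = np.exp(scores - scores.max())
    return options[rng.choice(len(options), p=p / p.sum())]
```

On data with a sharp step, nearly all posterior weight lands on the cut-point closest to the true jump, so the sampled split recovers it with high probability.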

Pre-sorting Features for Efficiency

the BART splitting criterion depends on the data only through the per-leaf partition sums

an important implication of this, for computation, is that with sorted predictor variables, the various cut-point integrated likelihoods can be computed rapidly via a single sweep through the data (per variable)
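A sketch of that single sweep (names mine): after sorting one predictor, cumulative sums give every cut-point's left/right sufficient statistics in $O(n)$, instead of re-scanning the data per cut-point.

```python
import numpy as np

def sweep_stats(x, y):
    """For each cut between consecutive sorted x-values, return the
    (count, sum) sufficient statistics of the left and right partitions,
    computed in a single pass over the pre-sorted data."""
    order = np.argsort(x)
    ys = y[order]
    left_n = np.arange(1, len(y))        # cut falls after sorted position i
    left_s = np.cumsum(ys)[:-1]
    right_n = len(y) - left_n
    right_s = ys.sum() - left_s
    return left_n, left_s, right_n, right_s

x = np.array([0.3, 0.1, 0.4, 0.2])
y = np.array([3.0, 1.0, 4.0, 2.0])
ln, ls, rn, rs = sweep_stats(x, y)
print(ls)  # [1. 3. 6.] : partial sums of y in x-sorted order
```

These (count, sum) pairs are exactly what the splitting criterion needs, so all cut-point likelihoods for one variable come from this one pass.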

Recursively Defined Cut-points

consider a restricted number of cut-points $C$

taking every $j$-th value as an eligible split point
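A sketch of the strided rule (names mine): with $n$ observations and a budget of $C$ cut-points, take every $j$-th order statistic with $j \approx n/C$. In XBART this is applied recursively, so as nodes shrink the stride shrinks too and the grid effectively refines.

```python
import numpy as np

def strided_cutpoints(x, C):
    """Restrict eligible cut-points to at most C values by taking
    every j-th sorted value, j = max(1, n // C)."""
    xs = np.sort(x)
    j = max(1, len(xs) // C)
    return xs[::j][:C]

x = np.arange(100) / 100.0
print(len(strided_cutpoints(x, 10)))  # 10
```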

Sparse Proposal Distribution

Stochastic Tree Ensembles for Regularized Nonlinear Regression

  • a novel stochastic tree ensemble method for nonlinear regression
  • combine regularization and stochastic search strategies from Bayesian modeling with computationally efficient techniques from recursive partitioning algorithms
  • XBART provides accurate point-wise estimates of the mean function and does so faster than popular alternatives
  • using XBART to initialize the standard BART MCMC algorithm considerably improves credible interval coverage and reduces total run-time

A Recursive, Stochastic Fitting Algorithm

A tree $T$ is a set of split rules defining a rectangular partition of the covariate space into $\{\cA_1,\ldots, \cA_B\}$, where $B$ is the total number of terminal nodes of tree $T$

each rectangular cell $\cA_b$ is associated with a leaf parameter $\mu_b$, and the pair $(T, \mu)$ parameterizes a step function $g(\cdot)$ on the covariate space
