Multidimensional Monotone Bayesian Additive Regression Trees
This note covers Chipman, H. A., George, E. I., McCulloch, R. E., & Shively, T. S. (2021). mBART: Multidimensional Monotone BART. arXiv:1612.01619 [stat].
The flexible nonparametric nature of BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than restrictive parametric approaches.
However, subject matter considerations sometimes warrant a minimal assumption of monotonicity in at least some of the predictors.
The paper introduces mBART, a constrained version of BART that can flexibly incorporate monotonicity in any predesignated subset of predictors using a multivariate basis of monotone trees.
For such monotone relationships, mBART provides
- function estimates that are smoother and more interpretable
- better out-of-sample predictive performance
- less post-data uncertainty
While many key aspects of the unconstrained BART model carry over directly to mBART, the introduction of monotonicity constraints necessitates a fundamental rethinking of how the model is implemented.
- in particular, the original BART MCMC relied on a conditional conjugacy that is no longer available in the monotonically constrained space
For the model
\[Y = f(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)\]BART can quickly obtain full posterior inference for the unknown regression function,
\[f(x) = E(Y\mid x)\]and the unknown variance $\sigma^2$.
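As a concrete instance of this setup, here is a minimal sketch that simulates data from the model with a monotone $f$; the function `f` and all constants below are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monotone regression function, chosen only for illustration.
def f(x):
    return np.log1p(5.0 * x)  # nondecreasing on [0, 1]

n, sigma = 200, 0.1
x = rng.uniform(0.0, 1.0, size=n)
eps = rng.normal(0.0, sigma, size=n)  # eps ~ N(0, sigma^2)
y = f(x) + eps                        # Y = f(x) + eps
# A BART/mBART fit to (x, y) targets f(x) = E(Y | x) and sigma^2.
```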
Prior monotonicity assumptions arise naturally in many contexts, for example:
- older cars, as well as higher-mileage cars, sell for less on average
- dose-response function estimation in epidemiology or market demand function estimation in economics
There is a rich literature on monotone function estimation, also known as isotonic regression, with a wide variety of approaches from both frequentist and Bayesian points of view:
- constrained nonparametric maximum likelihood
- spline modeling
- Gaussian processes
- projection-based methods
In contrast to all these approaches, mBART is built on an easily constrained sum-of-trees approximation of $f$, composed of simple multivariate basis elements that can adaptively incorporate numerous predictors as well as their interactions.
The extension of BART to the monotonically constrained setting requires two basic innovations:
- general constraints under which regression tree functions are monotone in any predesignated set of coordinates
- a new approach to MCMC posterior computation
BART
\[Y = \sum_{j=1}^m g(x;T_j,M_j) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)\]where each $T_j$ is a binary regression tree with a set $M_j$ of associated terminal node constants $\mu_{ij}$, and $g(x;T_j,M_j)$ is the function which assigns $\mu_{ij}\in M_j$ to $x$ according to the sequence of decision rules in $T_j$.
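To make the notation concrete, here is a minimal sketch of a single-tree function $g(x;T,M)$ and the sum-of-trees evaluation; the nested-dict tree representation is an assumption for illustration, not the paper's data structure.

```python
import numpy as np

# A binary regression tree T stored as nested dicts; terminal nodes carry "mu".
def g(x, tree):
    """Assign x its terminal-node constant mu by following T's decision rules."""
    node = tree
    while "mu" not in node:  # internal nodes hold (var, cut, left, right)
        node = node["left"] if x[node["var"]] < node["cut"] else node["right"]
    return node["mu"]

def sum_of_trees(x, trees):
    """f(x) approximated by the sum of m single-tree functions."""
    return sum(g(x, T) for T in trees)

# Example: two stumps splitting on coordinates 0 and 1.
T1 = {"var": 0, "cut": 0.5, "left": {"mu": -1.0}, "right": {"mu": 1.0}}
T2 = {"var": 1, "cut": 0.3, "left": {"mu": 0.2}, "right": {"mu": 0.7}}
print(sum_of_trees(np.array([0.6, 0.2]), [T1, T2]))  # 1.0 + 0.2 = 1.2
```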
For a subset $S$ of the coordinates of $x\in\IR^p$, a function $f:\IR^p\rightarrow \IR$ is said to be monotone in $S$ if, for each $x_i\in S$ and all values of $x$, $f$ satisfies
\[f(x_1,\ldots,x_i+\delta, \ldots,x_p) \ge f(x_1,\ldots, x_i,\ldots,x_p)\]for all $\delta > 0$ ($f$ is nondecreasing), or for all $\delta < 0$ ($f$ is nonincreasing).
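A brute-force numerical check of this definition can be handy when eyeballing a fitted surface. The sketch below tests the inequality at random points; it can refute monotonicity but never prove it.

```python
import numpy as np

def appears_monotone_in(f, S, p, n_draws=1000, delta=1e-3, seed=0):
    """Test f(x + delta * e_i) >= f(x) for each i in S at random x in [0,1]^p.

    Returning False disproves nondecreasing monotonicity in S;
    returning True is only evidence, not a proof.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_draws):
        x = rng.uniform(0.0, 1.0, size=p)
        for i in S:
            x_up = x.copy()
            x_up[i] += delta
            if f(x_up) < f(x):
                return False
    return True

print(appears_monotone_in(lambda x: np.sin(6 * x[0]) + x[1], S=[1], p=2))  # True
print(appears_monotone_in(lambda x: np.sin(6 * x[0]) + x[1], S=[0], p=2))  # False
```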
It suffices to focus on the conditions for a single tree function $g(x;T,M)$ to be monotone in $S$ since a sum-of-trees function will be monotone in $S$ whenever each of the component trees is monotone in $S$.
Note that each terminal node region of $T$ will be a rectangular region of the form
\[R_k = \{x:x_i\in [L_{ik}, U_{ik}), i=1,\ldots, p\}\]where the interval $[L_{ik}, U_{ik})$ for each $x_i$ is determined by the sequence of splitting rules leading to $R_k$.
- $R_k$ is separated from $R_{k^\star}$ if $U_{ik} < L_{ik^\star}$ or $L_{ik} > U_{ik^\star}$ for some $i$.
- If $R_k$ and $R_{k^\star}$ are not separated, then
- if $L_{ik}=U_{ik^\star}$ for some $i$, $R_k$ is an above-neighbor of $R_{k^\star}$
- if $U_{ik}=L_{ik^\star}$ for some $i$, $R_k$ is a below-neighbor of $R_{k^\star}$
These constraint conditions for tree monotonicity dovetail perfectly with the nature of the iterative MCMC simulation calculations; a sketch follows.
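A hedged sketch of these conditions under the definitions above: represent each terminal region by its interval endpoints and constant, implement the separated/neighbor relations, and require that every above-neighbor pair in a constrained coordinate has ordered constants. This is our informal reading of the paper's condition, not its exact statement.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Region:
    """Terminal-node rectangle R_k = {x : x_i in [L[i], U[i]), i = 1..p}."""
    L: list      # lower endpoints L_ik
    U: list      # upper endpoints U_ik
    mu: float    # terminal-node constant mu_k

def separated(a, b):
    # R_k and R_k* are separated if U_ik < L_ik* or L_ik > U_ik* for some i.
    return any(a.U[i] < b.L[i] or a.L[i] > b.U[i] for i in range(len(a.L)))

def above_neighbor(a, b, coords):
    # a is an above-neighbor of b: not separated, and L_ik = U_ik* for some i.
    return not separated(a, b) and any(a.L[i] == b.U[i] for i in coords)

def tree_is_monotone(regions, S):
    """For every above-neighbor pair in a coordinate in S, the upper
    region's constant must be at least the lower region's constant."""
    for a, b in combinations(regions, 2):
        if above_neighbor(a, b, S) and a.mu < b.mu:
            return False
        if above_neighbor(b, a, S) and b.mu < a.mu:
            return False
    return True

# Two regions splitting [0, 1) at x_0 = 0.5; monotone in S = {0} iff mu rises.
left  = Region(L=[0.0], U=[0.5], mu=0.2)
right = Region(L=[0.5], U=[1.0], mu=0.8)
print(tree_is_monotone([left, right], S=[0]))                       # True
print(tree_is_monotone([Region([0.0], [0.5], 0.9), right], S=[0]))  # False
```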
A Constrained Regularization Prior
\[p((T_1,M_1), \ldots, (T_m, M_m), \sigma) = \left[\prod_{j}p(M_j\mid T_j)p(T_j)\right] p(\sigma)\]Let $C$ be the set of all $(T, M)$ which satisfy monotonicity constraints,
\[C=\{(T,M): g(x; T, M) \text{ is monotone in }x_i\in S\}\]Constrain the support only over $C$,
\[p(M_j\mid T_j)\propto \left[\prod_{i=1}^{b_j}p(\mu_{ij}\mid T_j)\right]\chi_C(T_j,M_j)\,,\]where $b_j$ is the number of bottom (terminal) nodes of $T_j$, and $\chi_C(\cdot)=1$ on $C$ and $=0$ otherwise.
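Putting the pieces together, a minimal sketch of the constrained prior on $M_j$ given $T_j$: independent normal densities times the indicator $\chi_C$. The zero prior mean and common scale `tau` are simplifying assumptions in the spirit of BART, and `in_C` would come from a monotonicity check such as `tree_is_monotone` above.

```python
import numpy as np
from scipy.stats import norm

def log_prior_M_given_T(mus, tau, in_C):
    """Unnormalized log p(M | T): independent N(0, tau^2) densities on the
    bottom-node constants mu_ij, times the indicator chi_C(T, M)."""
    if not in_C:              # chi_C = 0: no prior mass off the constraint set C
        return -np.inf
    return float(norm.logpdf(np.asarray(mus), loc=0.0, scale=tau).sum())

print(log_prior_M_given_T([0.2, 0.8], tau=0.5, in_C=True))
```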
Calibrating the $T_j$ prior
The tree prior $p(T_j)$ is specified by three aspects (see the sketch after the list):
- the probability of a node at depth $d\ (=0,1,2,\ldots)$ having children is $\alpha(1+d)^{-\beta}$, with $\alpha\in(0,1)$ and $\beta\in[0,\infty)$
- the uniform distribution over available predictors
- the uniform distribution on the available splitting values
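A quick sketch of the first aspect, the depth-based split probability; the defaults $\alpha=0.95$, $\beta=2$ are the usual BART choices.

```python
def p_split(d, alpha=0.95, beta=2.0):
    """Prior probability that a node at depth d has children: alpha*(1+d)^(-beta)."""
    assert 0.0 < alpha < 1.0 and beta >= 0.0
    return alpha * (1.0 + d) ** (-beta)

# Deeper nodes are increasingly likely to be terminal, favoring small trees.
print([round(p_split(d), 3) for d in range(4)])  # ~[0.95, 0.238, 0.106, 0.059]
```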
Calibrating the $\sigma$ prior
Use the conditionally conjugate scaled inverse chi-square distribution, as in BART.
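A minimal sketch of drawing from this prior, using the standard representation $\sigma^2 = \nu\lambda/\chi^2_\nu$; the degrees of freedom $\nu$ and scale $\lambda$ would be calibrated from the data, and the values below are placeholders.

```python
import numpy as np

def sample_sigma2(nu, lamb, size=1, seed=0):
    """Draw sigma^2 from the scaled inverse chi-square prior: nu*lambda / chi2_nu."""
    rng = np.random.default_rng(seed)
    return nu * lamb / rng.chisquare(nu, size=size)

print(sample_sigma2(nu=3, lamb=0.5, size=3))
```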
Calibrating the $M_j\mid T_j$ prior
Adopt normal densities as in BART, but with different prior variance choices depending on whether or not $\mu_{ij}$ is constrained by the set $C$.
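For intuition, a hedged sketch of drawing a constrained bottom-node constant: when $\mu_{ij}$ is constrained by $C$, its conditional support becomes an interval bounded by the constants of neighboring regions, which suggests a truncated-normal draw; an unconstrained $\mu_{ij}$ keeps infinite bounds. This is one plausible scheme, not necessarily the paper's exact algorithm.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_mu(mean, sd, lower=-np.inf, upper=np.inf, seed=0):
    """Draw a bottom-node constant from N(mean, sd^2) truncated to [lower, upper].
    With infinite bounds this reduces to an ordinary normal draw."""
    a, b = (lower - mean) / sd, (upper - mean) / sd  # standardized bounds
    return truncnorm.rvs(a, b, loc=mean, scale=sd,
                         random_state=np.random.default_rng(seed))

# mu must sit between a below-neighbor at 0.1 and an above-neighbor at 0.9.
print(sample_mu(mean=0.5, sd=0.3, lower=0.1, upper=0.9))
```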
MCMC Simulation of the Constrained Posterior
Examples
The Smoothing Effectiveness of mBART
Comparing Fits and Credible Regions of BART and mBART
Improving the RMSE with Monotone Regularization
Used Car Prices
The Stock Returns Example
Discussion
The benefits of mBART over BART will rest on the validity of the monotonicity assumptions for which mBART was designed.
- it will always be useful to compare the outputs from BART and mBART to judge the plausibility of any monotonicity assumptions
- even when monotonicity seems plausible, more formal testing procedures such as Bayes factors would be valuable to have
Future research directions:
- the development of theory for mBART
- empirical and theoretical comparisons of mBART with the many monotone alternatives proposed in the references would be enlightening to investigate