# Multidimensional Monotone Bayesian Additive Regression Tree

This note is for Chipman, H. A., George, E. I., McCulloch, R. E., & Shively, T. S. (2021). mBART: Multidimensional Monotone BART. ArXiv:1612.01619 [Stat].

The flexible nonparametric nature of BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than restrictive parametric approaches.

**However, subject matter considerations sometimes warrant a minimal assumption of monotonicity in at least some of the predictors.**

The paper introduces mBART, a constrained version of BART that can flexibly incorporate monotonicity in any predesignated subset of predictors using a multivariate basis of monotone trees.

For such monotone relationships, mBART provides

- function estimates that are smoother and more interpretable
- better out-of-sample predictive performance
- less post-data uncertainty

While many key aspects of the unconstrained BART model carry over directly to mBART, the introduction of monotonicity constraints necessitates a fundamental rethinking of how the model is implemented.

- particularly, the original BART MCMC relied on a conditional conjugacy that is no longer available in a monotonically constrained space.

For

\[Y = f(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)\]BART can quickly obtain full posterior inference for the unknown regression function,

\[f(x) = E(Y\mid x)\]and the unknown variance $\sigma^2$.

Prior monotonicity assumptions arise naturally in many contexts, such as

- older cars as well as higher mileage cars sell for less on average
- dose-response function estimation in epidemiology or market demand function estimation in economics

There is a rich literature on monotone function estimation, also known as isotonic regression, with a wide variety of approaches from both the frequentist and Bayesian points of view:

- constrained nonparametric maximum likelihood
- spline modeling
- Gaussian processes
- projection-based methods

In contrast to all these approaches, mBART is built on an easily constrained sum-of-trees approximation of $f$, composed of simple multivariate basis elements that can adaptively incorporate numerous predictors as well as their interactions.

The extension of BART to the monotonically constrained setting essentially requires two basic innovations.

- general constraints must be developed for regression tree functions to be monotone in any predesignated set of coordinates.
- a new approach to MCMC posterior computation is required.

Recall the BART model,

\[Y = \sum_{j=1}^m g(x;T_j,M_j) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)\]where each $T_j$ is a binary regression tree with a set $M_j$ of associated terminal node constants $\mu_{ij}$, and $g(x;T_j,M_j)$ is the function which assigns $\mu_{ij}\in M_j$ to $x$ according to the sequence of decision rules in $T_j$.
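To make the notation concrete, here is a minimal Python sketch of $g(x;T,M)$ and of the sum-of-trees function. The `Node` class, the convention of going left when `x[var] < cut`, and the two example trees are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary regression tree (hypothetical representation)."""
    var: Optional[int] = None       # splitting coordinate (internal nodes only)
    cut: Optional[float] = None     # go left if x[var] < cut, else right
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mu: Optional[float] = None      # terminal node constant (leaves only)


def g(x: Sequence[float], tree: Node) -> float:
    """Follow the decision rules in the tree and return the leaf constant for x."""
    node = tree
    while node.mu is None:
        node = node.left if x[node.var] < node.cut else node.right
    return node.mu


def sum_of_trees(x: Sequence[float], trees: Sequence[Node]) -> float:
    """Evaluate sum_j g(x; T_j, M_j)."""
    return sum(g(x, t) for t in trees)


# Two single-split trees (m = 2) on a two-dimensional x.
tree_1 = Node(var=0, cut=0.5, left=Node(mu=-1.0), right=Node(mu=1.0))
tree_2 = Node(var=1, cut=0.3, left=Node(mu=0.2), right=Node(mu=0.5))

# x = (0.7, 0.1) lands in the right leaf of tree_1 and the left leaf of tree_2.
fit = sum_of_trees([0.7, 0.1], [tree_1, tree_2])
```

Each tree contributes one leaf constant, so the fitted value is a sum of $m$ step functions.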

For a subset $S$ of the coordinates of $x\in\mathbb{R}^p$, a function $f:\mathbb{R}^p\rightarrow \mathbb{R}$ is said to be monotone in $S$ if for each $x_i\in S$ and all values of $x$, $f$ satisfies

\[f(x_1,\ldots,x_i+\delta, \ldots,x_p) \ge f(x_1,\ldots, x_i,\ldots,x_p)\]for all $\delta > 0$ ($f$ is nondecreasing), or for all $\delta < 0$ ($f$ is nonincreasing)

It suffices to focus on the conditions for a single tree function $g(x;T,M)$ to be monotone in $S$ since a sum-of-trees function will be monotone in $S$ whenever each of the component trees is monotone in $S$.
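The definition and the sum-of-trees argument can be illustrated with a small grid check. This is only a sketch: the helper `is_nondecreasing_in` and the step functions `g1`, `g2`, `g3` are hypothetical, and a grid check is a necessary condition rather than a proof of monotonicity.

```python
import itertools


def is_nondecreasing_in(f, coords, grid, p):
    """Grid check that f never decreases when a coordinate in `coords` increases.

    Only points on `grid` are inspected, so this can refute monotonicity
    but cannot prove it for arbitrary functions.
    """
    for x in itertools.product(grid, repeat=p):
        for i in coords:
            for v in grid:
                if v > x[i]:
                    y = list(x)
                    y[i] = v          # same point with coordinate i increased
                    if f(y) < f(x):
                        return False
    return True


# Three hypothetical single-split tree functions on a two-dimensional x.
def g1(x):
    return -1.0 if x[0] < 0.5 else 1.0    # nondecreasing in x[0]


def g2(x):
    return 0.0 if x[0] < 0.25 else 0.3    # nondecreasing in x[0]


def g3(x):
    return 0.5 if x[1] < 0.3 else 0.2     # decreasing in x[1]


def f(x):
    """Sum of the three tree functions."""
    return g1(x) + g2(x) + g3(x)


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
mono_in_x0 = is_nondecreasing_in(f, [0], grid, p=2)   # every component is monotone in x[0]
mono_in_x1 = is_nondecreasing_in(f, [1], grid, p=2)   # g3 breaks monotonicity in x[1]
```

Here the sum is nondecreasing in the first coordinate because every component is, while the unconstrained second coordinate is free to decrease.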

Note that each terminal node region of $T$ will be a rectangular region of the form

\[R_k = \{x:x_i\in [L_{ik}, U_{ik}), i=1,\ldots, p\}\]where the interval $[L_{ik}, U_{ik})$ for each $x_i$ is determined by the sequence of splitting rules leading to $R_k$.

- $R_k$ is separated from $R_{k^\star}$ if $U_{ik} < L_{ik^\star}$ or $L_{ik} > U_{ik^\star}$ for some $i$.
- If $R_k$ and $R_{k^\star}$ are not separated, then
  - if $L_{ik}=U_{ik^\star}$ for some $i$, $R_k$ is an above-neighbor of $R_{k^\star}$
  - if $U_{ik}=L_{ik^\star}$ for some $i$, $R_k$ is a below-neighbor of $R_{k^\star}$
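The three relations translate directly into code. The sketch below follows the definitions exactly as written in this note; representing a region as a list of `(L, U)` pairs and the example regions `A`, `B`, `C`, `D` are assumptions made for illustration.

```python
def separated(Rk, Rs):
    """True if U_ik < L_is or L_ik > U_is for some coordinate i."""
    return any(Uk < Ls or Lk > Us for (Lk, Uk), (Ls, Us) in zip(Rk, Rs))


def is_above_neighbor(Rk, Rs):
    """True if Rk is an above-neighbor of Rs: not separated and L_ik == U_is for some i."""
    touches = any(Lk == Us for (Lk, _), (_, Us) in zip(Rk, Rs))
    return (not separated(Rk, Rs)) and touches


def is_below_neighbor(Rk, Rs):
    """True if Rk is a below-neighbor of Rs: not separated and U_ik == L_is for some i."""
    touches = any(Uk == Ls for (_, Uk), (Ls, _) in zip(Rk, Rs))
    return (not separated(Rk, Rs)) and touches


# Terminal regions of a hypothetical tree on [0, 1)^2:
# split x_1 at 0.5, then split the right half on x_2 at 0.5.
A = [(0.0, 0.5), (0.0, 1.0)]   # left half
B = [(0.5, 1.0), (0.0, 0.5)]   # lower right quarter
C = [(0.5, 1.0), (0.5, 1.0)]   # upper right quarter
D = [(0.0, 0.25), (0.0, 1.0)]  # a region from a different tree, left of B with a gap

b_above_a = is_above_neighbor(B, A)   # B sits just above A in x_1
c_above_b = is_above_neighbor(C, B)   # C sits just above B in x_2
a_below_b = is_below_neighbor(A, B)
d_sep_b = separated(D, B)
```

The two neighbor relations mirror each other: whenever `B` is an above-neighbor of `A`, `A` is a below-neighbor of `B`.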

Constraint conditions for tree monotonicity: expressed through the above- and below-neighbor relations, these conditions dovetail perfectly with the nature of the iterative MCMC simulation calculations.

## A Constrained Regularization Prior

\[p((T_1,M_1), \ldots, (T_m, M_m), \sigma) = \left[\prod_{j}p(M_j\mid T_j)p(T_j)\right] p(\sigma)\]Let $C$ be the set of all $(T, M)$ which satisfy monotonicity constraints,

\[C=\{(T,M): g(x; T, M) \text{ is monotone in }x_i\in S\}\]Constrain the support only over $C$,

\[p(M_j\mid T_j)\propto \left[\prod_{i=1}^{b_j}p(\mu_{ij}\mid T_j)\right]\chi_C(T_j,M_j)\,,\]where $b_j$ is the number of bottom (terminal) nodes of $T_j$, and $\chi_C(\cdot)=1$ on $C$ and $=0$ otherwise.
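One way to read the indicator $\chi_C$ is as a rejection step: propose the $\mu_{ij}$ independently from their unconstrained priors and keep only draws that land in $C$. The sketch below does this for a hypothetical tree whose leaves are ordered along a single monotone predictor, so that $C$ reduces to nondecreasing leaf values. The N(0, `sigma_mu`²) prior and all function names are assumptions, and naive rejection is used only for illustration; it is not the paper's MCMC scheme.

```python
import random


def sample_constrained_mu(n_leaves, in_C, sigma_mu=1.0, rng=None, max_tries=100_000):
    """Draw leaf values from independent N(0, sigma_mu^2) priors restricted to C.

    Proposals with chi_C = 0 are discarded, so an accepted draw follows the
    product prior truncated to the constraint set.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        mu = [rng.gauss(0.0, sigma_mu) for _ in range(n_leaves)]
        if in_C(mu):
            return mu
    raise RuntimeError("no proposal satisfied the constraint")


def nondecreasing(mu):
    """chi_C for a hypothetical tree whose leaves are ordered along one monotone x."""
    return all(a <= b for a, b in zip(mu, mu[1:]))


# Seeded generator so the example is reproducible.
mu_draw = sample_constrained_mu(3, nondecreasing, rng=random.Random(0))
```

With three leaves only about one proposal in six is accepted, and the rate falls quickly as the number of constrained leaves grows, which hints at why a dedicated posterior sampler is needed.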

### Calibrating $T_j$ prior

The tree prior $p(T_j)$ is specified by three aspects:

- the probability of a node having children at depth $d (=0,1,2,\ldots)$ is $\alpha(1+d)^{-\beta}, \alpha\in(0,1), \beta\in[0,\infty)$
- the uniform distribution over available predictors
- the uniform distribution on the available splitting values
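The depth-dependent split probability is easy to tabulate. The sketch below assumes $\alpha = 0.95$ and $\beta = 2$, the defaults commonly used for BART; the function name is hypothetical.

```python
def split_probability(depth, alpha=0.95, beta=2.0):
    """Prior probability alpha * (1 + depth)^(-beta) that a node at `depth` has children."""
    if not (0.0 < alpha < 1.0) or beta < 0.0:
        raise ValueError("need alpha in (0, 1) and beta >= 0")
    return alpha * (1.0 + depth) ** (-beta)


# Depths 0..3 give approximately 0.95, 0.2375, 0.1056, 0.0594.
probs = [split_probability(d) for d in range(4)]

# Probability that a tree is just a root node with no split at all.
p_root_only = 1.0 - split_probability(0)
```

The rapid decay with depth is what keeps individual trees small, so each one acts as a weak learner in the sum.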

### Calibrating the $\sigma$ prior

The $\sigma$ prior uses a conditionally conjugate inverse chi-square distribution.
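A draw from such a prior can be sketched using the parameterization $\sigma^2 \sim \nu\lambda/\chi^2_\nu$ from the original BART paper. The values $\nu = 3$ and $\lambda = 1$ below are placeholders chosen for illustration, not calibrated choices.

```python
import random


def draw_sigma2(nu, lam, rng):
    """One draw of sigma^2 = nu * lam / chi2_nu (scaled inverse chi-square)."""
    # A chi-square variate with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
    chi2 = rng.gammavariate(nu / 2.0, 2.0)
    return nu * lam / chi2


rng = random.Random(1)  # seeded so the example is reproducible
draws = [draw_sigma2(nu=3.0, lam=1.0, rng=rng) for _ in range(20000)]
median_sigma2 = sorted(draws)[len(draws) // 2]
```

The draws are right-skewed with a heavy upper tail, so the median is a more stable summary than the mean when $\nu$ is small.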

### Calibrating the $M_j\mid T_j$ prior

The $\mu_{ij}$ priors are normal densities as used in BART, but with different prior variance choices depending on whether or not $\mu_{ij}$ is constrained by the set $C$.

## MCMC Simulation of the Constrained Posterior

## Examples

### The Smoothing Effectiveness of mBART

### Comparing Fits and Credible Regions of BART and mBART

### Improving the RMSE with Monotone Regularization

### Used Car Prices

### The Stock Returns Example

## Discussion

The benefits of mBART over BART will rest on the validity of the monotonicity assumptions for which mBART was designed.

- it will always be useful to compare the outputs from BART and mBART to judge the plausibility of any monotonicity assumptions
- even when monotonicity seems plausible, more formal testing procedures such as Bayes factors would be valuable to have

Future research directions:

- the development of theory for mBART
- empirical and theoretical comparisons of mBART with the many monotonic alternatives proposed in the references, which would be enlightening