# Multidimensional Monotone Bayesian Additive Regression Tree

##### Posted on Nov 17, 2021

The flexible nonparametric nature of BART (Bayesian Additive Regression Trees) allows for a much richer set of possibilities than restrictive parametric approaches.

However, subject matter considerations sometimes warrant a minimal assumption of monotonicity in at least some of the predictors.

The paper introduces mBART, a constrained version of BART that can flexibly incorporate monotonicity in any predesignated subset of predictors using a multivariate basis of monotone trees.

For such monotone relationships, mBART provides

• function estimates that are smoother and more interpretable
• better out-of-sample predictive performance
• less post-data uncertainty

While many key aspects of the unconstrained BART model carry over directly to mBART, the introduction of monotonicity constraints necessitates a fundamental rethinking of how the model is implemented.

• In particular, the original BART MCMC relied on a conditional conjugacy that is no longer available in a monotonically constrained space.

For

$Y = f(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$

BART can quickly obtain full posterior inference for the unknown regression function,

$f(x) = E(Y\mid x)$

and the unknown variance $\sigma^2$.

Prior monotonicity assumptions arise naturally in many contexts, such as

• older cars as well as higher mileage cars sell for less on average
• dose-response function estimation in epidemiology or market demand function estimation in economics

There is a rich literature on monotone function estimation, also known as isotonic regression, with a wide variety of approaches from both the frequentist and Bayesian points of view:

• constrained nonparametric maximum likelihood
• spline modeling
• Gaussian processes
• projection-based methods

In contrast to all these approaches, mBART is built on an easily constrained sum-of-trees approximation of $f$, composed of simple multivariate basis elements that can adaptively incorporate numerous predictors as well as their interactions.

The extension of BART to the monotonically constrained setting essentially requires two basic innovations.

• General constraints must be developed under which regression tree functions are monotone in any predesignated set of coordinates.
• A new approach to MCMC posterior computation is required.

The BART model is

$Y = \sum_{j=1}^m g(x;T_j,M_j) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$

where each $T_j$ is a binary regression tree with a set $M_j$ of associated terminal node constants $\mu_{ij}$, and $g(x;T_j,M_j)$ is the function which assigns $\mu_{ij}\in M_j$ to $x$ according to the sequence of decision rules in $T_j$.
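The sum-of-trees function above can be sketched in a few lines of Python. This is only an illustration of how $g(x;T_j,M_j)$ routes $x$ to a terminal node constant; the `Node` class, the "go left if `x[var] < cut`" convention, and the two example trees are assumptions made here, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical binary regression tree node (illustration only)."""
    var: Optional[int] = None       # splitting coordinate (internal nodes)
    cut: Optional[float] = None     # go left if x[var] < cut, else right
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mu: Optional[float] = None      # terminal node constant (leaves only)


def g(x: Sequence[float], tree: Node) -> float:
    """Follow the decision rules in the tree and return the leaf constant."""
    node = tree
    while node.mu is None:
        node = node.left if x[node.var] < node.cut else node.right
    return node.mu


def sum_of_trees(x: Sequence[float], trees: Sequence[Node]) -> float:
    """Evaluate sum_j g(x; T_j, M_j), i.e. the regression function without noise."""
    return sum(g(x, t) for t in trees)


# Two small example trees (m = 2).
t1 = Node(var=0, cut=0.5, left=Node(mu=-1.0), right=Node(mu=1.0))
t2 = Node(var=1, cut=0.3, left=Node(mu=0.0),
          right=Node(var=0, cut=0.8, left=Node(mu=0.5), right=Node(mu=2.0)))

print(sum_of_trees([0.9, 0.4], [t1, t2]))  # 1.0 + 2.0 = 3.0
```

Each tree contributes a single constant for a given $x$, so the fit is a sum of $m$ step functions.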

For a subset $S$ of the coordinates of $x\in\mathbb{R}^p$, a function $f:\mathbb{R}^p\rightarrow \mathbb{R}$ is said to be monotone in $S$ if for each $x_i\in S$ and all values of $x$, $f$ satisfies

$f(x_1,\ldots,x_i+\delta, \ldots,x_p) \ge f(x_1,\ldots, x_i,\ldots,x_p)$

for all $\delta > 0$ ($f$ is nondecreasing), or for all $\delta < 0$ ($f$ is nonincreasing).

It suffices to focus on the conditions for a single tree function $g(x;T,M)$ to be monotone in $S$ since a sum-of-trees function will be monotone in $S$ whenever each of the component trees is monotone in $S$.
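As a rough numerical illustration of the definition, the sketch below checks the nondecreasing inequality on a finite grid, and shows that a sum of two step functions that are each nondecreasing in $x_1$ is again nondecreasing in $x_1$. The helper name `is_nondecreasing_in` and the example functions are hypothetical; a grid check can refute monotonicity but cannot prove it.

```python
from itertools import product


def is_nondecreasing_in(f, i, grids):
    """Check f(..., x_i + delta, ...) >= f(..., x_i, ...) on a finite grid.

    grids[k] is a sorted list of values for coordinate k.
    """
    others = [grid_k for k, grid_k in enumerate(grids) if k != i]
    for rest in product(*others):
        prev = None
        for xi in grids[i]:
            x = list(rest[:i]) + [xi] + list(rest[i:])
            val = f(x)
            if prev is not None and val < prev:
                return False
            prev = val
    return True


def g1(x):
    # Step function: nondecreasing in x[0], constant in x[1].
    return 1.0 if x[0] >= 0.5 else 0.0


def g2(x):
    # Step function: nondecreasing in x[0], nonincreasing in x[1].
    return (0.5 if x[0] >= 0.2 else -0.5) - (2.0 if x[1] >= 0.3 else 0.0)


def f(x):
    # Sum of the two components, mimicking a sum-of-trees function.
    return g1(x) + g2(x)


grid = [k / 10 for k in range(11)]
print(is_nondecreasing_in(f, 0, [grid, grid]))  # True
print(is_nondecreasing_in(f, 1, [grid, grid]))  # False
```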

Note that each terminal node region of $T$ will be a rectangular region of the form

$R_k = \{x:x_i\in [L_{ik}, U_{ik}), i=1,\ldots, p\}$

where the interval $[L_{ik}, U_{ik})$ for each $x_i$ is determined by the sequence of splitting rules leading to $R_k$.

• $R_k$ is separated from $R_{k^\star}$ if $U_{ik} < L_{ik^\star}$ or $L_{ik} > U_{ik^\star}$ for some $i$.
• If $R_k$ and $R_{k^\star}$ are not separated, then
  • if $L_{ik}=U_{ik^\star}$ for some $i$, $R_k$ is an above-neighbor of $R_{k^\star}$
  • if $U_{ik}=L_{ik^\star}$ for some $i$, $R_k$ is a below-neighbor of $R_{k^\star}$
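The relations between terminal node regions can be checked mechanically. The sketch below follows the definitions literally as stated in this note; representing a region as a list of `(L, U)` pairs, the function names, and the example regions on the unit square are all assumptions made for illustration.

```python
def separated(Rk, Rs):
    """Rk is separated from Rs if U_ik < L_is or L_ik > U_is for some i."""
    return any(Uk < Ls or Lk > Us for (Lk, Uk), (Ls, Us) in zip(Rk, Rs))


def above_neighbor(Rk, Rs):
    """Rk is an above-neighbor of Rs: not separated and L_ik == U_is for some i."""
    if separated(Rk, Rs):
        return False
    return any(Lk == Us for (Lk, _), (_, Us) in zip(Rk, Rs))


def below_neighbor(Rk, Rs):
    """Rk is a below-neighbor of Rs: not separated and U_ik == L_is for some i."""
    if separated(Rk, Rs):
        return False
    return any(Uk == Ls for (_, Uk), (Ls, _) in zip(Rk, Rs))


# Regions of a tree on [0, 1)^2: split x_1 at 0.5, then split the right part on x_2 at 0.3.
A = [(0.0, 0.5), (0.0, 1.0)]
B = [(0.5, 1.0), (0.0, 0.3)]
C = [(0.5, 1.0), (0.3, 1.0)]

print(above_neighbor(B, A))  # True: B starts where A ends in coordinate 1
print(below_neighbor(A, B))  # True
print(above_neighbor(C, B))  # True: C sits above B in coordinate 2

# Two regions with a gap between them in coordinate 1.
D = [(0.0, 0.2), (0.0, 1.0)]
E = [(0.6, 1.0), (0.0, 1.0)]
print(separated(D, E))       # True
```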

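Constraint conditions for tree monotonicity: for a nondecreasing constraint, the level $\mu$ of each terminal node region must be no larger than the levels of all of its above-neighbors and no smaller than the levels of all of its below-neighbors. These conditions dovetail perfectly with the nature of the iterative MCMC simulation calculations.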

## A Constrained Regularization Prior

$p((T_1,M_1), \ldots, (T_m, M_m), \sigma) = \left[\prod_{j}p(M_j\mid T_j)p(T_j)\right] p(\sigma)$

Let $C$ be the set of all $(T, M)$ which satisfy monotonicity constraints,

$C=\{(T,M): g(x; T, M) \text{ is monotone in }x_i\in S\}$

The support of the prior is then constrained to $C$:

$p(M_j\mid T_j)\propto \left[\prod_{i=1}^{b_j}p(\mu_{ij}\mid T_j)\right]\chi_C(T_j,M_j)\,,$

where $b_j$ is the number of bottom (terminal) nodes of $T_j$, and $\chi_C(\cdot)=1$ on $C$ and $=0$ otherwise.
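To see what the indicator $\chi_C$ does, consider the simplest case: a tree with a single split on a predictor constrained to be nondecreasing, so that $C$ reduces to $\mu_{\text{left}} \le \mu_{\text{right}}$. The sketch below draws from the resulting truncated prior by plain rejection sampling. This is only an illustration of the constrained support, not the MCMC scheme of the paper; the function names and the prior standard deviation `sigma_mu` are assumptions.

```python
import random


def chi_C(mu_left, mu_right):
    """Indicator of C for one split on a nondecreasing predictor."""
    return 1 if mu_left <= mu_right else 0


def sample_constrained_prior(n, sigma_mu=1.0, seed=0):
    """Draw n pairs (mu_left, mu_right) from independent normals restricted to C.

    Proposals come from the unconstrained product of normals; only those
    with chi_C = 1 are kept.
    """
    rng = random.Random(seed)
    draws, proposals = [], 0
    while len(draws) < n:
        proposals += 1
        mu = (rng.gauss(0.0, sigma_mu), rng.gauss(0.0, sigma_mu))
        if chi_C(*mu):
            draws.append(mu)
    return draws, proposals


draws, proposals = sample_constrained_prior(1000)
print(len(draws) / proposals)  # acceptance rate, close to 1/2 by symmetry
```

With more terminal nodes the acceptance rate of such a naive sampler drops quickly, which is one way to see why the constrained model needs its own posterior computation.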

### Calibrating $T_j$ prior

The tree prior $p(T_j)$ is specified by three aspects:

• the probability of a node having children at depth $d (=0,1,2,\ldots)$ is $\alpha(1+d)^{-\beta}, \alpha\in(0,1), \beta\in[0,\infty)$
• the uniform distribution over available predictors
• the uniform distribution on the available splitting values
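A quick calculation shows how strongly the depth-dependent split probability favors shallow trees. The values $\alpha = 0.95$ and $\beta = 2$ used below are the commonly cited BART defaults; they are an assumption here, since this note does not state which values are used.

```python
def p_split(d, alpha=0.95, beta=2.0):
    """Prior probability that a node at depth d has children: alpha * (1 + d)^(-beta)."""
    return alpha * (1 + d) ** (-beta)


for d in range(3):
    print(d, round(p_split(d), 4))
# 0 0.95
# 1 0.2375
# 2 0.1056
```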

### Calibrating the $\sigma$ prior

Use the conditionally conjugate inverse chi-square distribution.
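One common way to write such a prior is $\sigma^2 \sim \nu\lambda/\chi^2_\nu$. The sketch below draws from it using only the standard library; the parameterization, the function name, and the values `nu=3.0` and `lam=1.0` are assumptions for illustration, not calibrated choices from the paper.

```python
import random


def sample_sigma(n, nu=3.0, lam=1.0, seed=0):
    """Draw n values of sigma where sigma^2 = nu * lam / chi2_nu."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # A chi-square variable with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
        chi2 = rng.gammavariate(nu / 2.0, 2.0)
        out.append((nu * lam / chi2) ** 0.5)
    return out


sigmas = sample_sigma(5)
print(sigmas)  # five positive draws of sigma
```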

### Calibrating the $M_j\mid T_j$ prior

Adopt normal densities as used in BART, but with different prior variance choices depending on whether or not $\mu_{ij}$ is constrained by the set $C$.

## Discussion

The benefits of mBART over BART will rest on the validity of the monotonicity assumptions for which mBART was designed.

• it will always be useful to compare the outputs from BART and mBART to judge the plausibility of any monotonicity assumptions
• even when monotonicity seems plausible, more formal testing procedures such as Bayes factors would be valuable to have

Future research directions:

• the development of theory for mBART
• empirical and theoretical comparisons of mBART with the many monotonic alternatives proposed in the references, which would be enlightening to investigate
