Asymptotic Properties of High-Dimensional Random Forests

Posted on Nov 09, 2021

This note is for Chi, C.-M., Vossler, P., Fan, Y., & Lv, J. (2021). Asymptotic Properties of High-Dimensional Random Forests. ArXiv:2004.13953 [Math, Stat].

The empirical success and popularity of random forests raise a natural question: how can we understand their underlying mechanisms from a theoretical perspective?

Recent work on the consistency of random forests:

Some of the earlier consistency results considered simplified versions of the original random forests algorithm, in which the splitting rules are assumed to be independent of the response.
Some work establishes the consistency of the original version of the random forests algorithm in the classical setting of a fixed-dimensional ambient feature space.
Some consistency results give rates of convergence in terms of the number of informative features in sparse models, but assume a simplified version of the random forests algorithm.
Additional theoretical results on random forests include the pointwise consistency, asymptotic distribution, and confidence intervals of random forests predictions.

It remains unclear how to characterize the consistency rate for the original version of the random forests algorithm in a general high-dimensional nonparametric regression setting.

Main contributions:

The paper characterizes this consistency rate for random forests with non-fully-grown trees.
The random forests estimator can be consistent with a rate of some polynomial order of the sample size.
The bias analysis reveals how the bias depends on the sample size, the column subsampling parameter, and the tree height.
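
To make the three quantities in the bias analysis concrete, here is a minimal pure-Python sketch of a forest of height-limited regression trees with column subsampling. It is an illustration only, not the authors' implementation. The names `grow`, `fit_forest`, `gamma`, `height`, and `subsample`, the synthetic data, and the choice of showing ceil(gamma * p) randomly chosen features to each split are all assumptions made for this example.

```python
import math
import random


def sum_sq(ys):
    """Sum of squared deviations of ys from their mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys, features):
    """Return the (feature, threshold) with the smallest total squared error, or None."""
    best, best_sse = None, float("inf")
    for j in features:
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints, so both children are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            sse = sum_sq(left) + sum_sq(right)
            if sse < best_sse:
                best, best_sse = (j, t), sse
    return best


def grow(rows, ys, height, gamma, rng):
    """Grow a tree with at most `height` levels of splits (not fully grown)."""
    if height == 0 or len(ys) < 2:
        return sum(ys) / len(ys)  # leaf: mean response in the cell
    p = len(rows[0])
    k = max(1, math.ceil(gamma * p))  # column subsampling (assumed form)
    split = best_split(rows, ys, rng.sample(range(p), k))
    if split is None:  # no usable split among the sampled features
        return sum(ys) / len(ys)
    j, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    return (j, t,
            grow([r for r, _ in left], [y for _, y in left], height - 1, gamma, rng),
            grow([r for r, _ in right], [y for _, y in right], height - 1, gamma, rng))


def predict_tree(tree, x):
    """Follow the splits down to a leaf and return its mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


def count_leaves(tree):
    """Number of terminal nodes; at most 2 ** height by construction."""
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[2]) + count_leaves(tree[3])


def fit_forest(rows, ys, n_trees, height, gamma, subsample, seed=0):
    """Each tree is grown on `subsample` rows drawn without replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(rows)), subsample)
        forest.append(grow([rows[i] for i in idx], [ys[i] for i in idx],
                           height, gamma, rng))
    return forest


def predict_forest(forest, x):
    """Forest prediction: the average of the tree predictions."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)


# Hypothetical sparse data: 10 features, only the first one is informative.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(10)] for _ in range(200)]
ys = [(1.0 if r[0] > 0.5 else 0.0) + data_rng.gauss(0, 0.1) for r in rows]

forest = fit_forest(rows, ys, n_trees=20, height=3, gamma=0.5, subsample=100)
print(max(count_leaves(t) for t in forest), "leaves at most, bound is", 2 ** 3)
print(predict_forest(forest, rows[0]))
```

With a small `gamma`, a split often does not see the informative feature, and with a small `height` the cells stay coarse. Varying these two together with the sample size is one way to get a feel for the dependence that the bias analysis describes.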

Q&A after the talk

Me: I’m wondering about pruning. As far as I know, a tree usually goes through a post-pruning procedure. In your theoretical analysis, the tree is a full, complete tree, so the number of terminal nodes is 2^k. If we performed the pruning procedure, we would have far fewer nodes. So can your theoretical analysis be applied to such an unbalanced tree?

Yingying Fan: Yeah, that’s an excellent question. I did not talk about it here, but in our paper we wrote about it. The reason we mention pruning the trees is the SID condition. People asked about the SID condition when I presented it, and we said that it is defined for all cells. We are thinking that maybe we can relax this assumption by imposing the condition only on the larger cells, with the small cells just pruned, or maybe even make it depend on the size of the cell. That way, first of all, you may be able to relax this assumption. Second, we may be able to get rid of this error term, because if we prune the tree down we no longer have those small cells. In the current work, though, we do not consider this approach.
