
Yingying Fan is Centennial Chair in Business Administration and Professor in the Data Sciences and Operations Department of the Marshall School of Business at the University of Southern California. She received her Ph.D. in Operations Research and Financial Engineering from Princeton University in 2007. She was a Lecturer in the Department of Statistics at Harvard University from 2007 to 2008 and Dean's Associate Professor in Business Administration at USC from 2018 to 2021. Her research interests include statistics, data science, machine learning, economics, big data, and business applications. Her latest work has focused on statistical inference for networks and on AI models empowered by some of the most recent developments in random matrix theory and statistical learning theory. Her honors include the Institute of Mathematical Statistics Medallion Lecture (2023), the International Congress of Chinese Mathematicians 45-Minute Invited Lecture (2022), Centennial Chair in Business Administration (2021, inaugural holder), NSF Focused Research Group (FRG) Grant (2021), Fellow of the Institute of Mathematical Statistics (2020), Associate Member of the USC Norris Comprehensive Cancer Center (2020), Fellow of the American Statistical Association (2019), Dean's Associate Professor in Business Administration (2018), NIH R01 Grant (2018), the Royal Statistical Society Guy Medal in Bronze (2017), the USC Marshall Dean's Award for Research Excellence (2017), the USC Marshall Inaugural Dr. Douglas Basil Award for Junior Business Faculty (2014), the American Statistical Association Noether Young Scholar Award (2013), and the NSF Faculty Early Career Development (CAREER) Award (2012).
She has served as an associate editor of The Annals of Statistics (2022-present), Information and Inference (2022-present), Journal of the American Statistical Association (2014-present), Journal of Econometrics (2015-2018), Journal of Business & Economic Statistics (2018-present), The Econometrics Journal (2012-present), and Journal of Multivariate Analysis (2013-2016).
As a flexible nonparametric learning tool, the random forests algorithm has been widely applied to real applications with appealing empirical performance, even in the presence of a high-dimensional feature space. Efforts to unveil the underlying mechanisms have led to some important recent theoretical results on the consistency of the random forests algorithm and its variants. However, to our knowledge, all existing works on random forests consistency in the high-dimensional setting were established for modified random forests models in which the splitting rules are independent of the response. In light of this, in this paper we derive the consistency rates for the random forests algorithm associated with the sample CART splitting criterion, which is the one used in the original version of the algorithm (Breiman, 2001), in a general high-dimensional nonparametric regression setting through a bias-variance decomposition analysis. Our new theoretical results show that random forests can indeed adapt to high dimensionality and allow for discontinuous regression functions. Our bias analysis characterizes explicitly how the random forests bias depends on the sample size, tree height, and column subsampling parameter. Some limitations of our current results are also discussed.
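For readers unfamiliar with the sample CART splitting criterion mentioned in the abstract, the following is a minimal Python sketch of one split at a single tree node. It is an illustration under stated assumptions, not the implementation analyzed in the talk: the function names `sse` and `best_cart_split`, the parameter name `gamma` for the column subsampling fraction, the choice of `ceil(gamma * p)` candidate features, and the use of midpoints between observed values as thresholds are all choices made here for illustration. The point it shows is that the split is chosen using the responses `y`, unlike the response-independent rules of the modified models.

```python
import math
import random


def sse(ys):
    """Sum of squared deviations of ys from their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def best_cart_split(X, y, gamma=1.0, rng=None):
    """Pick the split (feature, threshold) that most reduces the sum of squared errors.

    X     : list of feature rows (each a list of floats), all of length p
    y     : list of responses, same length as X
    gamma : assumed column subsampling fraction; ceil(gamma * p) features
            are drawn at random as split candidates (hypothetical convention)
    rng   : optional random.Random instance for reproducibility

    Returns (feature_index, threshold, sse_decrease), or (None, None, 0.0)
    if no candidate split reduces the squared error.
    """
    rng = rng or random.Random(0)
    p = len(X[0])
    n_candidates = min(p, max(1, math.ceil(gamma * p)))
    candidates = rng.sample(range(p), n_candidates)

    parent_sse = sse(y)
    best = (None, None, 0.0)
    for j in candidates:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # The criterion depends on the responses through both child SSEs.
            decrease = parent_sse - sse(left) - sse(right)
            if decrease > best[2]:
                best = (j, t, decrease)
    return best
```

Calling `best_cart_split(X, y, gamma=1.0)` on a small data set in which only the first feature separates the responses should select that feature. Growing a tree amounts to applying this step recursively to the two child nodes until a chosen height is reached, and a forest averages such trees built on resampled data.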