In this note, the material about Jackknife is based on Wasserman (2006) and Efron and Hastie (2016), while the Jackknife estimation of Mutual Information is based on Zeng et al. (2018).

This post is based on Benko, M., Härdle, W., & Kneip, A. (2009). Common functional principal components. The Annals of Statistics, 37(1), 1–34.

kjytay’s blog summarizes some properties of the equicorrelation matrix, which has the following form,

This note is based on Ma, J., Du, K., & Gu, G. (2019). An efficient exponential twisting importance sampling technique for pricing financial derivatives. Communications in Statistics - Theory and Methods, 48(2), 203–219.

This post is based on the talk given by Dr. Yue Wang at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 04, 2020.

This post is based on the talk given by T. Kanamori at the 11th ICSA International Conference on Dec. 22nd, 2019.

Prof. Inchi HU will give a talk on Large Scale Inference for Chi-squared Data tomorrow, which proposes Tweedie’s formula in a Bayesian hierarchical model for chi-squared data. He mentioned a thought-provoking paper, Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106(496), 1602–1614, which is the focus of this note.

This post is based on the talk, Gradient-based Sparse Principal Component Analysis, given by Dr. Yixuan Qiu at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

This post is based on the Pao-Lu Hsu Award Lecture given by Prof. Hongyu Zhao at the 11th ICSA International Conference on Dec. 21st, 2019.

This post is based on the seminar, Data Acquisition, Registration and Modelling for Multi-dimensional Functional Data, given by Prof. Shi.

This post is based on the material of the second lecture of STAT 6050 instructed by Prof. Wicker, and mainly refers to the more formal descriptions in the book Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of Machine Learning. The MIT Press.

This post is based on the Peter Hall Lecture given by Prof. Jianqing Fan at the 11th ICSA International Conference on Dec. 20th, 2019.

Prof. Jon A. WELLNER introduced the application of a new multiplier inequality to the lasso in his distinguished lecture, which reminded me that I need to read more theoretical results on the lasso. Hence this post, which is based on Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362.

This post is based on the talk, Next-Generation Statistical Methods for Association Analysis of Now-Generation Sequencing Studies, given by Dr. Xiang Zhan at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

This note is based on

- Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M., & Lin, X. (2011). Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. American Journal of Human Genetics, 89(1), 82–93.
- Wang, M. H., Weng, H., Sun, R., Lee, J., Wu, W. K. K., Chong, K. C., & Zee, B. C.-Y. (2017). A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests. Bioinformatics, 33(15), 2330–2336.

This post is based on the slides for the talk given by Zijian Guo at The International Statistical Conference In Memory of Professor Sik-Yum Lee.

This post is the online version of my report for Project 2 of STAT 5050 taught by Prof. Wei.

This post contains my notes on Mithani et al. (2009).

This note is based on Huang, Y.-T. (2019). Variance component tests of multivariate mediation effects under composite null hypotheses. Biometrics, 0(0).

This post is based on section 8.3 of Casella and Berger (2001).

This post is based on Li, Y., Wang, N., & Carroll, R. J. (2010). Generalized Functional Linear Models With Semiparametric Single-Index Interactions. Journal of the American Statistical Association, 105(490), 621–633.

This post is based on Li, H., & Zhou, Q. (2019). Gaussian DAGs on network data. ArXiv:1905.10848 [Cs, Stat].

This post is based on Fan, J., Weng, H., & Zhou, Y. (2019). Optimal estimation of functionals of high-dimensional mean and covariance matrix. ArXiv:1908.07460 [Math, Stat].

This note is based on Li (1991) and Ma and Zhu (2012).

This note is based on Shao, J., Wang, Y., Deng, X., & Wang, S. (2011). Sparse linear discriminant analysis by thresholding for high dimensional data. The Annals of Statistics, 39(2), 1241–1265.

This note is based on Fan, J., & Fan, Y. (2008). High-dimensional classification using features annealed independence rules. The Annals of Statistics, 36(6), 2605–2637.

This post is based on Candes, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. The Annals of Statistics, 35(6), 2313–2351.

This post is based on Lauritzen, S., Uhler, C., & Zwiernik, P. (2019). Maximum likelihood estimation in Gaussian models under total positivity. The Annals of Statistics, 47(4), 1835–1863.

This note is based on Chapter 15 of Wainwright, M. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge: Cambridge University Press.

This note is for Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3. For the sake of clarity, I split the general SMC methods (section 3) into my next post.

I read about this topic in kjytay’s blog post, Proximal operators and generalized gradient descent, then read its reference, Hastie et al. (2015), and wrote some programs to get a better understanding.

This note is for Luo, W., Xing, J., Milan, A., Zhang, X., Liu, W., Zhao, X., & Kim, T.-K. (2014). Multiple Object Tracking: A Literature Review. ArXiv:1409.7618 [Cs].

The Gibbs sampler is an iterative algorithm that constructs a dependent sequence of parameter values whose distribution converges to the target joint posterior distribution.
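
As a minimal sketch of this idea (not taken from the post itself), the Python snippet below runs a Gibbs sampler on a toy target: a standard bivariate normal with correlation `rho`, whose full conditionals are both univariate normals. The target, the function name `gibbs_bivariate_normal`, and the default settings are assumptions chosen only for illustration.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_iter=5000, burn_in=500, seed=0):
    """Gibbs sampler for (X, Y) ~ N(0, [[1, rho], [rho, 1]]).

    Full conditionals (standard results for the bivariate normal):
        X | Y = y ~ N(rho * y, 1 - rho^2)
        Y | X = x ~ N(rho * x, 1 - rho^2)
    """
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho**2)
    x, y = 0.0, 0.0  # arbitrary starting point
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Update each coordinate from its full conditional in turn,
        # always conditioning on the most recent value of the other one.
        x = rng.normal(rho * y, cond_sd)
        y = rng.normal(rho * x, cond_sd)
        draws[t] = (x, y)
    # Discard the early, not-yet-converged part of the chain.
    return draws[burn_in:]


samples = gibbs_bivariate_normal(rho=0.8)
print("sample correlation:", np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])
```

Successive draws are dependent, so the chain carries less information than the same number of independent samples; the sample correlation should nevertheless settle near `rho`.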

Prof. YUAN Ming will give a distinguished lecture on Low Rank Tensor Methods in High Dimensional Data Analysis. To get familiar with his work on tensors, I read his paper, Yuan, M., & Zhang, C.-H. (2016). On Tensor Completion via Nuclear Norm Minimization. Foundations of Computational Mathematics, 16(4), 1031–1068, which is the topic of this post.

This note is based on Wong, S. W. K., Liu, J. S., & Kou, S. C. (2018). Exploring the conformational space for protein folding with sequential Monte Carlo. The Annals of Applied Statistics, 12(3), 1628–1654.

Larry wrote that “Noninformative priors are a lost cause” in his post, LOST CAUSES IN STATISTICS II: Noninformative Priors, and he mentioned his review paper Kass and Wasserman (1996) on noninformative priors. This note is for this paper.

This note is based on Loskot, P., Atitey, K., & Mihaylova, L. (2019). Comprehensive review of models and methods for inferences in bio-chemical reaction networks.

This report shows how to use importance sampling to estimate the expectation.
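
A small sketch of the technique, under assumptions of my own rather than the report's setup: estimate the tail probability $P(X > 3)$ for $X \sim N(0, 1)$ by sampling from a shifted proposal $N(3, 1)$ and reweighting each draw by the density ratio. The function name `importance_sampling_tail` and the choice of proposal are hypothetical.

```python
import math

import numpy as np


def importance_sampling_tail(c=3.0, n=100_000, seed=0):
    """Estimate P(X > c), X ~ N(0, 1), using the proposal N(c, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=c, scale=1.0, size=n)  # draws from the proposal
    # log(target density / proposal density); the 1/sqrt(2*pi) factors cancel.
    log_w = -0.5 * x**2 + 0.5 * (x - c) ** 2
    w = np.exp(log_w)
    # Weighted average of the indicator h(x) = 1{x > c}.
    return np.mean((x > c) * w)


estimate = importance_sampling_tail()
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form normal tail
print(f"importance sampling: {estimate:.6f}, exact: {exact:.6f}")
```

Plain Monte Carlo from $N(0, 1)$ would see only about one draw in a thousand land beyond 3, whereas roughly half of the proposal draws do, which is why the reweighted estimator is much less variable here.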

A first peek at SMC from a beginner's perspective; a more comprehensive note can be found here.

There is an important probability distribution used in many applications, the chain-structured model.

This report implements the simulation of growing a polymer under the self-avoiding walk model, and summarizes the sequential importance sampling techniques for this problem.

These are my notes on the paper Genetic network inference.

These are my notes on the paper System Genetic Approach.

These are my notes on the paper Maximal information component analysis.

These are my notes on the paper Detecting Novel Associations in Large Data Sets.

This is an implementation of MINE in R.

Use the e1071 library in R to demonstrate the support vector classifier and the SVM.

Any time series without a constant mean over time is nonstationary.

For a given time series, how do we choose appropriate values for $p, d, q$?

Monte Carlo plays a key role in evaluating integrals and simulating stochastic systems, and the most critical step of a Monte Carlo algorithm is sampling from an appropriate probability distribution $\pi (\mathbf x)$. There are two ways to solve this problem: one is **importance sampling**, and the other is to produce statistically dependent samples based on the idea of **Markov chain Monte Carlo sampling**.

“The p value was never meant to be used the way it’s used today.” –Goodman

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use the conjugate gradient method to estimate the parameters in (linear/ridge) regression.
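
To make the connection concrete, here is a hedged sketch that applies a textbook conjugate gradient iteration to the ridge normal equations $(X^\top X + \lambda I)\beta = X^\top y$, whose coefficient matrix is symmetric positive definite for $\lambda > 0$. The simulated data, the penalty `lam`, and the function name `conjugate_gradient` are illustrative assumptions, not the post's own code.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive definite matrix A."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)   # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        # The new direction is A-conjugate to the previous ones.
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# Simulated regression data (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=100)
lam = 1.0

A = X.T @ X + lam * np.eye(X.shape[1])
b = X.T @ y
beta_cg = conjugate_gradient(A, b)
beta_direct = np.linalg.solve(A, b)   # direct solve, for comparison
print(np.max(np.abs(beta_cg - beta_direct)))
```

Setting `lam = 0` gives ordinary least squares, provided $X^\top X$ is nonsingular.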

**Survival analysis** examines and models the time it takes for events to occur. It focuses on the distribution of survival times. There are many well-known methods for estimating the unconditional survival distribution, but the more interesting models examine the relationship between survival and one or more predictors, usually termed **covariates** in the survival-analysis literature. The Cox proportional-hazards regression model is one of the most widely used methods of survival analysis.

This post contains my notes on this paper.

Sebastian Schreiber gave a talk titled Persistence of species in the face of environmental stochasticity.

*Repeated Linear Regression* means fitting a linear regression many times, where the regressions share some common parts.

*Repeated Linear Regressions* refers to a set of linear regressions that share several variables.

This post discusses three different methods for formulating stochastic epidemic models.

This post aims to clarify the relationship between rates and probabilities.
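
One standard relationship, sketched here on the assumption of a constant event rate $r$ (the post's own treatment may differ): the probability of at least one event within time $t$ is $p = 1 - e^{-rt}$, and conversely $r = -\ln(1 - p)/t$. The function names below are hypothetical.

```python
import math


def rate_to_probability(rate, t=1.0):
    """Probability of at least one event in time t at a constant rate."""
    return 1.0 - math.exp(-rate * t)


def probability_to_rate(p, t=1.0):
    """Constant rate that yields event probability p over time t (p < 1)."""
    return -math.log(1.0 - p) / t


# A rate is not a probability: it is unbounded above, while p stays below 1.
for r in (0.01, 0.1, 1.0, 5.0):
    print(f"rate = {r:<5} probability over one time unit = {rate_to_probability(r):.4f}")
```

For small rates the two nearly coincide, since $1 - e^{-rt} \approx rt$ when $rt$ is small, which is why they are easy to confuse.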

The note is for Gilks, W. R., Richardson, S., & Spiegelhalter, D. (Eds.). (1995). Markov chain Monte Carlo in practice. CRC Press.

The note is for Chapter 1 of *Soyer, Orkun S., ed. 2012 Evolutionary Systems Biology. Advances in Experimental Medicine and Biology, 751. New York: Springer*.

The note is for Chapter 2 of *Soyer, Orkun S., ed. 2012 Evolutionary Systems Biology. Advances in Experimental Medicine and Biology, 751. New York: Springer*.

This note is based on Yuan, Y., Shen, X., Pan, W., & Wang, Z. (n.d.). Constrained likelihood for reconstructing a directed acyclic Gaussian graph. Biometrika.

The note is for Green, P.J. (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”. Biometrika. 82 (4): 711–732.

This note is based on Cook, R. D., & Forzani, L. (2019). Partial least squares prediction in high-dimensional regression. The Annals of Statistics, 47(2), 884–908.

This note is for Section 3 of Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3, and it complements my previous post.

This note is for Wang, L., Wang, S., & Bouchard-Côté, A. (2018). An Annealed Sequential Monte Carlo Method for Bayesian Phylogenetics. ArXiv:1806.08813 [q-Bio, Stat].

This is the note for Neal, R. M. (1998). Annealed Importance Sampling. ArXiv:Physics/9803008.

This post takes a brief look at the pseudolikelihood.

This note is based on Fan, X., Pyne, S., & Liu, J. S. (2010). Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle. The Annals of Applied Statistics, 4(2), 988–1013.

This note is based on Chapter 7 of Hoff PD. A first course in Bayesian statistical methods. Springer Science & Business Media; 2009 Jun 2.

This note is based on Varin, C., Reid, N., & Firth, D. (2011). AN OVERVIEW OF COMPOSITE LIKELIHOOD METHODS. Statistica Sinica, 21(1), 5–42., a survey of recent developments in the theory and application of composite likelihood.

In Prof. Shao’s wonderful talk, Wandering around the Asymptotic Theory, he mentioned the Studentized U-statistics. I am interested in the derivation of the variances in the denominator.

This note is based on LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

This note is for Polson, N. G., & Sokolov, V. (2017). Deep Learning: A Bayesian Perspective. Bayesian Analysis, 12(4), 1275–1304.

The paper, Greenshtein and Ritov (2004), is recommended by Larry Wasserman in his post Consistency, Sparsistency and Presistency.

I encountered the term RIP in Larry Wasserman’s post, RIP RIP (Restricted Isometry Property, Rest In Peace), and also found some material about RIP in Hastie et al.’s book, Statistical Learning with Sparsity.

This note is based on Karl Sigman’s IEOR 6711: Continuous-Time Markov Chains.

I learned about Stein’s Paradox from Larry Wasserman’s post, STEIN’S PARADOX; perhaps I had encountered this term before, but I cannot recall anything about it. (~~I am guilty~~)

This report is motivated by comments under Larry’s post, Modern Two-Sample Tests.

This post reviews the topic of path sampling from the lecture slides of STAT 5020, notes the general path sampling described by Gelman and Meng (1998), and then uses a toy example to illustrate it with the Stan programming language.

A brief summary of the post, Eid ma clack shaw zupoven del ba.

I noticed that papers on matrix/tensor completion always invoke the Bernstein inequality, so I picked up the Bernstein bounds discussed in Wainwright (2019).

This note is for Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17–35.

This note is for Volgushev, S., Chao, S.-K., & Cheng, G. (2019). Distributed inference for quantile regression processes. The Annals of Statistics, 47(3), 1634–1662.

This note is for Han, Q., & Wellner, J. A. (2017). Convergence rates of least squares regression estimators with heavy-tailed errors.

I happened to read Yixuan’s blog about a question related to the course *Statistical Inference*: whether two marginal distributions can determine the joint distribution. The question is adapted from Exercise 4.47 of Casella and Berger (2002).

This note is for Efron’s slides, Frequentist Accuracy of Bayesian Estimates, which are recommended in Larry’s post, Shaking the Bayesian Machine.

This post is based on Chapter 7 of Statistical Learning with Sparsity: The Lasso and Generalizations, and I wrote R programs to reproduce the simulations to get a better understanding.

This note is based on Larry’s post, Mixture Models: The Twilight Zone of Statistics.

This post is mainly based on Hastie et al. (2015), and incorporates some material from Watson (1992).

This post is for the survey paper, Meijering, E., Dzyubachyk, O., & Smal, I. (2012). Chapter nine - Methods for Cell and Particle Tracking. In P. M. Conn (Ed.), Methods in Enzymology (pp. 183–200).

Larry discussed the normalizing constant paradox in his blog.

This note is based on Chapter 6 of Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362.

In the last lecture of STAT 5030, Prof. Lin shared one of the results in the paper, Neykov, M., Liu, J. S., & Cai, T. (2016). L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs. Journal of Machine Learning Research, 17(87), 1–37, which is in some sense the starting point of the paper: the following lemma. Its condition and conclusion seem to be exactly the same as those of Sliced Inverse Regression, except that it has a direct interpretation in terms of least squares regression.

Materials from STAT 5030.

This note consists of the lecture material of STAT 6060 taught by Prof. Shao, four homework assignments (indexed by “Homework”) and several personal comments (indexed by “Note”).

This note is based on Chapter 1 of Lehmann EL, Romano JP. Testing statistical hypotheses. Springer Science & Business Media; 2006 Mar 30.

Over the last two days, I attended the conference Medicine Meets AI 2019: East Meets West, which helped me learn more about AI from the industrial and medical perspectives.

This post is based on Hastie, T., Montanari, A., Rosset, S., & Tibshirani, R. J. (2019). Surprises in High-Dimensional Ridgeless Least Squares Interpolation. 53.

This note is for Cockayne, J., Oates, C. J., Ipsen, I. C. F., & Girolami, M. (2018). A Bayesian Conjugate Gradient Method. Bayesian Analysis.

This note is based on Li Zhang, Yuan Li, & Nevatia, R. (2008). Global data association for multi-object tracking using network flows. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 1–8.

This note is based on Campbell, N. A. (1979). CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS. 243.

This post is based on Ristic, B., Clark, D., & Vo, B. (2010). Improved SMC implementation of the PHD filter. 2010 13th International Conference on Information Fusion, 1–8.

This post is based on Li, T., Corchado, J. M., Sun, S., & Fan, H. (2017). Multi-EAP: Extended EAP for multi-estimate extraction for SMC-PHD filter. Chinese Journal of Aeronautics, 30(1), 368–379.

This note is based on Li, Q., & Hao, S. (2018). An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks. ArXiv:1803.01299 [Cs].

This post is based on Li, S., Cai, T. T., & Li, H. (2019). Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach. ArXiv:1907.06116 [Stat].

This post is based on Ramdas, A., Zrnic, T., Wainwright, M., & Jordan, M. (2018). SAFFRON: An adaptive algorithm for online control of the false discovery rate. ArXiv:1802.09098 [Cs, Math, Stat].

This note is based on Chapter 13 of Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer Science & Business Media.

This note is based on Yu, G., Bien, J., & Tibshirani, R. (2019). Reluctant Interaction Modeling. ArXiv:1907.08414 [Stat].

This post is based on Rossell, D., & Rubio, F. J. (2019). Additive Bayesian variable selection under censoring and misspecification. ArXiv:1907.13563 [Math, Stat].

Nocedal and Wright (2006) and Boyd and Vandenberghe (2004) present slightly different introductions to the interior-point method. More specifically, the former considers only equality constraints, while the latter also incorporates inequality constraints.

This post is based on Section 6.4 of Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. “Statistical Learning with Sparsity,” 2016, 362.

This note is for Thomas, O., Dutta, R., Corander, J., Kaski, S., & Gutmann, M. U. (2016). Likelihood-free inference by ratio estimation. ArXiv:1611.10242 [Stat]., and I got this paper from Xi’an’s blog.

This note is based on de Boor, C. (1978). A Practical Guide to Splines, Springer, New York.

This post is based on Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis (Second edition). New York, NY: Springer.

This note is based on the survey paper Camplani, M., Paiement, A., Mirmehdi, M., Damen, D., Hannuna, S., Burghardt, T., & Tao, L. (2016). Multiple human tracking in RGB-depth data: A survey. IET Computer Vision, 11(4), 265–285.

This post is based on Prof. Robert’s slides at JSM 2019 and an intuitive blog post by Rasmus Bååth.

This note is based on Cai, T. T., Zhang, A., & Zhou, Y. (2019). Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference. ArXiv:1909.09851 [Cs, Math, Stat].

This note is based on Liang, T., & Rakhlin, A. (2018). Just Interpolate: Kernel “Ridgeless” Regression Can Generalize. ArXiv:1808.00387 [Cs, Math, Stat].

This post is based on Wainwright (2019).

This post is based on Slawski, M., Diao, G., & Ben-David, E. (2019). A Pseudo-Likelihood Approach to Linear Regression with Partially Shuffled Data. ArXiv:1910.01623 [Cs, Stat].

I learnt the term **Noise Outsourcing** in kjytay’s blog, which is based on Teh Yee Whye’s IMS Medallion Lecture at JSM 2019.

I came across **isotropic** and **anisotropic** covariance functions in kjytay’s blog, and then I found more materials, chapter 4 from the book Gaussian Processes for Machine Learning, via the reference in StackExchange: What is an isotropic (spherical) covariance matrix?.

This post is based on Delaigle, A., & Hall, P. (2012). Methodology and theory for partial least squares applied to functional data. The Annals of Statistics, 40(1), 322–352.

This post is based on Lin, Z., Zamanighomi, M., Daley, T., Ma, S., & Wong, W. H. Model-based approach to the joint analysis of single-cell data on chromatin accessibility and gene expression. Statistical Science.

This post is based on Guo, Z., Wang, W., Cai, T. T., & Li, H. (2019). Optimal Estimation of Genetic Relatedness in High-Dimensional Linear Models. Journal of the American Statistical Association, 114(525), 358–369.

The post is based on Zhou, H., Hu, L., Zhou, J., & Lange, K. (2019). MM Algorithms for Variance Components Models. Journal of Computational and Graphical Statistics, 28(2), 350–361.

This note is based on Cai, T. T., Wang, Y., & Zhang, L. (2019). The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy. ArXiv:1902.04495 [Cs, Stat].

This post is based on Ray, N., & Acton, S. T. (2002). Active contours for cell tracking. Proceedings Fifth IEEE Southwest Symposium on Image Analysis and Interpretation, 274–278.

I came across the term meta-analysis in the previous post, and I had another question about nominal size while reading the paper discussed there, which reminded me of Keith’s notes. By coincidence, I also found the topic of meta-analysis in the same notes. Hence, this post is mainly based on Keith’s notes, and I reproduce the power curves myself.

The post is based on Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., & Bengio, S. (2019). Fantastic Generalization Measures and Where to Find Them. ArXiv:1912.02178 [Cs, Stat], which was shared by one of my friends on WeChat Moments, and I took a quick look.

This post is based on Meinshausen, N. (2006). Quantile Regression Forests. 17, since an upcoming seminar is related to this topic.

This note is based on the slides of the seminar, Dr. ZHU, Huichen. Conditional Quantile Random Forest.

This post is based on Peter BENTLER’s talk, S.-Y. Lee’s Lagrange Multiplier Test in Structural Modeling: Still Useful? in the International Statistical Conference in Memory of Professor Sik-Yum Lee.

This post is based on the talk given by Yuchao Jiang at the 11th ICSA International Conference on Dec. 20th, 2019.

This post is based on the material of the first lecture of STAT 6050 instructed by Prof. Wicker.

This post is based on the talk given by Timothy I. Cannings at the 11th ICSA International Conference on Dec. 22nd, 2019; the corresponding paper is Cannings, T. I., Fan, Y., & Samworth, R. J. (2019). Classification with imperfect training labels. ArXiv:1805.11505 [Math, Stat].

The first two sections are based on a good tutorial on the isotonic regression, and the third section consists of the slides for the talk given by Prof. Cun-Hui Zhang at the 11th ICSA International Conference on Dec. 21st, 2019.

I came across the Bernstein-von Mises theorem in Yuling Yao’s blog, and I also found a quick definition in the blog hosted by Prof. Andrew Gelman, although this one is not by Gelman. By coincidence, the former is the PhD student of the latter!

This post is based on Flury (1984).

This note is based on Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses (3rd ed). Springer.

This post is based on Shang, H. L. (2014). A survey of functional principal component analysis. AStA Advances in Statistical Analysis, 98(2), 121–142.

This post is based on Hyndman, R. J., & Shahid Ullah, Md. (2007). Robust forecasting of mortality and fertility rates: A functional data approach. Computational Statistics & Data Analysis, 51(10), 4942–4956.

This note is based on the survey paper, Aminikhanghahi, S., & Cook, D. J. (2017). A Survey of Methods for Time Series Change Point Detection. Knowledge and Information Systems, 51(2), 339–367.

This note is for Cuturi, M., Teboul, O., Berthet, Q., Doucet, A., & Vert, J.-P. (2020). Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design.

This post is mainly based on Hastie, T., & Stuetzle, W. (1989). Principal Curves. Journal of the American Statistical Association.

This note is for Besl, P. J., & McKay, N. D. (1992). Method for registration of 3-D shapes. Sensor Fusion IV: Control Paradigms and Data Structures, 1611, 586–606.