WeiYa's Work Yard

Survival analysis examines and models the time it takes for events to occur. It focuses on the distribution of survival times. Beyond the many well-known methods for estimating the unconditional survival distribution, it examines the relationship between survival and one or more predictors, usually termed covariates in the survival-analysis literature. The Cox proportional-hazards regression model is one of the most widely used methods of survival analysis.
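
For reference, the Cox model named above is usually written as follows. The symbols $h_0$, $x$ and $\beta$ are notation introduced here for illustration, not taken from the post itself:

$$
h(t \mid x) = h_0(t)\exp(x^\top \beta),
$$

where $h_0(t)$ is an unspecified baseline hazard and $\beta$ holds the covariate effects. The hazard ratio between two covariate values, $\exp\{(x_1 - x_2)^\top \beta\}$, does not depend on $t$, which is the "proportional hazards" part of the name.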

Similarity Network Fusion

December 28, 2022 (Update: December 31, 2022)

This post is for Wang, B., Mezlini, A. M., Demir, F., Fiume, M., Tu, Z., Brudno, M., Haibe-Kains, B., & Goldenberg, A. (2014). Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3), Article 3. and a related paper Ruan, P., Wang, Y., Shen, R., & Wang, S. (2019). Using association signal annotations to boost similarity network fusion. Bioinformatics, 35(19), 3718–3726.

Rare Variant Association Testing

July 18, 2019 (Update: December 19, 2022)

This note is based on

The General Decision Problem

May 06, 2019 (Update: December 14, 2022)

This note is based on Chapter 1 of Lehmann EL, Romano JP. Testing statistical hypotheses. Springer Science & Business Media; 2006 Mar 30.

Machine Learning for Multi-omics Data

July 15, 2022 (Update: November 17, 2022)

This note is based on Cai, Z., Poulos, R. C., Liu, J., & Zhong, Q. (2022). Machine learning for multi-omics data integration in cancer. IScience, 25(2), 103798.

Differentiable Sorting and Ranking

November 04, 2022 (Update: November 14, 2022)

This note is for Blondel, M., Teboul, O., Berthet, Q., & Djolonga, J. (2020). Fast Differentiable Sorting and Ranking (arXiv:2002.08871). arXiv.

Joint Local False Discovery Rate in GWAS

November 12, 2022 (Update: November 14, 2022)

This note is for Jiang, W., & Yu, W. (2017). Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies. Bioinformatics, 33(4), 500–507.

Generative Bootstrap/Multi-purpose Samplers

September 22, 2021 (Update: October 12, 2022)

This post is based on the first version of Shin, M., Wang, L., & Liu, J. S. (2020). Scalable Uncertainty Quantification via Generative Bootstrap Sampler., which was later updated as Shin, M., Wang, S., & Liu, J. S. (2022). Generative Multiple-purpose Sampler for Weighted M-estimation (arXiv:2006.00767; Version 2). arXiv.

High Dimensional Linear Discriminant Analysis

July 15, 2019 (Update: October 09, 2022)

This note is for Cai, T. T., & Zhang, L. (2019). High dimensional linear discriminant analysis: Optimality, adaptive algorithm and missing data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(4), 675–705.

Joint Model of Longitudinal and Survival Data

October 02, 2022 (Update: October 02, 2022)

This post is based on Rizopoulos, D. (2017). An Introduction to the Joint Modeling of Longitudinal and Survival Data, with Applications in R. 235.

Conformal Inference

September 22, 2021 (Update: September 22, 2022)

The note is based on Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-Free Predictive Inference for Regression. Journal of the American Statistical Association, 113(523), 1094–1111. and Tibshirani, R. J., Candès, E. J., Barber, R. F., & Ramdas, A. (2019). Conformal Prediction Under Covariate Shift. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2530–2540.

Multicenter IPF-PRO Registry Cohort

August 25, 2022 (Update: August 31, 2022)

This note is for Todd, J. L., Vinisko, R., Liu, Y., Neely, M. L., Overton, R., Flaherty, K. R., Noth, I., Newby, L. K., Lasky, J. A., Olman, M. A., Hesslinger, C., Leonard, T. B., Palmer, S. M., & Belperio, J. A. (2020). Circulating matrix metalloproteinases and tissue metalloproteinase inhibitors in patients with idiopathic pulmonary fibrosis in the multicenter IPF-PRO Registry cohort. BMC Pulmonary Medicine, 20(1), 64.

Robust Registration of 2D and 3D Point Sets

November 05, 2020 (Update: June 14, 2022)

This note is for Fitzgibbon, A. W. (2003). Robust registration of 2D and 3D point sets. Image and Vision Computing, 21(13), 1145–1153.

Test of Monotonicity by Calibrating for Linear Functions

May 11, 2022 (Update: June 14, 2022)

This note is for Hall, P., & Heckman, N. E. (2000). Testing for Monotonicity of a Regression Mean by Calibrating for Linear Functions. The Annals of Statistics, 28(1), 20–39.

Estimation of Location and Scale Parameters of Continuous Density

March 22, 2022 (Update: March 25, 2022)

This note is for Pitman, E. J. G. (1939). The Estimation of the Location and Scale Parameters of a Continuous Population of any Given Form. Biometrika, 30(3/4), 391–421. and Kagan, AM & Rukhin, AL. (1967). On the estimation of a scale parameter. Theory of Probability & Its Applications, 12, 672–678.

Cross-Validation for High-Dimensional Ridge and Lasso

September 16, 2021 (Update: March 18, 2022)

This note collects several references on the research of cross-validation.

Surrogate Splits in Classification and Regression Trees

January 08, 2020 (Update: January 10, 2022)

This note is for Section 5.3 of Breiman, L. (Ed.). (1998). Classification and regression trees (1. CRC Press repr). Chapman & Hall/CRC.

A pHMM Algorithm for Correcting Long Reads

May 26, 2021 (Update: January 08, 2022)

This note is for Firtina, C., Bar-Joseph, Z., Alkan, C., & Cicek, A. E. (2018). Hercules: A profile HMM-based hybrid error correction algorithm for long reads. Nucleic Acids Research, 46(21), e125.

Infinite Relational Model

November 18, 2021 (Update: December 07, 2021)

This note is based on Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., & Ueda, N. (n.d.). Learning Systems of Concepts with an Infinite Relational Model. 8. and Saad, F. A., & Mansinghka, V. K. (2021). Hierarchical Infinite Relational Model. ArXiv:2108.07208 [Cs, Stat].

Surprises in High-Dimensional Ridgeless Least Squares Interpolation

June 24, 2019 (Update: November 30, 2021)

This post is based on Hastie, T., Montanari, A., Rosset, S., & Tibshirani, R. J. (2019). Surprises in High-Dimensional Ridgeless Least Squares Interpolation. 53.

Local Tracklets Filtering and Global Tracklets Association

July 05, 2021 (Update: July 06, 2021)

This note is for Xing, J., Ai, H., & Lao, S. (2009). Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 1200–1207.

Instance Segmentation with Cosine Embeddings

April 25, 2021 (Update: May 24, 2021)

This note is for Payer, C., Štern, D., Neff, T., Bischof, H., & Urschler, M. (2018). Instance Segmentation and Tracking with Cosine Embeddings and Recurrent Hourglass Networks. ArXiv:1806.02070 [Cs].

Illustrate Path Sampling by Stan Programming

March 06, 2019 (Update: May 14, 2021)

This post reviews the topic of path sampling from the lecture slides of STAT 5020, notes the general path sampling described by Gelman and Meng (1998), and then uses a toy example to illustrate it in the Stan programming language.

Bootstrap Hypothesis Testing

March 03, 2019 (Update: April 12, 2021)

This report is motivated by comments under Larry’s post, Modern Two-Sample Tests.

Monotone B-spline Smoothing

March 09, 2021 (Update: March 12, 2021)

This note is based on He, X., & Shi, P. (1998). Monotone B-Spline Smoothing. Journal of the American Statistical Association, 93(442), 643–650., and the reproduced simulations are based on the updated algorithm, Ng, P., & Maechler, M. (2007). A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7(4), 315–328.

Principal Curves

September 28, 2020 (Update: January 21, 2021)

This post is mainly based on Hastie, T., & Stuetzle, W. (1989). Principal Curves. Journal of the American Statistical Association.

Metabolic Networks and Their Evolution

December 31, 2018 (Update: December 24, 2020)

The note is for Wagner, A. (2012). Metabolic Networks and Their Evolution. In O. S. Soyer (Ed.), Evolutionary Systems Biology (Vol. 751, pp. 29–52). Springer New York.

Sequence Alignment in EHR

November 12, 2020 (Update: November 13, 2020)

This note is for Huang, M., Shah, N. D., & Yao, L. (2019). Evaluating global and local sequence alignment methods for comparing patient medical records. BMC Medical Informatics and Decision Making, 19(6), 263.

Efficient ICP Variants

November 07, 2020 (Update: November 08, 2020)

This note is for Rusinkiewicz, S., & Levoy, M. (2001). Efficient variants of the ICP algorithm. Proceedings Third International Conference on 3-D Digital Imaging and Modeling, 145–152.

Particle Tracking as Linear Assignment Problem

September 24, 2020 (Update: September 28, 2020)

This post is based on Jaqaman, K., Loerke, D., Mettlen, M., Kuwata, H., Grinstein, S., Schmid, S. L., & Danuser, G. (2008). Robust single-particle tracking in live-cell time-lapse sequences. Nature Methods, 5(8), 695–702.

Eleven Challenges in Single Cell Data Science

June 08, 2020 (Update: June 09, 2020)

This note is for Lähnemann, D., Köster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C., Robinson, M. D., Vallejos, C. A., Campbell, K. R., Beerenwinkel, N., Mahfouz, A., Pinello, L., Skums, P., Stamatakis, A., Attolini, C. S.-O., Aparicio, S., Baaijens, J., Balvert, M., Barbanson, B. de, Cappuccio, A., … Schönhuth, A. (2020). Eleven grand challenges in single-cell data science. Genome Biology, 21(1), 31.

CFPCA for Human Movement Data

April 26, 2020 (Update: April 30, 2020)

This post is based on Coffey, N., Harrison, A. J., Donoghue, O. A., & Hayes, K. (2011). Common functional principal components analysis: A new approach to analyzing human movement data. Human Movement Science, 30(6), 1144–1166.

Jackknife and Mutual Information

January 07, 2019 (Update: April 21, 2020)

In this note, the material about Jackknife is based on Wasserman (2006) and Efron and Hastie (2016), while the Jackknife estimation of Mutual Information is based on Zeng et al. (2018).

Common Functional Principal Components

February 29, 2020 (Update: March 29, 2020)

This post is based on Benko, M., Härdle, W., & Kneip, A. (2009). Common functional principal components. The Annals of Statistics, 37(1), 1–34.

Equicorrelation Matrix

February 22, 2020 (Update: March 16, 2020)

kjytay’s blog summarizes some properties of the equicorrelation matrix, which has the following form,

Exponential Twisting in Importance Sampling

September 18, 2019 (Update: March 01, 2020)

This note is based on Ma, J., Du, K., & Gu, G. (2019). An efficient exponential twisting importance sampling technique for pricing financial derivatives. Communications in Statistics - Theory and Methods, 48(2), 203–219.

Generalized Matrix Decomposition

January 17, 2020 (Update: February 15, 2020)

This post is based on the talk given by Dr. Yue Wang at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 04, 2020.

Statistical Inference with Unnormalized Models

February 10, 2020 (Update: February 15, 2020)

This post is based on the talk given by T. Kanamori at the 11th ICSA International Conference on Dec. 22nd, 2019.

Tweedie's Formula and Selection Bias

March 11, 2019 (Update: January 31, 2020)

Prof. Inchi HU will give a talk on Large Scale Inference for Chi-squared Data tomorrow, which proposes the Tweedie’s formula in the Bayesian hierarchical model for chi-squared data, and he mentioned a thought-provoking paper, Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106(496), 1602–1614., which is the focus of this note.

Gradient-based Sparse Principal Component Analysis

January 05, 2020 (Update: January 30, 2020)

This post is based on the talk, Gradient-based Sparse Principal Component Analysis, given by Dr. Yixuan Qiu at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

Quantitative Genetics

December 21, 2019 (Update: January 30, 2020)

This post is based on the Pao-Lu Hsu Award Lecture given by Prof. Hongyu Zhao at the 11th ICSA International Conference on Dec. 21st, 2019.

Registration Problem in Functional Data Analysis

January 21, 2020 (Update: January 29, 2020)

This post is based on the seminar, Data Acquisition, Registration and Modelling for Multi-dimensional Functional Data, given by Prof. Shi.

Rademacher Complexity

January 16, 2020 (Update: January 17, 2020)

This post is based on the material of the second lecture of STAT 6050, taught by Prof. Wicker, and draws some more formal descriptions from the book, Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of Machine Learning. The MIT Press.

CEASE

December 20, 2019 (Update: January 16, 2020)

This post is based on the Peter Hall Lecture given by Prof. Jianqing Fan at the 11th ICSA International Conference on Dec. 20th, 2019.

Theoretical Results of Lasso

March 26, 2019 (Update: January 16, 2020)

Prof. Jon A. WELLNER introduced the application of a new multiplier inequality to the lasso in his distinguished lecture, which reminded me that I need to read more theoretical results on the lasso. Hence this post, which is based on Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362.

NGS for NGS

January 11, 2020 (Update: January 15, 2020)

This post is based on the talk, Next-Generation Statistical Methods for Association Analysis of Now-Generation Sequencing Studies, given by Dr. Xiang Zhan at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

Group Inference in High Dimensions

December 17, 2019 (Update: January 02, 2020)

This post is based on the slides for the talk given by Zijian Guo at The International Statistical Conference In Memory of Professor Sik-Yum Lee

Gibbs Sampler for Finding Motif

December 10, 2018 (Update: December 25, 2019)

This post is the online version of my report for Project 2 of STAT 5050 taught by Prof. Wei.

A Stochastic Model for Evolution of Metabolic Network

August 07, 2018 (Update: December 05, 2019)

This post contains my notes on Mithani et al. (2009).

Controlling bias and inflation in EWAS/TWAS

December 04, 2019 (Update: December 04, 2019)

The post is based on the BIOS Consortium, van Iterson, M., van Zwet, E. W., & Heijmans, B. T. (2017). Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biology, 18(1), 19.

Multivariate Mediation Effects

November 04, 2019 (Update: December 03, 2019)

This note is based on Huang, Y.-T. (2019). Variance component tests of multivariate mediation effects under composite null hypotheses. Biometrics, 0(0).

Union-intersection tests and Intersection-union tests

December 02, 2019 (Update: December 03, 2019)

This post is based on section 8.3 of Casella and Berger (2001).

Generalized Functional Linear Models with Semiparametric Single-index Interactions

October 29, 2019 (Update: November 27, 2019)

This post is based on Li, Y., Wang, N., & Carroll, R. J. (2010). Generalized Functional Linear Models With Semiparametric Single-Index Interactions. Journal of the American Statistical Association, 105(490), 621–633.

Gaussian DAGs on Network Data

November 19, 2019 (Update: November 25, 2019)

This post is based on Li, H., & Zhou, Q. (2019). Gaussian DAGs on network data. ArXiv:1905.10848 [Cs, Stat].

Optimal estimation of functionals of high-dimensional mean and covariance matrix

August 26, 2019 (Update: November 03, 2019)

This post is based on Fan, J., Weng, H., & Zhou, Y. (2019). Optimal estimation of functionals of high-dimensional mean and covariance matrix. ArXiv:1908.07460 [Math, Stat].

SIR and Its Implementation

January 05, 2019 (Update: November 01, 2019)

Link-free vs. Semiparametric

January 08, 2019 (Update: November 01, 2019)

This note is based on Li (1991) and Ma and Zhu (2012).

Sparse LDA

September 17, 2019 (Update: October 10, 2019)

This note is based on Shao, J., Wang, Y., Deng, X., & Wang, S. (2011). Sparse linear discriminant analysis by thresholding for high dimensional data. The Annals of Statistics, 39(2), 1241–1265.

Features Annealed Independence Rules

September 17, 2019 (Update: September 20, 2019)

This note is based on Fan, J., & Fan, Y. (2008). High-dimensional classification using features annealed independence rules. The Annals of Statistics, 36(6), 2605–2637.

Dantzig Selector

August 16, 2019 (Update: September 13, 2019)

This post is based on Candes, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. The Annals of Statistics, 35(6), 2313–2351.

MLE for MTP2

July 05, 2019 (Update: July 25, 2019)

This post is based on Lauritzen, S., Uhler, C., & Zwiernik, P. (2019). Maximum likelihood estimation in Gaussian models under total positivity. The Annals of Statistics, 47(4), 1835–1863.

TreeClone

July 08, 2019 (Update: July 13, 2019)

This note is based on Zhou, T., Sengupta, S., Müller, P., & Ji, Y. (2019). TreeClone: Reconstruction of tumor subclone phylogeny based on mutation pairs using next generation sequencing data. The Annals of Applied Statistics, 13(2), 874–899.

Minimax Lower Bounds

June 28, 2019 (Update: July 12, 2019)

This note is based on Chapter 15 of Wainwright, M. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge: Cambridge University Press.

Change Points

May 28, 2019 (Update: June 23, 2019)

Fourier Series

May 07, 2019 (Update: June 23, 2019)

M-estimator

May 09, 2019 (Update: June 23, 2019)

Particle Filtering and Smoothing

January 18, 2019 (Update: April 09, 2019)

This note is for Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3. For the sake of clarity, I split the general SMC methods (section 3) into my next post.

Generalized Gradient Descent

March 20, 2019 (Update: April 08, 2019)

I read about this topic in kjytay’s blog post, Proximal operators and generalized gradient descent, then read its reference, Hastie et al. (2015), and wrote some programs to get a better understanding.

Multiple Object Tracking

March 26, 2019 (Update: March 28, 2019)

This note is for Luo, W., Xing, J., Milan, A., Zhang, X., Liu, W., Zhao, X., & Kim, T.-K. (2014). Multiple Object Tracking: A Literature Review. ArXiv:1409.7618 [Cs].

The Gibbs Sampler

June 04, 2017 (Update: March 12, 2019)

The Gibbs sampler is an iterative algorithm that constructs a dependent sequence of parameter values whose distribution converges to the target joint posterior distribution.
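
To make the idea concrete, here is a minimal sketch on a toy target. It assumes a standard bivariate normal with correlation `rho` standing in for a joint posterior; the function name, the target and all parameter values are illustrative choices, not taken from the post. Each sweep draws one coordinate from its full conditional given the current value of the other.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter=5000, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Toy target (an assumption for illustration): the full conditionals are
    x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0  # arbitrary starting point
    draws = []
    for t in range(n_iter):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if t >= burn_in:                 # discard the early, non-stationary draws
            draws.append((x, y))
    return draws


samples = gibbs_bivariate_normal(rho=0.8)
xs = [s[0] for s in samples]
ys = [s[1] for s in samples]
```

Successive draws are dependent, so summaries such as the sample correlation of `xs` and `ys` should be judged with the chain's autocorrelation in mind.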

Tensor Completion

March 07, 2019 (Update: March 12, 2019)

Prof. YUAN Ming will give a distinguished lecture on Low Rank Tensor Methods in High Dimensional Data Analysis. To get familiar with his work on tensors, I read his paper, Yuan, M., & Zhang, C.-H. (2016). On Tensor Completion via Nuclear Norm Minimization. Foundations of Computational Mathematics, 16(4), 1031–1068., which is the topic of this post.

SMC for Protein Folding Problem

February 23, 2019 (Update: March 09, 2019)

This note is based on Wong, S. W. K., Liu, J. S., & Kou, S. C. (2018). Exploring the conformational space for protein folding with sequential Monte Carlo. The Annals of Applied Statistics, 12(3), 1628–1654.

Select Prior by Formal Rules

March 04, 2019 (Update: March 05, 2019)

Larry wrote that “Noninformative priors are a lost cause” in his post, LOST CAUSES IN STATISTICS II: Noninformative Priors, and he mentioned his review paper Kass and Wasserman (1996) on noninformative priors. This note is for this paper.

Bio-chemical Reaction Networks

February 25, 2019 (Update: February 27, 2019)

This note is based on Loskot, P., Atitey, K., & Mihaylova, L. (2019). Comprehensive review of models and methods for inferences in bio-chemical reaction networks.

An Illustration of Importance Sampling

July 16, 2017 (Update: January 31, 2019)

This report shows how to use importance sampling to estimate the expectation.
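
As a small illustration of the technique, the sketch below estimates a tail probability $P(Z > 3)$ for $Z \sim N(0, 1)$ by sampling from a shifted proposal $N(3, 1)$ and reweighting. The example problem, the proposal and the function name are assumptions chosen for illustration; they are not the example used in the report.

```python
import math
import random


def tail_prob_importance_sampling(threshold=3.0, n=100_000, seed=0):
    """Estimate P(Z > threshold), Z ~ N(0, 1), using proposal q = N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal q
        if x > threshold:              # integrand h(x) = 1{x > threshold}
            # importance weight p(x) / q(x) for two unit-variance normals
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


estimate = tail_prob_importance_sampling()
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form normal tail for comparison
```

Plain Monte Carlo from $N(0, 1)$ would see only about one draw in a thousand land in the tail; shifting the proposal puts roughly half of the draws there, and the weights correct for the shift.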

Sequential Monte Carlo Methods

June 10, 2017 (Update: January 31, 2019)

A first peek at SMC as a beginner; a more comprehensive note can be found here.

Chain-Structured Models

September 08, 2017 (Update: January 30, 2019)

There is an important probability distribution used in many applications, the chain-structured model.

The Applications of Monte Carlo

September 07, 2017 (Update: January 30, 2019)

Growing A Polymer

July 17, 2017 (Update: January 30, 2019)

This report implements the simulation of growing a polymer under the self-avoiding walk model and summarizes the sequential importance sampling techniques for this problem.
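
A minimal sketch of the growth idea, assuming a 2D square lattice and the classical Rosenbluth scheme (one common form of sequential importance sampling for this problem, which may differ from the report's implementation): the chain is extended one monomer at a time to a uniformly chosen unoccupied neighbour, and the weight accumulates the number of available choices. The function names are hypothetical.

```python
import random

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def grow_saw(n_steps, rng):
    """Grow one self-avoiding walk on the square lattice; return (path, weight)."""
    path = [(0, 0)]
    visited = {(0, 0)}
    weight = 1.0
    for _ in range(n_steps):
        x, y = path[-1]
        free = [(x + dx, y + dy) for dx, dy in STEPS
                if (x + dx, y + dy) not in visited]
        if not free:               # the walk is trapped: it contributes weight 0
            return path, 0.0
        weight *= len(free)        # 1 / (probability of the chosen step)
        nxt = rng.choice(free)
        path.append(nxt)
        visited.add(nxt)
    return path, weight


def estimate_saw_count(n_steps, n_samples, seed=0):
    """Average weight, an unbiased estimate of the number of n-step walks."""
    rng = random.Random(seed)
    return sum(grow_saw(n_steps, rng)[1] for _ in range(n_samples)) / n_samples
```

For very short chains the estimate can be checked by hand: there are 36 three-step and 100 four-step self-avoiding walks on the square lattice, and `estimate_saw_count(4, 20000)` should land close to the latter.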

Genetic network inference

March 14, 2017

These are my notes on the paper Genetic network inference.

Systems Genetic Approach

March 16, 2017

These are my notes on the paper System Genetic Approach.

MICA

March 17, 2017

These are my notes on the paper Maximal information component analysis.

MINE

March 17, 2017

These are my notes on the paper Detecting Novel Associations in Large Data Sets.

Implementation of MINE

March 17, 2017

This is an implementation of MINE in R.

Ensemble Learning

May 17, 2017

Illustrations of Support Vector Machines

May 18, 2017

Use the e1071 library in R to demonstrate the support vector classifier and the SVM.

One Parameter Models

June 04, 2017

The Normal Model

June 05, 2017

Sequential Monte Carlo samplers

June 11, 2017

This note is for Moral, P. D., Doucet, A., & Jasra, A. (2006). Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 411–436.

SMC for Mixture Distribution

June 11, 2017

ARIMA

July 11, 2017

Any time series without a constant mean over time is nonstationary.

Adaptive Importance Sampling

July 16, 2017

Model Specification

July 17, 2017

For a given time series, how should we choose appropriate values for $p, d, q$?

A Bayesian Missing Data Problem

July 18, 2017

Metropolis Algorithm

July 21, 2017

Monte Carlo plays a key role in evaluating integrals and simulating stochastic systems, and the most critical step of a Monte Carlo algorithm is sampling from an appropriate probability distribution $\pi (\mathbf x)$. There are two ways to solve this problem: one is importance sampling, and the other is to produce statistically dependent samples based on the idea of Markov chain Monte Carlo sampling.
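
The second route can be sketched with a random-walk Metropolis sampler. This is a minimal sketch under stated assumptions: a one-dimensional target known only up to a constant (here an unnormalized $N(2, 1)$ density), a symmetric uniform proposal, and illustrative names and tuning values that do not come from the post.

```python
import math
import random


def metropolis(log_target, x0, n_iter=20_000, step=2.5, seed=0):
    """Random-walk Metropolis with a symmetric uniform proposal."""
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    chain = []
    for _ in range(n_iter):
        proposal = x + rng.uniform(-step, step)  # symmetric move around x
        log_p_new = log_target(proposal)
        # accept with probability min(1, pi(proposal) / pi(x))
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        chain.append(x)                          # a rejected move repeats x
    return chain


def log_target(x):
    """Log of an unnormalized N(2, 1) density (toy target for illustration)."""
    return -0.5 * (x - 2.0) ** 2


chain = metropolis(log_target, x0=0.0)
kept = chain[2000:]                              # drop burn-in draws
post_mean = sum(kept) / len(kept)
```

Only ratios of the target appear in the acceptance step, so the normalizing constant of $\pi$ is never needed.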

SMC in Biological Problems

July 22, 2017

Estimate Parameters in Logistic Regression

July 30, 2017

Poisson Regression

July 31, 2017

Story about P value

August 09, 2017

“The p value was never meant to be used the way it’s used today.” –Goodman

Conjugate Gradient for Regression

August 13, 2017

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use it to estimate the parameters in (linear/ridge) regression.
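
A minimal sketch of that use, assuming the ridge normal equations $(X^\top X + \lambda I)\beta = X^\top y$, whose coefficient matrix is symmetric positive definite for $\lambda > 0$. The simulated data, the value of `lam` and the function name are illustrative assumptions, not taken from the post.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x              # residual
    p = r.copy()               # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p    # next A-conjugate direction
        rs_old = rs_new
    return x


# Simulated regression data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)

lam = 1.0                                # ridge penalty; lam = 0 gives least squares
A = X.T @ X + lam * np.eye(X.shape[1])
b = X.T @ y
beta_cg = conjugate_gradient(A, b)
beta_direct = np.linalg.solve(A, b)      # direct solve, for comparison
```

In exact arithmetic the iteration terminates in at most as many steps as there are coefficients, which is why it is attractive when only matrix-vector products with $A$ are cheap.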

Restricted Boltzmann Machines

August 26, 2017

Dynamics of Helicobacter pylori colonization

August 31, 2017

This post contains my notes on this paper.

Healthy Human Microbiome

September 01, 2017

This post is for The Human Microbiome Project Consortium, Huttenhower, C., Gevers, D., Knight, R., Abubucker, S., Badger, J. H., … White, O. (2012). Structure, function and diversity of the healthy human microbiome. Nature, 486(7402), 207–214.

Dynamics of Helicobacter pylori Infection

September 01, 2017

The note is for Kirschner, D. E., & Blaser, M. J. (1995). The dynamics of helicobacter pylori infection of the human stomach. Journal of Theoretical Biology, 176(2), 281–290.

Basic Principles of Monte Carlo

September 07, 2017

Persistence of species in the face of environmental stochasticity

September 18, 2017

Sebastian Schreiber gave a talk titled Persistence of species in the face of environmental stochasticity.

A Faster Algorithm for Repeated Linear Regression

September 21, 2017

Repeated linear regression means fitting a linear regression many times, where the regressions share some common parts.

An R Package: Fit Repeated Linear Regressions

September 26, 2017

Repeated linear regressions refer to a set of linear regressions that share several common variables.

Stochastic Epidemic Models

October 11, 2017

This post discusses three different methods for formulating stochastic epidemic models.

Essentials of Survival Time Analysis

October 11, 2017

This post aims to clarify the relationship between rates and probabilities.
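
As a minimal worked example of that relationship, assume a constant event rate (hazard) $\lambda$ per unit time. The constant-rate assumption and the symbols are introduced here for illustration and are not necessarily those used in the post. The probability that the event has occurred by time $t$ is then

$$
p(t) = 1 - e^{-\lambda t} \approx \lambda t \quad \text{when } \lambda t \text{ is small}.
$$

For instance, a rate of $\lambda = 0.1$ per year gives $p(1) = 1 - e^{-0.1} \approx 0.0952$, slightly less than $0.1$: a rate can exceed 1, whereas a probability cannot.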

Model-Free Scoring System for Risk Prediction

October 17, 2017

Power Analysis

December 27, 2017

ECOC

August 18, 2018

The note is for Dietterich, T. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research 2: 263–286.

Gibbs in genetics

August 24, 2018

The note is for Gilks, W. R., Richardson, S., & Spiegelhalter, D. (Eds.). (1995). Markov chain Monte Carlo in practice. CRC press.

Evolutionary Systems Biology

December 30, 2018

The note is for Chapter 1 of Soyer, Orkun S., ed. 2012 Evolutionary Systems Biology. Advances in Experimental Medicine and Biology, 751. New York: Springer.

Small World inside Large Metabolic Networks

January 02, 2019

The note is for Wagner, A., & Fell, D. A. (2001). The small world inside large metabolic networks. Proceedings of the Royal Society of London B: Biological Sciences, 268(1478), 1803–1810.

Counting Process Based Dimension Reduction Methods for Censored Data

January 06, 2019

The note is for Sun, Q., Zhu, R., Wang, T., & Zeng, D. (2017). Counting Process Based Dimension Reduction Methods for Censored Outcomes. ArXiv:1704.05046 [Stat].

Reconstruct Gaussian DAG

January 09, 2019

This note is based on Yuan, Y., Shen, X., Pan, W., & Wang, Z. (2019). Constrained likelihood for reconstructing a directed acyclic Gaussian graph. Biometrika, 106(1), 109–125.

Reversible jump Markov chain Monte Carlo

January 10, 2019

The note is for Green, P.J. (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”. Biometrika. 82 (4): 711–732.

Approximate $\ell_0$-penalized piecewise-constant estimate of graphs

January 13, 2019

This note is for Fan, Z., & Guan, L. (2018). Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs. The Annals of Statistics, 46(6B), 3217–3245.

PLS in High-Dimensional Regression

January 15, 2019

This note is based on Cook, R. D., & Forzani, L. (2019). Partial least squares prediction in high-dimensional regression. The Annals of Statistics, 47(2), 884–908.

Sequential Monte Carlo Methods

January 19, 2019

This note is for Section 3 of Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3., and it is the complement of my previous post.

The Kalman Filter and Extended Kalman Filter

January 21, 2019

Annealed SMC for Bayesian Phylogenetics

January 24, 2019

This note is for Wang, L., Wang, S., & Bouchard-Côté, A. (2018). An Annealed Sequential Monte Carlo Method for Bayesian Phylogenetics. ArXiv:1806.08813 [q-Bio, Stat].

Annealed Importance Sampling

January 28, 2019

This is the note for Neal, R. M. (1998). Annealed Importance Sampling. ArXiv:Physics/9803008.

Calculating Marginal likelihood

January 30, 2019

The note is for Fourment, M., Magee, A. F., Whidden, C., Bilge, A., Matsen IV, F. A., & Minin, V. N. (2018). 19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology.

The First Glimpse into Pseudolikelihood

February 12, 2019

This post takes a first glimpse at pseudolikelihood.

Comparisons of Three Likelihood Criteria

February 12, 2019

The note is for Nelder, J. A., & Lee, Y. (1992). Likelihood, Quasi-Likelihood and Pseudolikelihood: Some Comparisons. Journal of the Royal Statistical Society. Series B (Methodological), 54(1), 273–284.

Identification of PE Genes in Cell Cycle

February 13, 2019

This note is based on Fan, X., Pyne, S., & Liu, J. S. (2010). Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle. The Annals of Applied Statistics, 4(2), 988–1013.

Gibbs Sampling for the Multivariate Normal

February 13, 2019

This note is based on Chapter 7 of Hoff PD. A first course in Bayesian statistical methods. Springer Science & Business Media; 2009 Jun 2.

Review of Composite Likelihood

February 13, 2019

This note is based on Varin, C., Reid, N., & Firth, D. (2011). AN OVERVIEW OF COMPOSITE LIKELIHOOD METHODS. Statistica Sinica, 21(1), 5–42., a survey of recent developments in the theory and application of composite likelihood.

Studentized U-statistics

February 15, 2019

In Prof. Shao’s wonderful talk, Wandering around the Asymptotic Theory, he mentioned the Studentized U-statistics. I am interested in the derivation of the variances in the denominator.

Deep Learning

February 16, 2019

This note is based on LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

A Bayesian Perspective of Deep Learning

February 17, 2019

This note is for Polson, N. G., & Sokolov, V. (2017). Deep Learning: A Bayesian Perspective. Bayesian Analysis, 12(4), 1275–1304.

Presistency

February 18, 2019

The paper, Greenshtein and Ritov (2004), is recommended by Larry Wasserman in his post Consistency, Sparsistency and Presistency.

Restricted Isometry Property

February 19, 2019

I encountered the term RIP in Larry Wasserman’s post, RIP RIP (Restricted Isometry Property, Rest In Peace), and also found some material about RIP in Hastie et al.’s book, Statistical Learning with Sparsity.

Continuous Time Markov Chain

February 20, 2019

This note is based on Karl Sigman’s IEOR 6711: Continuous-Time Markov Chains.

Stein's Paradox

February 21, 2019

I learned about Stein’s Paradox from Larry Wasserman’s post, STEIN’S PARADOX. Perhaps I had encountered this term before, but I cannot recall anything about it. (~~I am guilty~~)

Evaluate Variational Inference

March 07, 2019

A brief summary of the post, Eid ma clack shaw zupoven del ba.

Bernstein Bounds

March 08, 2019

I noticed that papers on matrix/tensor completion always mention the Bernstein inequality, so I picked up the Bernstein bounds discussed in Wainwright (2019).

The Correlated Topic Model

March 12, 2019

This note is for Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17–35.

Distributed inference for quantile regression processes

March 13, 2019

This note is for Volgushev, S., Chao, S.-K., & Cheng, G. (2019). Distributed inference for quantile regression processes. The Annals of Statistics, 47(3), 1634–1662.

Functional Data Analysis

March 14, 2019

Functional Data Analysis by Matrix Completion

March 15, 2019

High Dimensional Covariance Matrix Estimation

March 19, 2019

Convergence rates of least squares

March 25, 2019

This note is for Han, Q., & Wellner, J. A. (2017). Convergence rates of least squares regression estimators with heavy-tailed errors.

Joint Summarized by Marginal or Conditional?

March 25, 2019

I happened to read Yixuan’s blog about a question related to the course Statistical Inference: whether two marginal distributions can determine the joint distribution. The question is adopted from Exercise 4.47 of Casella and Berger (2002).

FARM-Test

March 29, 2019

This note is for Fan, J., Ke, Y., Sun, Q., & Zhou, W.-X. (2017). FarmTest: Factor-Adjusted Robust Multiple Testing with Approximate False Discovery Control. ArXiv:1711.05386 [Stat].

Frequentist Accuracy of Bayesian Estimates

March 31, 2019

This note is for Efron’s slides, Frequentist Accuracy of Bayesian Estimates, which are recommended in Larry’s post, Shaking the Bayesian Machine.

Soft Imputation in Matrix Completion

April 01, 2019

This post is based on Chapter 7 of Statistical Learning with Sparsity: The Lasso and Generalizations, and I wrote an R program to reproduce the simulations to get a better understanding.

Coupled Minimum-Cost Flow Cell Tracking

April 02, 2019

This note is for Padfield, D., Rittscher, J., & Roysam, B. (2011). Coupled minimum-cost flow cell tracking for high-throughput quantitative analysis. Medical Image Analysis, 15(4), 650–668.

Weird Things in Mixture Models

April 04, 2019

This note is based on Larry’s post, Mixture Models: The Twilight Zone of Statistics.

Subgradient

April 08, 2019

This post is mainly based on Hastie et al. (2015), and incorporates some material from Watson (1992).

Tracking Multiple Interacting Targets via MCMC-MRF

April 09, 2019

This note is for Khan, Z., Balch, T., & Dellaert, F. (2004). An MCMC-Based Particle Filter for Tracking Multiple Interacting Targets. In T. Pajdla & J. Matas (Eds.), Computer Vision - ECCV 2004 (pp. 279–290). Springer Berlin Heidelberg.

Methods for Cell Tracking

April 09, 2019

This post is for the survey paper, Meijering, E., Dzyubachyk, O., & Smal, I. (2012). Chapter nine - Methods for Cell and Particle Tracking. In P. M. Conn (Ed.), Methods in Enzymology (pp. 183–200).

Normalizing Constant

April 10, 2019

Larry discussed the normalizing constant paradox in his blog.

Multiple Tracking with Rao-Blackwellized marginal particle filtering

April 10, 2019

This note is for Smal, I., Meijering, E., Draegestein, K., Galjart, N., Grigoriev, I., Akhmanova, A., … Niessen, W. (2008). Multiple object tracking in molecular bioimaging by Rao-Blackwellized marginal particle filtering. Medical Image Analysis, 12(6), 764–777.

Statistical Inference for Lasso

April 15, 2019

This note is based on Chapter 6 of Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362.

Least Squares for SIMs

April 15, 2019

In the last lecture of STAT 5030, Prof. Lin shared one of the results in the paper, Neykov, M., Liu, J. S., & Cai, T. (2016). L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs. Journal of Machine Learning Research, 17(87), 1–37., or rather the starting point of the paper, the following Lemma. It seems that its condition and conclusion are exactly the same as those of Sliced Inverse Regression, except that it has a direct interpretation as least squares regression.

Identifiability and Estimability

April 20, 2019

Materials from STAT 5030.

Self-normalized Limit Theory and Stein's Method

May 01, 2019

This note consists of the lecture material of STAT 6060 taught by Prof. Shao, four homework assignments (indexed by “Homework”), and several personal comments (indexed by “Note”).

Medicine Meets AI

June 23, 2019

Over the last two days, I attended the conference Medicine Meets AI 2019: East Meets West, which helped me learn more about AI from the industrial and medical perspectives.

Bayesian Conjugate Gradient Method

June 27, 2019

This note is for Cockayne, J., Oates, C. J., Ipsen, I. C. F., & Girolami, M. (2018). A Bayesian Conjugate Gradient Method. Bayesian Analysis.

Global data association for MOT using network flows

July 10, 2019

This note is based on Zhang, L., Li, Y., & Nevatia, R. (2008). Global data association for multi-object tracking using network flows. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 1–8.

Canonical Variate Analysis

July 16, 2019

This note is based on Campbell, N. A. (1979). CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS. 243.

SMC-PHD Filter

July 17, 2019

This post is based on Ristic, B., Clark, D., & Vo, B. (2010). Improved SMC implementation of the PHD filter. 2010 13th International Conference on Information Fusion, 1–8.

Multi-estimate extraction for SMC-PHD

July 17, 2019

This post is based on Li, T., Corchado, J. M., Sun, S., & Fan, H. (2017). Multi-EAP: Extended EAP for multi-estimate extraction for SMC-PHD filter. Chinese Journal of Aeronautics, 30(1), 368–379.

An Optimal Control Approach for Deep Learning

July 19, 2019

This note is based on Li, Q., & Hao, S. (2018). An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks. ArXiv:1803.01299 [Cs].

High-dimensional linear mixed-effect model

July 21, 2019

This post is based on Li, S., Cai, T. T., & Li, H. (2019). Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach. ArXiv:1907.06116 [Stat].

An Adaptive Algorithm for online FDR

July 21, 2019

This post is based on Ramdas, A., Zrnic, T., Wainwright, M., & Jordan, M. (2018). SAFFRON: An adaptive algorithm for online control of the false discovery rate. ArXiv:1802.09098 [Cs, Math, Stat].

The Simplex Method

July 23, 2019

This note is based on Chapter 13 of Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer Science & Business Media.

Reluctant Interaction Modeling

July 23, 2019

This note is based on Yu, G., Bien, J., & Tibshirani, R. (2019). Reluctant Interaction Modeling. ArXiv:1907.08414 [Stat].

Additive Bayesian Variable Selection

August 05, 2019

This post is based on Rossell, D., & Rubio, F. J. (2019). Additive Bayesian variable selection under censoring and misspecification. ArXiv:1907.13563 [Math, Stat].

Interior-point Method

August 16, 2019

Nocedal and Wright (2006) and Boyd and Vandenberghe (2004) present slightly different introductions to the interior-point method. More specifically, the former considers only equality constraints, while the latter incorporates inequality constraints.

Debiased Lasso

September 08, 2019

This post is based on Section 6.4 of Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. “Statistical Learning with Sparsity,” 2016, 362.

Likelihood-free inference by ratio estimation

September 09, 2019

This note is for Thomas, O., Dutta, R., Corander, J., Kaski, S., & Gutmann, M. U. (2016). Likelihood-free inference by ratio estimation. ArXiv:1611.10242 [Stat]. I found this paper via Xi’an’s blog.

Basics of $B$-splines

September 09, 2019

This note is based on de Boor, C. (1978). A Practical Guide to Splines, Springer, New York.

Functional PCA

September 20, 2019

This post is based on Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis (Second edition). New York, NY: Springer.

Multiple human tracking with RGB-D data

September 20, 2019

This note is based on the survey paper Camplani, M., Paiement, A., Mirmehdi, M., Damen, D., Hannuna, S., Burghardt, T., & Tao, L. (2016). Multiple human tracking in RGB-depth data: A survey. IET Computer Vision, 11(4), 265–285.

ABC for Socks

September 24, 2019

This post is based on Prof. Robert’s slides at JSM 2019 and an intuitive blog post by Rasmus Bååth.

Optimality for Sparse Group Lasso

September 29, 2019

This note is based on Cai, T. T., Zhang, A., & Zhou, Y. (2019). Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference. ArXiv:1909.09851 [Cs, Math, Stat].

Kernel Ridgeless Regression Can Generalize

September 30, 2019

This note is based on Liang, T., & Rakhlin, A. (2018). Just Interpolate: Kernel “Ridgeless” Regression Can Generalize. ArXiv:1808.00387 [Cs, Math, Stat].

Sub Gaussian

October 05, 2019

This post is based on Wainwright (2019).

Linear Regression with Partially Shuffled Data

October 08, 2019

This post is based on Slawski, M., Diao, G., & Ben-David, E. (2019). A Pseudo-Likelihood Approach to Linear Regression with Partially Shuffled Data. ArXiv:1910.01623 [Cs, Stat].

Noise Outsourcing

October 10, 2019

I learnt the term Noise Outsourcing from kjytay’s blog, which is based on Teh Yee Whye’s IMS Medallion Lecture at JSM 2019.

Isotropic vs. Anisotropic

October 24, 2019

I came across isotropic and anisotropic covariance functions in kjytay’s blog, and then I found more material, Chapter 4 of the book Gaussian Processes for Machine Learning, via a reference in the StackExchange question: What is an isotropic (spherical) covariance matrix?

Partial Least Squares for Functional Data

October 31, 2019

This post is based on Delaigle, A., & Hall, P. (2012). Methodology and theory for partial least squares applied to functional data. The Annals of Statistics, 40(1), 322–352.

Model-based Approach for Joint Analysis of Single-cell data

October 31, 2019

This post is based on Lin, Z., Zamanighomi, M., Daley, T., Ma, S., & Wong, W. H. (2020). Model-Based Approach to the Joint Analysis of Single-cell Data on Chromatin Accessibility and Gene Expression. Statistical Science, 35(1), 2–13.

Genetic Relatedness in High-Dimensional Linear Models

October 31, 2019

This post is based on Guo, Z., Wang, W., Cai, T. T., & Li, H. (2019). Optimal Estimation of Genetic Relatedness in High-Dimensional Linear Models. Journal of the American Statistical Association, 114(525), 358–369.

The Cost of Privacy

November 01, 2019

This note is based on Cai, T. T., Wang, Y., & Zhang, L. (2019). The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy. ArXiv:1902.04495 [Cs, Stat].

Active Contours

November 12, 2019

This post is based on Ray, N., & Acton, S. T. (2002). Active contours for cell tracking. Proceedings Fifth IEEE Southwest Symposium on Image Analysis and Interpretation, 274–278.

Combining $p$-values in Meta Analysis

December 04, 2019

I came across the term meta-analysis in the previous post, and I had another question about nominal size while reading the paper of the previous post, which reminded me of Keith’s notes. By coincidence, I also found the topic of meta-analysis in the same notes. Hence, this post is mainly based on Keith’s notes, and I reproduce the power curves myself.

Fantastic Generalization Measures and Where to Find Them

December 06, 2019

This post is based on Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., & Bengio, S. (2019). Fantastic Generalization Measures and Where to Find Them. ArXiv:1912.02178 [Cs, Stat]. It was shared by one of my friends on WeChat Moments, and then I took a quick look.

Quantile Regression Forests

December 10, 2019

This post is based on Meinshausen, N. (2006). Quantile Regression Forests. 17. I read it because an upcoming seminar is related to this topic.

Conditional Quantile Regression Forests

December 12, 2019

This note is based on the slides of the seminar, Dr. ZHU, Huichen. Conditional Quantile Random Forest.

Lagrange Multiplier Test

December 17, 2019

This post is based on Peter Bentler’s talk, S.-Y. Lee’s Lagrange Multiplier Test in Structural Modeling: Still Useful?, at the International Statistical Conference in Memory of Professor Sik-Yum Lee.

DNA copy number profiling: from bulk tissue to single cells

January 02, 2020

This post is based on the talk given by Yuchao Jiang at the 11th ICSA International Conference on Dec. 20th, 2019.

Concentration Inequality for Machine Learning

January 09, 2020

This post is based on the material of the first lecture of STAT6050 taught by Prof. Wicker.

Classification with Imperfect Training Labels

January 15, 2020

This post is based on the talk given by Timothy I. Cannings at the 11th ICSA International Conference on Dec. 22nd, 2019. The corresponding paper is Cannings, T. I., Fan, Y., & Samworth, R. J. (2019). Classification with imperfect training labels. ArXiv:1805.11505 [Math, Stat].

Multiple Isotonic Regression

February 20, 2020

The first two sections are based on a good tutorial on isotonic regression, and the third section consists of the slides for the talk given by Prof. Cun-Hui Zhang at the 11th ICSA International Conference on Dec. 21st, 2019.

Bernstein-von Mises Theorem

February 24, 2020

I came across the Bernstein-von Mises theorem in Yuling Yao’s blog, and I also found a quick definition in the blog hosted by Prof. Andrew Gelman, although that post is not by Gelman himself. By coincidence, the former is a PhD student of the latter!