WeiYa's Work Yard

Illustrate Path Sampling by Stan Programming

March 06, 2019 (Update: May 14, 2021) 0 Comments

This post reviewed the topic of path sampling in the lecture slides of STAT 5020, and noted a general path sampling described by Gelman and Meng (1998), then used a toy example to illustrate it with Stan programming language.

Bootstrap Hypothesis Testing

March 03, 2019 (Update: April 12, 2021) 0 Comments

This report is motivated by comments under Larry’s post, Modern Two-Sample Tests.

Monetone B-spline Smoothing

March 09, 2021 (Update: March 12, 2021) 0 Comments

This note is based on He, X., & Shi, P. (1998). Monotone B-Spline Smoothing. Journal of the American Statistical Association, 93(442), 643–650., and the reproduced simulations are based on the updated algorithm, Ng, P., & Maechler, M. (2007). A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7(4), 315–328.

Principal Curves

September 28, 2020 (Update: January 21, 2021)

This post is mainly based on Hastie, T., & Stuetzle, W. (1989). Principal Curves. Journal of the American Statistical Association.

Jackknife and Mutual Information

January 07, 2019 (Update: April 21, 2020) 0 Comments

In this note, the material about Jackknife is based on Wasserman (2006) and Efron and Hastie (2016), while the Jackknife estimation of Mutual Information is based on Zeng et al. (2018).

Equicorrelation Matrix

February 22, 2020 (Update: March 16, 2020)

kjytay’s blog summarizes some properties of equicorrelation matix, which has the following form,

Generalized Matrix Decomposition

January 17, 2020 (Update: February 15, 2020)

This post is based on the talk given by Dr. Yue Wang at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 04, 2020.

Statistical Inference with Unnormalized Models

February 10, 2020 (Update: February 15, 2020)

This post is based on the talk given by T. Kanamori at the 11th ICSA International Conference on Dec. 22nd, 2019.

Tweedie's Formula and Selection Bias

March 11, 2019 (Update: January 31, 2020)

Prof. Inchi HU will give a talk on Large Scale Inference for Chi-squared Data tomorrow, which proposes the Tweedie’s formula in the Bayesian hierarchical model for chi-squared data, and he mentioned a thought-provoking paper, Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106(496), 1602–1614., which is the focus of this note.

January 05, 2020 (Update: January 30, 2020)

This post is based on the talk, Gradient-based Sparse Principal Component Analysis, given by Dr. Yixuan Qiu at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

Quantitative Genetics

December 21, 2019 (Update: January 30, 2020)

This post is based on the Pao-Lu Hsu Award Lecture given by Prof. Hongyu Zhao at the 11th ICSA International Conference on Dec. 21th, 2019.

Registration Problem in Functional Data Analysis

January 21, 2020 (Update: January 29, 2020)

This post is based on the seminar, Data Acquisition, Registration and Modelling for Multi-dimensional Functional Data, given by Prof. Shi.

January 16, 2020 (Update: January 17, 2020)

This post is based on the material of the second lecture of STAT 6050 instructed by Prof. Wicker, and mainly refer some more formally description from the book, Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar - Foundations of Machine Learning-The MIT Press (2012).

CEASE

December 20, 2019 (Update: January 16, 2020)

This post is based on the Peter Hall Lecture given by Prof. Jianqing Fan at the 11th ICSA International Conference on Dec. 20th, 2019.

Theoretical Results of Lasso

March 26, 2019 (Update: January 16, 2020)

Prof. Jon A. WELLNER introduced the application of a new multiplier inequality on lasso in the distinguish lecture, which reminds me that it is necessary to read more theoretical results of lasso, and so this is the post, which is based on Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362.

NGS for NGS

January 11, 2020 (Update: January 15, 2020)

This post is based on the talk, Next-Generation Statistical Methods for Association Analysis of Now-Generation Sequencing Studies, given by Dr. Xiang Zhan at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

Rare Variant Association Testing

July 18, 2019 (Update: January 15, 2020)

This note is based on

Group Inference in High Dimensions

December 17, 2019 (Update: January 02, 2020)

This post is based on the slides for the talk given by Zijian Guo at The International Statistical Conference In Memory of Professor Sik-Yum Lee

Gibbs Sampler for Finding Motif

December 10, 2018 (Update: December 25, 2019)

This post is the online version of my report for the Project 2 of STAT 5050 taught by Prof. Wei.

A Stochastic Model for Evolution of Metabolic Network

August 07, 2018 (Update: December 05, 2019)

This post is the notes for Mithani et al. (2009).

Union-intersection tests and Intersection-union tests

December 02, 2019 (Update: December 03, 2019)

This post is based on section 8.3 of Casella and Berger (2001).

SIR and Its Implementation

January 08, 2019 (Update: November 01, 2019)

This note is based on Li (1991) and Ma and Zhu (2012).

Particle Filtering and Smoothing

January 18, 2019 (Update: April 09, 2019) 0 Comments

This note is for Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3. For the sake of clarity, I split the general SMC methods (section 3) into my next post.

March 20, 2019 (Update: April 08, 2019)

I read the topic in kiytay’s blog: Proximal operators and generalized gradient descent, and then read its reference, Hastie et al. (2015), and write some program to get a better understanding.

The Gibbs Sampler

June 04, 2017 (Update: March 12, 2019) 0 Comments

Gibbs sampler is an iterative algorithm that constructs a dependent sequence of parameter values whose distribution converges to the target joint posterior distribution.

Tensor Completion

March 07, 2019 (Update: March 12, 2019)

Prof. YUAN Ming will give a distinguish lecture on Low Rank Tensor Methods in High Dimensional Data Analysis. To get familiar with his work on tensor, I read his paper, Yuan, M., & Zhang, C.-H. (2016). On Tensor Completion via Nuclear Norm Minimization. Foundations of Computational Mathematics, 16(4), 1031–1068., which is the topic of this post.

Select Prior by Formal Rules

March 04, 2019 (Update: March 05, 2019)

Larry wrote that “Noninformative priors are a lost cause” in his post, LOST CAUSES IN STATISTICS II: Noninformative Priors, and he mentioned his review paper Kass and Wasserman (1996) on noninformative priors. This note is for this paper.

An Illustration of Importance Sampling

July 16, 2017 (Update: January 31, 2019) 0 Comments

This report shows how to use importance sampling to estimate the expectation.

Sequential Monte Carlo Methods

June 10, 2017 (Update: January 31, 2019) 0 Comments

The first peep to SMC as an abecedarian, a more comprehensive note can be found here.

Chain-Structured Models

September 08, 2017 (Update: January 30, 2019) 0 Comments

There is an important probability distribution used in many applications, the chain-structured model.

Growing A Polymer

July 17, 2017 (Update: January 30, 2019) 0 Comments

This report implements the simulation of growing a polymer under the self-avoid walk model, and summary the sequential importance sampling techniques for this problem.

Genetic network inference

There are my notes when I read the paper called Genetic network inference.

Systems Genetic Approach

There are my notes when I read the paper called System Genetic Approach.

MICA

There are my notes when I read the paper called Maximal information component analysis.

MINE

There are my notes when I read the paper called Detecting Novel Associations in Large Data Sets.

Implement of MINE

This is the implement in R of MINE.

Illustrations of Support Vector Machines

Use the e1071 library in R to demonstrate the support vector classifier and the SVM.

ARIMA

Any time series without a constant mean over time is nonstationary.

Model Specification

For a given time series, how to choose appropriate values for $p, d, q$

Metropolis Algorithm

Monte Carlo plays a key role in evaluating integrals and simulating stochastic systems, and the most critical step of Monte Carlo algorithm is sampling from an appropriate probability distribution $\pi (\mathbf x)$. There are two ways to solve this problem, one is to do importance sampling, another is to produce statistically dependent samples based on the idea of Markov chain Monte Carlo sampling.

Poisson Regression

“The p value was never meant to be used the way it’s used today.” –Goodman

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use conjugate method to estimate the parameters in (linear/ridge) regression.

Cox Regression

Survival analysis examines and models the time it takes for events to occur. It focuses on the distribution of survival times. There are many well known methods for estimating unconditional survival distribution, and they examines the relationship between survival and one or more predictors, usually terms covariates in the survival-analysis literature. And Cox Proportional-Hazards regression model is one of the most widely used method of survival analysis.

Dynamics of Helicobacter pylori colonization

This post is the notes of this paper.

A Faster Algorithm for Repeated Linear Regression

Repeated Linear Regression means that repeat the fitting of linear regression for many times, and there are some common parts among these regressions.

An R Package: Fit Repeated Linear Regressions

Repeated Linear Regressions refer to a set of linear regressions in which there are several same variables.

Stochastic Epidemic Models

Discuss three different methods for formulating stochastic epidemic models.

Essentials of Survival Time Analysis

This post aims to clarify the relationship between rates and probabilities.

Evolutionary Systems Biology

December 30, 2018

The note is for Chapter 1 of Soyer, Orkun S., ed. 2012 Evolutionary Systems Biology. Advances in Experimental Medicine and Biology, 751. New York: Springer.

Sequential Monte Carlo Methods

January 19, 2019

This note is for Section 3 of Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3., and it is the complement of my previous post.

Annealed Importance Sampling

January 28, 2019

This is the note for Neal, R. M. (1998). Annealed Importance Sampling. ArXiv:Physics/9803008.

The First Glimpse into Pseudolikelihood

February 12, 2019

This post caught a glimpse of the pseudolikelihood.

Gibbs Sampling for the Multivariate Normal

February 13, 2019

This note is based on Chapter 7 of Hoff PD. A first course in Bayesian statistical methods. Springer Science & Business Media; 2009 Jun 2.

Review of Composite Likelihood

February 13, 2019

This note is based on Varin, C., Reid, N., & Firth, D. (2011). AN OVERVIEW OF COMPOSITE LIKELIHOOD METHODS. Statistica Sinica, 21(1), 5–42., a survey of recent developments in the theory and application of composite likelihood.

Studentized U-statistic

In Prof. Shao’s wonderful talk, Wandering around the Asymptotic Theory, he mentioned the Studentized U-statistics. I am interested in the derivation of the variances in the denominator.

Presistency

February 18, 2019

The paper, Greenshtein and Ritov (2004), is recommended by Larry Wasserman in his post Consistency, Sparsistency and Presistency.

Restricted Isometry Property

February 19, 2019

I encounter the term RIP in Larry Wasserman’s post, RIP RIP (Restricted Isometry Property, Rest In Peace), and also find some material in Hastie et al.’s book: Statistical Learning with Sparsity about RIP.

Continuous Time Markov Chain

February 20, 2019

This note is based on Karl Sigman’s IEOR 6711: Continuous-Time Markov Chains.

February 21, 2019

I learned Stein’s Paradox from Larry Wasserman’s post, STEIN’S PARADOX, perhaps I had encountered this term before but I cannot recall anything about it. (I am guilty)

Evaluate Variational Inference

March 07, 2019

A brief summary of the post, Eid ma clack shaw zupoven del ba.

Bernstein Bounds

March 08, 2019

I noticed that the papers of matrix/tensor completion always talk about the Bernstein inequality, then I picked the Bernstein Bounds discussed in Wainwright (2019).

Joint Summarized by Marginal or Conditional?

March 25, 2019

I happened to read Yixuan’s blog about a question related to the course Statistical Inference, whether two marginal distributions can determine the joint distribution. The question is adopted from Exercise 4.47 of Casella and Berger (2002).

Frequentist Accuracy of Bayesian Estimates

March 31, 2019

This note is for Efron’s slide: Frequentist Accuracy of Bayesian Estimates, which is recommended by Larry’s post: Shaking the Bayesian Machine.

Soft Imputation in Matrix Completion

April 01, 2019

This post is based on Chapter 7 of Statistical Learning with Sparsity: The Lasso and Generalizations, and I wrote R program to reproduce the simulations to get a better understanding.

Wierd Things in Mixture Models

April 04, 2019

This note is based on Larry’s post, Mixture Models: The Twilight Zone of Statistics.

April 08, 2019

This post is mainly based on Hastie et al. (2015), and incorporated with some materials from Watson (1992).

Normalizing Constant

April 10, 2019

Larry discussed the normalizing constant paradox in his blog.

Statistical Inference for Lasso

April 15, 2019

This note is based on the Chapter 6 of Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362..

Least Squares for SIMs

April 15, 2019

In the last lecture of STAT 5030, Prof. Lin shared one of the results in the paper, Neykov, M., Liu, J. S., & Cai, T. (2016). L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs. Journal of Machine Learning Research, 17(87), 1–37., or say the start point for the paper—the following Lemma. Because it seems that the condition and the conclusion is completely same with Sliced Inverse Regression, except for a direct interpretation—the least square regression.

Identifiability and Estimability

April 20, 2019

Materials from STAT 5030.

Self-normalized Limit Theory and Stein's Method

May 01, 2019

This note consists of the lecture material of STAT 6060 taught by Prof. Shao, four homework (indexed by “Homework”) and several personal comments (indexed by “Note”).

The General Decision Problem

May 06, 2019

This note is based on Chapter 1 of Lehmann EL, Romano JP. Testing statistical hypotheses. Springer Science & Business Media; 2006 Mar 30.

Medicine Meets AI

June 23, 2019

Last two days, I attended the conference Medicine Meets AI 2019: East Meets West, which help me know more AI from the industrial and medical perspective.

Canonical Variate Analysis

July 16, 2019

This note is based on Campbell, N. A. (1979). CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS. 243.

The Simplex Method

July 23, 2019

This note is based on Chapter 13 of Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer Science & Business Media.

Interior-point Method

August 16, 2019

Nocedal and Wright (2006) and Boyd and Vandenberghe (2004) present slightly different introduction on Interior-point method. More specifically, the former one only considers equality constraints, while the latter incorporates the inequality constraints.

Debiased Lasso

September 08, 2019

This post is based on Section 6.4 of Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. “Statistical Learning with Sparsity,” 2016, 362.

Likelihood-free inference by ratio estimation

This note is for Thomas, O., Dutta, R., Corander, J., Kaski, S., & Gutmann, M. U. (2016). Likelihood-free inference by ratio estimation. ArXiv:1611.10242 [Stat]., and I got this paper from Xi’an’s blog.

Basic of $B$-splines

This note is based on de Boor, C. (1978). A Practical Guide to Splines, Springer, New York.

ABC for Socks

This post is based on Prof. Robert’s slides on JSM 2019 and an intuitive blog from Rasmus Bååth.

Sub Gaussian

October 05, 2019

This post is based on Wainwright (2019).

Noise Outsourcing

October 10, 2019

I learnt the term Noise Outsourcing in kjytay’s blog, which is based on Teh Yee Whye’s IMS Medallion Lecture at JSM 2019.

Isotropic vs. Anisotropic

October 24, 2019

I came across isotropic and anisotropic covariance functions in kjytay’s blog, and then I found more materials, chapter 4 from the book Gaussian Processes for Machine Learning, via the reference in StackExchange: What is an isotropic (spherical) covariance matrix?.

Combining $p$-values in Meta Analysis

December 04, 2019

I came across the term meta-analysis in the previous post, and I had another question about nominal size while reading the paper of the previous post, which reminds me Keith’s notes. By coincidence, I also find the topic about meta-analysis in the same notes. Hence, this post is mainly based on Keith’s notes, and reproduce the power curves by myself.

Fantastic Generalization Measures and Where to Find Them

December 06, 2019

The post is based on Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., & Bengio, S. (2019). Fantastic Generalization Measures and Where to Find Them. ArXiv:1912.02178 [Cs, Stat].which was shared by one of my friend in the WeChat Moment, and then I took a quick look.

Quantile Regression Forests

December 10, 2019

This post is based on Meinshausen, N. (2006). Quantile Regression Forests. 17. since a coming seminar is related to such topic.

Conditional Quantile Regression Forests

December 12, 2019

This note is based on the slides of the seminar, Dr. ZHU, Huichen. Conditional Quantile Random Forest.

DNA copy number profiling: from bulk tissue to single cells

January 02, 2020

This post is based on the talk given by Yuchao Jiang at the 11th ICSA International Conference on Dec. 20th, 2019.

Concentration Inequality for Machine Learning

January 09, 2020

This post is based on the material of the first lecture of STAT6050 instructed by Prof. Wicker.

Classification with Imperfect Training Labels

January 15, 2020

This post is based on the talk, given by Timothy I. Cannings at the 11th ICSA International Conference on Dec. 22th, 2019, the corresponding paper is Cannings, T. I., Fan, Y., & Samworth, R. J. (2019). Classification with imperfect training labels. ArXiv:1805.11505 [Math, Stat]

Multiple Isotonic Regression

February 20, 2020

The first two sections are based on a good tutorial on the isotonic regression, and the third section consists of the slides for the talk given by Prof. Cun-Hui Zhang at the 11th ICSA International Conference on Dec. 21st, 2019.

Bernstein-von Mises Theorem

February 24, 2020

I came across the Bernstein-von Mises theorem in Yuling Yao’s blog, and I also found a quick definition in the blog hosted by Prof. Andrew Gelman, although this one is not by Gelman. By coincidence, the former is the PhD student of the latter!

Common Principal Components

February 28, 2020

This post is based on Flury (1984).

Survey on Time Series Change Points

May 31, 2020

This note is based on the survey paper, Aminikhanghahi, S., & Cook, D. J. (2017). A Survey of Methods for Time Series Change Point Detection. Knowledge and Information Systems, 51(2), 339–367.

Star Formation

This note is for Chapter 19 of Astronomy Today, 8th Edition.

Generalized Degrees of Freedom

This note is for Friedman, J. H. (1991). Multivariate Adaptive Regression Splines. The Annals of Statistics, 19(1), 1–67.

End-to-End Instance Segmentation

This note is for ISTR: End-to-End Instance Segmentation with Transformers.

Knowledge Graph and Electronic Medical Records

This note covers several papers on Knowledge Graph and Electronic Medical Records.