WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

Benchopt: Benchmarks for ML Optimizations

November 01, 2024 (Update: )

This is the note for Moreau, T., Massias, M., Gramfort, A., Ablin, P., Bannier, P.-A., Charlier, B., Dagréou, M., Tour, T. D. la, Durif, G., Dantas, C. F., Klopfenstein, Q., Larsson, J., Lai, E., Lefort, T., Malézieux, B., Moufad, B., Nguyen, B. T., Rakotomamonjy, A., Ramzi, Z., … Vaiter, S. (2022). Benchopt: Reproducible, efficient and collaborative optimization benchmarks (No. arXiv:2206.13424). arXiv. https://doi.org/10.48550/arXiv.2206.13424

Continue reading



Guarantees of Lloyd’s Algorithm

September 10, 2024 (Update: ) 0 Comments

This note is for Lu, Y., & Zhou, H. H. (2016). Statistical and Computational Guarantees of Lloyd’s Algorithm and its Variants (No. arXiv:1612.02099). arXiv. http://arxiv.org/abs/1612.02099

Continue reading



scDRS: single-cell disease relevance score

September 10, 2024 (Update: ) 0 Comments

This note is for Zhang, M. J., Hou, K., Dey, K. K., Sakaue, S., Jagadeesh, K. A., Weinand, K., Taychameekiatchai, A., Rao, P., Pisco, A. O., Zou, J., Wang, B., Gandal, M., Raychaudhuri, S., Pasaniuc, B., & Price, A. L. (2022). Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nature Genetics, 54(10), 1572–1580. https://doi.org/10.1038/s41588-022-01167-z

Continue reading



Data Fission

August 05, 2024 (Update: ) 0 Comments

This note is for the discussion paper Leiner, J., Duan, B., Wasserman, L., & Ramdas, A. (2023). Data fission: Splitting a single data point (arXiv:2112.11079). arXiv. http://arxiv.org/abs/2112.11079 in the JASA invited session at JSM 2024

Continue reading



Talagrand Concentration

July 30, 2024 (Update: ) 0 Comments

This note is for Wainwright, M. J. (n.d.). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. 604.

Continue reading



Approximating Bayes

July 04, 2024 (Update: ) 0 Comments

This is the note for Martin, G. M., Frazier, D. T., & Robert, C. P. (2024). Approximating Bayes in the 21st Century. Statistical Science, 39(1), 20–45. https://doi.org/10.1214/22-STS875

Continue reading



Preference Matching in RLHF

August 05, 2024 (Update: )

This is the note for the talk Statistical Inference in Large Language Models: Alignment and Copyright given by Weijie Su at JSM 2024

Continue reading



Training in Large Language Models

August 05, 2024 (Update: )

This is the note for the talk LLMs training given by Linjun Zhang at JSM 2024

Continue reading



Watermarks in Large Language Models

August 05, 2024 (Update: ) 0 Comments

This is the note for the talk Statistical Inference in Large Language Models: A Statistical Framework of Watermarks given by Weijie Su at JSM 2024

Continue reading



Causal Inference on Distribution Functions

February 20, 2024 (Update: )

This post is for Lin, Z., Kong, D., & Wang, L. (2023). Causal inference on distribution functions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2), 378–398.

Continue reading



Model-X Knockoffs

April 20, 2024 (Update: )

This note is for Candes, E., Fan, Y., Janson, L., & Lv, J. (2017). Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection. arXiv:1610.02351 [Math, Stat].

Continue reading



MMRM: Mixed-Models for Repeated Measures

January 10, 2024 (Update: )

This post is based on vignettes of MMRM R package: https://openpharma.github.io/mmrm/main/index.html

Continue reading



sctransform: Normalization using Regularized Negative Binomial Regression

February 24, 2024 (Update: )

The note is for Hafemeister, C., & Satija, R. (2019). Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biology, 20(1), 296.

Continue reading



Effective Gene Expression Prediction

January 26, 2024 (Update: )

This note is for Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., Assael, Y., Jumper, J., Kohli, P., & Kelley, D. R. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10), 1196–1203.

Continue reading



BAMLSS: Flexible Bayesian Additive Joint Model

January 31, 2024 (Update: )

This post is for Köhler, M., Umlauf, N., Beyerlein, A., Winkler, C., Ziegler, A.-G., & Greven, S. (2017). Flexible Bayesian additive joint models with an application to type 1 diabetes research. Biometrical Journal, 59(6), 1144–1165.

Continue reading



Joint Model in High Dimension

January 31, 2024 (Update: )

This post is for Liu, M., Sun, J., Herazo-Maya, J. D., Kaminski, N., & Zhao, H. (2019). Joint Models for Time-to-Event Data and Longitudinal Biomarkers of High Dimension. Statistics in Biosciences, 11(3), 614–629.

Continue reading



FDR Control in GLM

January 15, 2024 (Update: )

This post is for Dai, C., Lin, B., Xing, X., & Liu, J. S. (2023). A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models. Journal of the American Statistical Association, 118(543), 1551–1565.

Continue reading



One-way Matching with Low Rank

January 06, 2024 (Update: )

This post is for Chen, Shuxiao, Sizun Jiang, Zongming Ma, Garry P. Nolan, and Bokai Zhu. “One-Way Matching of Datasets with Low Rank Signals.” arXiv, October 3, 2022.

Continue reading



FDR Control via Data Splitting

December 19, 2020 (Update: )

This note is for Dai, C., Lin, B., Xing, X., & Liu, J. S. (2020). False Discovery Rate Control via Data Splitting. ArXiv:2002.08542 [Stat].

Continue reading



CountSplit for scRNA Data

December 08, 2023 (Update: )

The post is for Neufeld, Anna, Lucy L Gao, Joshua Popp, Alexis Battle, and Daniela Witten. “Inference after Latent Variable Estimation for Single-cell RNA Sequencing Data.” Biostatistics, December 13, 2022, kxac047.

Continue reading



Lamian: Differential Pseudotime Analysis

September 14, 2023 (Update: )

This note is for Hou, W., Ji, Z., Chen, Z., Wherry, E. J., Hicks, S. C., & Ji, H. (2021). A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples (p. 2021.07.10.451910). bioRxiv.

Continue reading



MM algorithm for Variance Components Models

November 01, 2019 (Update: )

The post is based on Zhou, H., Hu, L., Zhou, J., & Lange, K. (2019). MM Algorithms for Variance Components Models. Journal of Computational and Graphical Statistics, 28(2), 350–361.

Continue reading



fGWAS: Dynamic Model for GWAS

June 11, 2023 (Update: )

The note is for Das, K., Li, J., Wang, Z., Tong, C., Fu, G., Li, Y., Xu, M., Ahn, K., Mauger, D., Li, R., & Wu, R. (2011). A dynamic model for genome-wide association studies. Human Genetics, 129(6), 629–639.

Continue reading



Time-varying Group Sparse Additive Model for GWAS

June 11, 2023 (Update: )

This post is for Marchetti-Bowick, M., Yin, J., Howrylak, J. A., & Xing, E. P. (2016). A time-varying group sparse additive model for genome-wide association studies of dynamic complex traits. Bioinformatics, 32(19), 2903–2910.

Continue reading



Benchmarking Algorithms for Gene Regulatory Network Inference

July 14, 2023 (Update: )

This note is for Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A., & Murali, T. M. (2020). Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nature Methods, 17(2), Article 2.

Continue reading



GWAS of Longitudinal Trajectories at Biobank Scale

June 11, 2023 (Update: )

This post is for Ko, S., German, C. A., Jensen, A., Shen, J., Wang, A., Mehrotra, D. V., Sun, Y. V., Sinsheimer, J. S., Zhou, H., & Zhou, J. J. (2022). GWAS of longitudinal trajectories at biobank scale. The American Journal of Human Genetics, 109(3), 433–445.

Continue reading



Magnetic Field Orientations in Star Formation

January 12, 2022 (Update: )

Continue reading



LD Score Regression

December 15, 2022 (Update: )

This note is for Bulik-Sullivan, B. K., Loh, P.-R., Finucane, H. K., Ripke, S., Yang, J., Patterson, N., Daly, M. J., Price, A. L., & Neale, B. M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47(3), 291–295.

Continue reading



Cox Regression

August 17, 2017 (Update: ) 0 Comments

Survival analysis examines and models the time it takes for events to occur, focusing on the distribution of survival times. There are many well-known methods for estimating the unconditional survival distribution, while survival modeling more often examines the relationship between survival and one or more predictors, usually termed covariates in the survival-analysis literature. The Cox proportional-hazards regression model is one of the most widely used methods of survival analysis.

Continue reading



Similarity Network Fusion

December 28, 2022 (Update: )

This post is for Wang, B., Mezlini, A. M., Demir, F., Fiume, M., Tu, Z., Brudno, M., Haibe-Kains, B., & Goldenberg, A. (2014). Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3), Article 3. and a related paper Ruan, P., Wang, Y., Shen, R., & Wang, S. (2019). Using association signal annotations to boost similarity network fusion. Bioinformatics, 35(19), 3718–3726.

Continue reading



Rare Variant Association Testing

July 18, 2019 (Update: )

This note is based on

Continue reading



The General Decision Problem

May 06, 2019 (Update: )

This note is based on Chapter 1 of Lehmann EL, Romano JP. Testing statistical hypotheses. Springer Science & Business Media; 2006 Mar 30.

Continue reading



Machine Learning for Multi-omics Data

July 15, 2022 (Update: )

This note is based on Cai, Z., Poulos, R. C., Liu, J., & Zhong, Q. (2022). Machine learning for multi-omics data integration in cancer. IScience, 25(2), 103798.

Continue reading



Differentiable Sorting and Ranking

November 04, 2022 (Update: )

This note is for Blondel, M., Teboul, O., Berthet, Q., & Djolonga, J. (2020). Fast Differentiable Sorting and Ranking (arXiv:2002.08871). arXiv.

Continue reading



Joint Local False Discovery Rate in GWAS

November 12, 2022 (Update: )

This note is for Jiang, W., & Yu, W. (2017). Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies. Bioinformatics, 33(4), 500–507.

Continue reading



Generative Bootstrap/Multi-purpose Samplers

September 22, 2021 (Update: )

This post is based on the first version of Shin, M., Wang, L., & Liu, J. S. (2020). Scalable Uncertainty Quantification via Generative Bootstrap Sampler, which was later updated as Shin, M., Wang, S., & Liu, J. S. (2022). Generative Multiple-purpose Sampler for Weighted M-estimation (arXiv:2006.00767; Version 2). arXiv.

Continue reading



High Dimensional Linear Discriminant Analysis

July 15, 2019 (Update: )

This note is for Cai, T. T., & Zhang, L. (2019). High dimensional linear discriminant analysis: Optimality, adaptive algorithm and missing data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(4), 675–705.

Continue reading



Joint Model of Longitudinal and Survival Data

October 02, 2022 (Update: )

This post is based on Rizopoulos, D. (2017). An Introduction to the Joint Modeling of Longitudinal and Survival Data, with Applications in R. 235.

Continue reading



Conformal Inference

September 22, 2021 (Update: ) 0 Comments

The note is based on Lei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-Free Predictive Inference for Regression. Journal of the American Statistical Association, 113(523), 1094–1111. and Tibshirani, R. J., Candès, E. J., Barber, R. F., & Ramdas, A. (2019). Conformal Prediction Under Covariate Shift. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2530–2540.

Continue reading



Multicenter IPF-PRO Registry Cohort

August 25, 2022 (Update: )

This note is for Todd, J. L., Vinisko, R., Liu, Y., Neely, M. L., Overton, R., Flaherty, K. R., Noth, I., Newby, L. K., Lasky, J. A., Olman, M. A., Hesslinger, C., Leonard, T. B., Palmer, S. M., & Belperio, J. A. (2020). Circulating matrix metalloproteinases and tissue metalloproteinase inhibitors in patients with idiopathic pulmonary fibrosis in the multicenter IPF-PRO Registry cohort. BMC Pulmonary Medicine, 20(1), 64.

Continue reading



Robust Registration of 2D and 3D Point Sets

November 05, 2020 (Update: )

This note is for Fitzgibbon, A. W. (2003). Robust registration of 2D and 3D point sets. Image and Vision Computing, 21(13), 1145–1153.

Continue reading



Test of Monotonicity by Calibrating for Linear Functions

May 11, 2022 (Update: )

This note is for Hall, P., & Heckman, N. E. (2000). Testing for Monotonicity of a Regression Mean by Calibrating for Linear Functions. The Annals of Statistics, 28(1), 20–39.

Continue reading



Estimation of Location and Scale Parameters of Continuous Density

March 22, 2022 (Update: )

This note is for Pitman, E. J. G. (1939). The Estimation of the Location and Scale Parameters of a Continuous Population of any Given Form. Biometrika, 30(3/4), 391–421. and Kagan, A. M., & Rukhin, A. L. (1967). On the estimation of a scale parameter. Theory of Probability & Its Applications, 12, 672–678.

Continue reading



Cross-Validation for High-Dimensional Ridge and Lasso

September 16, 2021 (Update: ) 0 Comments

This note collects several references on the research of cross-validation.

Continue reading



Surrogate Splits in Classification and Regression Trees

January 08, 2020 (Update: )

This note is for Section 5.3 of Breiman, L. (Ed.). (1998). Classification and regression trees (1. CRC Press repr). Chapman & Hall/CRC.

Continue reading



A pHMM Algorithm for Correcting Long Reads

May 26, 2021 (Update: ) 0 Comments

This note is for Firtina, C., Bar-Joseph, Z., Alkan, C., & Cicek, A. E. (2018). Hercules: A profile HMM-based hybrid error correction algorithm for long reads. Nucleic Acids Research, 46(21), e125.

Continue reading



Infinite Relational Model

November 18, 2021 (Update: ) 0 Comments

This note is based on Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T., & Ueda, N. (n.d.). Learning Systems of Concepts with an Infinite Relational Model. 8. and Saad, F. A., & Mansinghka, V. K. (2021). Hierarchical Infinite Relational Model. ArXiv:2108.07208 [Cs, Stat].

Continue reading



Surprises in High-Dimensional Ridgeless Least Squares Interpolation

June 24, 2019 (Update: )

This post is based on Hastie, T., Montanari, A., Rosset, S., & Tibshirani, R. J. (2019). Surprises in High-Dimensional Ridgeless Least Squares Interpolation. 53.

Continue reading



Local Tracklets Filtering and Global Tracklets Association

July 05, 2021 (Update: ) 0 Comments

This note is for Xing, J., Ai, H., & Lao, S. (2009). Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 1200–1207.

Continue reading



Instance Segmentation with Cosine Embeddings

April 25, 2021 (Update: ) 0 Comments

This note is for Payer, C., Štern, D., Neff, T., Bischof, H., & Urschler, M. (2018). Instance Segmentation and Tracking with Cosine Embeddings and Recurrent Hourglass Networks. ArXiv:1806.02070 [Cs].

Continue reading



Illustrate Path Sampling by Stan Programming

March 06, 2019 (Update: ) 0 Comments

This post reviews the topic of path sampling in the lecture slides of STAT 5020, notes the general path sampling described by Gelman and Meng (1998), and then uses a toy example to illustrate it with the Stan programming language.

Continue reading



Bootstrap Hypothesis Testing

March 03, 2019 (Update: ) 0 Comments

This report is motivated by comments under Larry’s post, Modern Two-Sample Tests.

Continue reading



Monotone B-spline Smoothing

March 09, 2021 (Update: ) 0 Comments

This note is based on He, X., & Shi, P. (1998). Monotone B-Spline Smoothing. Journal of the American Statistical Association, 93(442), 643–650., and the reproduced simulations are based on the updated algorithm, Ng, P., & Maechler, M. (2007). A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7(4), 315–328.

Continue reading



Principal Curves

September 28, 2020 (Update: )

This post is mainly based on Hastie, T., & Stuetzle, W. (1989). Principal Curves. Journal of the American Statistical Association.

Continue reading



Metabolic Network and Their Evolution

December 31, 2018 (Update: )

The note is for Wagner, A. (2012). Metabolic Networks and Their Evolution. In O. S. Soyer (Ed.), Evolutionary Systems Biology (Vol. 751, pp. 29–52). Springer New York.

Continue reading



Sequence Alignment in EHR

November 12, 2020 (Update: )

This note is for Huang, M., Shah, N. D., & Yao, L. (2019). Evaluating global and local sequence alignment methods for comparing patient medical records. BMC Medical Informatics and Decision Making, 19(6), 263.

Continue reading



Efficient ICP Variants

November 07, 2020 (Update: )

This note is for Rusinkiewicz, S., & Levoy, M. (2001). Efficient variants of the ICP algorithm. Proceedings Third International Conference on 3-D Digital Imaging and Modeling, 145–152.

Continue reading



Particle Tracking as Linear Assignment Problem

September 24, 2020 (Update: )

This post is based on Jaqaman, K., Loerke, D., Mettlen, M., Kuwata, H., Grinstein, S., Schmid, S. L., & Danuser, G. (2008). Robust single-particle tracking in live-cell time-lapse sequences. Nature Methods, 5(8), 695–702.

Continue reading



Eleven Challenges in Single Cell Data Science

June 08, 2020 (Update: )

This note is for Lähnemann, D., Köster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C., Robinson, M. D., Vallejos, C. A., Campbell, K. R., Beerenwinkel, N., Mahfouz, A., Pinello, L., Skums, P., Stamatakis, A., Attolini, C. S.-O., Aparicio, S., Baaijens, J., Balvert, M., Barbanson, B. de, Cappuccio, A., … Schönhuth, A. (2020). Eleven grand challenges in single-cell data science. Genome Biology, 21(1), 31.

Continue reading



CFPCA for Human Movement Data

April 26, 2020 (Update: )

This post is based on Coffey, N., Harrison, A. J., Donoghue, O. A., & Hayes, K. (2011). Common functional principal components analysis: A new approach to analyzing human movement data. Human Movement Science, 30(6), 1144–1166.

Continue reading



Jackknife and Mutual Information

January 07, 2019 (Update: ) 0 Comments

In this note, the material about Jackknife is based on Wasserman (2006) and Efron and Hastie (2016), while the Jackknife estimation of Mutual Information is based on Zeng et al. (2018).

Continue reading



Common Functional Principal Components

February 29, 2020 (Update: )

This post is based on Benko, M., Härdle, W., & Kneip, A. (2009). Common functional principal components. The Annals of Statistics, 37(1), 1–34.

Continue reading



Equicorrelation Matrix

February 22, 2020 (Update: )

kjytay’s blog summarizes some properties of the equicorrelation matrix, which has the following form,

Continue reading



Exponential Twisting in Importance Sampling

September 18, 2019 (Update: )

This note is based on Ma, J., Du, K., & Gu, G. (2019). An efficient exponential twisting importance sampling technique for pricing financial derivatives. Communications in Statistics - Theory and Methods, 48(2), 203–219.

Continue reading



Generalized Matrix Decomposition

January 17, 2020 (Update: )

This post is based on the talk given by Dr. Yue Wang at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 04, 2020.

Continue reading



Statistical Inference with Unnormalized Models

February 10, 2020 (Update: )

This post is based on the talk given by T. Kanamori at the 11th ICSA International Conference on Dec. 22nd, 2019.

Continue reading



Tweedie's Formula and Selection Bias

March 11, 2019 (Update: )

Prof. Inchi HU will give a talk on Large Scale Inference for Chi-squared Data tomorrow, which proposes the Tweedie’s formula in the Bayesian hierarchical model for chi-squared data, and he mentioned a thought-provoking paper, Efron, B. (2011). Tweedie’s Formula and Selection Bias. Journal of the American Statistical Association, 106(496), 1602–1614., which is the focus of this note.

Continue reading



Gradient-based Sparse Principal Component Analysis

January 05, 2020 (Update: )

This post is based on the talk, Gradient-based Sparse Principal Component Analysis, given by Dr. Yixuan Qiu at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

Continue reading



Quantitative Genetics

December 21, 2019 (Update: )

This post is based on the Pao-Lu Hsu Award Lecture given by Prof. Hongyu Zhao at the 11th ICSA International Conference on Dec. 21st, 2019.

Continue reading



Registration Problem in Functional Data Analysis

January 21, 2020 (Update: )

This post is based on the seminar, Data Acquisition, Registration and Modelling for Multi-dimensional Functional Data, given by Prof. Shi.

Continue reading



Rademacher Complexity

January 16, 2020 (Update: )

This post is based on the material of the second lecture of STAT 6050 instructed by Prof. Wicker, and mainly refers to the more formal descriptions in the book Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of Machine Learning. The MIT Press.

Continue reading



CEASE

December 20, 2019 (Update: )

This post is based on the Peter Hall Lecture given by Prof. Jianqing Fan at the 11th ICSA International Conference on Dec. 20th, 2019.

Continue reading



Theoretical Results of Lasso

March 26, 2019 (Update: )

Prof. Jon A. WELLNER introduced the application of a new multiplier inequality to the lasso in his distinguished lecture, which reminded me that I should read more theoretical results on the lasso. This post is the result, and it is based on Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362.

Continue reading



NGS for NGS

January 11, 2020 (Update: )

This post is based on the talk, Next-Generation Statistical Methods for Association Analysis of Now-Generation Sequencing Studies, given by Dr. Xiang Zhan at the Department of Statistics and Data Science, Southern University of Science and Technology on Jan. 05, 2020.

Continue reading



Group Inference in High Dimensions

December 17, 2019 (Update: )

This post is based on the slides for the talk given by Zijian Guo at The International Statistical Conference In Memory of Professor Sik-Yum Lee

Continue reading



Gibbs Sampler for Finding Motif

December 10, 2018 (Update: )

This post is the online version of my report for the Project 2 of STAT 5050 taught by Prof. Wei.

Continue reading



A Stochastic Model for Evolution of Metabolic Network

August 07, 2018 (Update: )

This post is the notes for Mithani et al. (2009).

Continue reading



Controlling bias and inflation in EWAS/TWAS

December 04, 2019 (Update: )

The post is based on the BIOS Consortium, van Iterson, M., van Zwet, E. W., & Heijmans, B. T. (2017). Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biology, 18(1), 19.

Continue reading



Multivariate Mediation Effects

November 04, 2019 (Update: )

This note is based on Huang, Y.-T. (2019). Variance component tests of multivariate mediation effects under composite null hypotheses. Biometrics, 0(0).

Continue reading



Union-intersection tests and Intersection-union tests

December 02, 2019 (Update: )

This post is based on section 8.3 of Casella and Berger (2001).

Continue reading



Generalized Functional Linear Models with Semiparametric Single-index Interactions

October 29, 2019 (Update: )

This post is based on Li, Y., Wang, N., & Carroll, R. J. (2010). Generalized Functional Linear Models With Semiparametric Single-Index Interactions. Journal of the American Statistical Association, 105(490), 621–633.

Continue reading



Gaussian DAGs on Network Data

November 19, 2019 (Update: )

This post is based on Li, H., & Zhou, Q. (2019). Gaussian DAGs on network data. ArXiv:1905.10848 [Cs, Stat].

Continue reading



Optimal estimation of functionals of high-dimensional mean and covariance matrix

August 26, 2019 (Update: )

This post is based on Fan, J., Weng, H., & Zhou, Y. (2019). Optimal estimation of functionals of high-dimensional mean and covariance matrix. ArXiv:1908.07460 [Math, Stat].

Continue reading



SIR and Its Implementation

January 05, 2019 (Update: ) 0 Comments

Continue reading



Link-free vs. Semiparametric

January 08, 2019 (Update: )

This note is based on Li (1991) and Ma and Zhu (2012).

Continue reading



Sparse LDA

September 17, 2019 (Update: )

This note is based on Shao, J., Wang, Y., Deng, X., & Wang, S. (2011). Sparse linear discriminant analysis by thresholding for high dimensional data. The Annals of Statistics, 39(2), 1241–1265.

Continue reading



Feature Annealed Independent Rules

September 17, 2019 (Update: ) 0 Comments

This note is based on Fan, J., & Fan, Y. (2008). High-dimensional classification using features annealed independence rules. The Annals of Statistics, 36(6), 2605–2637.

Continue reading



Dantzig Selector

August 16, 2019 (Update: )

This post is based on Candes, E., & Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. The Annals of Statistics, 35(6), 2313–2351.

Continue reading



MLE for MTP2

July 05, 2019 (Update: )

This post is based on Lauritzen, S., Uhler, C., & Zwiernik, P. (2019). Maximum likelihood estimation in Gaussian models under total positivity. The Annals of Statistics, 47(4), 1835–1863.

Continue reading



TreeClone

July 08, 2019 (Update: )

This note is based on Zhou, T., Sengupta, S., Müller, P., & Ji, Y. (2019). TreeClone: Reconstruction of tumor subclone phylogeny based on mutation pairs using next generation sequencing data. The Annals of Applied Statistics, 13(2), 874–899.

Continue reading



Minimax Lower Bounds

June 28, 2019 (Update: )

This note is based on Chapter 15 of Wainwright, M. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge: Cambridge University Press.

Continue reading



Change Points

May 28, 2019 (Update: )

Continue reading



Fourier Series

May 07, 2019 (Update: )

Continue reading



M-estimator

May 09, 2019 (Update: )

Continue reading



Particle Filtering and Smoothing

January 18, 2019 (Update: ) 0 Comments

This note is for Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3. For the sake of clarity, I split the general SMC methods (section 3) into my next post.

Continue reading



Generalized Gradient Descent

March 20, 2019 (Update: )

I read about this topic in kjytay’s blog post, Proximal operators and generalized gradient descent, then read its reference, Hastie et al. (2015), and wrote some programs to get a better understanding.

Continue reading



Multiple Object Tracking

March 26, 2019 (Update: )

This note is for Luo, W., Xing, J., Milan, A., Zhang, X., Liu, W., Zhao, X., & Kim, T.-K. (2014). Multiple Object Tracking: A Literature Review. ArXiv:1409.7618 [Cs].

Continue reading



The Gibbs Sampler

June 04, 2017 (Update: ) 0 Comments

Gibbs sampler is an iterative algorithm that constructs a dependent sequence of parameter values whose distribution converges to the target joint posterior distribution.
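
To make the idea concrete, here is a minimal sketch of a Gibbs sampler in Python. The target is an assumption chosen for illustration, not taken from the post: a standard bivariate normal with correlation `rho`, whose full conditionals are `x | y ~ N(rho * y, 1 - rho^2)` and `y | x ~ N(rho * x, 1 - rho^2)`. The function name `gibbs_bivariate_normal` and its parameters are hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Illustrative assumption: each full conditional is N(rho * other, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for t in range(n_samples + burn_in):
        # Update each coordinate from its full conditional given the other.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if t >= burn_in:
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
n = len(samples)
mean_x = sum(s[0] for s in samples) / n
mean_y = sum(s[1] for s in samples) / n
cov_xy = sum((s[0] - mean_x) * (s[1] - mean_y) for s in samples) / n
var_x = sum((s[0] - mean_x) ** 2 for s in samples) / n
var_y = sum((s[1] - mean_y) ** 2 for s in samples) / n
corr = cov_xy / math.sqrt(var_x * var_y)
print(f"mean_x={mean_x:.3f}, mean_y={mean_y:.3f}, corr={corr:.3f}")
```

Successive draws are dependent because each update conditions on the previous value, yet the empirical mean and correlation should approach those of the target as the chain runs longer.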

Continue reading



Tensor Completion

March 07, 2019 (Update: )

Prof. YUAN Ming will give a distinguished lecture on Low Rank Tensor Methods in High Dimensional Data Analysis. To get familiar with his work on tensors, I read his paper, Yuan, M., & Zhang, C.-H. (2016). On Tensor Completion via Nuclear Norm Minimization. Foundations of Computational Mathematics, 16(4), 1031–1068., which is the topic of this post.

Continue reading



SMC for Protein Folding Problem

February 23, 2019 (Update: )

This note is based on Wong, S. W. K., Liu, J. S., & Kou, S. C. (2018). Exploring the conformational space for protein folding with sequential Monte Carlo. The Annals of Applied Statistics, 12(3), 1628–1654.

Continue reading



Select Prior by Formal Rules

March 04, 2019 (Update: )

Larry wrote that “Noninformative priors are a lost cause” in his post, LOST CAUSES IN STATISTICS II: Noninformative Priors, and he mentioned his review paper Kass and Wasserman (1996) on noninformative priors. This note is for this paper.

Continue reading



Bio-chemical Reaction Networks

February 25, 2019 (Update: )

This note is based on Loskot, P., Atitey, K., & Mihaylova, L. (2019). Comprehensive review of models and methods for inferences in bio-chemical reaction networks.

Continue reading



An Illustration of Importance Sampling

July 16, 2017 (Update: ) 0 Comments

This report shows how to use importance sampling to estimate the expectation.
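
As a rough illustration of the technique, the sketch below estimates an expectation under one distribution by sampling from another and reweighting. The concrete problem is an assumption made for this example and is not taken from the report: the tail probability P(X > 3) for X ~ N(0, 1), with proposal N(3, 1). The helper names `normal_pdf` and `importance_sampling_tail` are hypothetical.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def importance_sampling_tail(threshold, n, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) using proposal N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal
        if x > threshold:
            # Importance weight: target density over proposal density.
            total += normal_pdf(x) / normal_pdf(x, mu=threshold)
    return total / n


estimate = importance_sampling_tail(3.0, 50000)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form normal tail
print(f"estimate={estimate:.6f}, exact={exact:.6f}")
```

Shifting the proposal toward the region that matters means about half of the draws land in the tail, whereas plain Monte Carlo from N(0, 1) would see such a draw only about once in 740 samples.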

Continue reading



Sequential Monte Carlo Methods

June 10, 2017 (Update: ) 0 Comments

A first peek at SMC from a beginner's point of view; a more comprehensive note can be found here.

Continue reading



Chain-Structured Models

September 08, 2017 (Update: ) 0 Comments

There is an important probability distribution used in many applications, the chain-structured model.

Continue reading



The Applications of Monte Carlo

September 07, 2017 (Update: ) 0 Comments

Continue reading



Growing A Polymer

July 17, 2017 (Update: ) 0 Comments

This report implements the simulation of growing a polymer under the self-avoiding walk model, and summarizes the sequential importance sampling techniques for this problem.
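
One way to picture sequential importance sampling for this problem is the Rosenbluth-style growth scheme sketched below. It is an illustrative assumption, not necessarily the report's implementation: the walk lives on the 2D square lattice, each step is chosen uniformly among unvisited neighbors, and the weight is multiplied by the number of available neighbors. The function names `grow_saw` and `estimate_saw_count` are hypothetical.

```python
import random


def grow_saw(n_steps, rng):
    """Grow one self-avoiding walk on the 2D square lattice.

    Returns the importance weight of the walk, or 0.0 if it gets trapped.
    """
    pos = (0, 0)
    visited = {pos}
    weight = 1.0
    for _ in range(n_steps):
        x, y = pos
        neighbors = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        free = [q for q in neighbors if q not in visited]
        if not free:
            return 0.0  # dead end: the walk cannot be extended
        # Sampling uniformly among free sites has probability 1/len(free),
        # so the weight picks up a factor of len(free).
        weight *= len(free)
        pos = rng.choice(free)
        visited.add(pos)
    return weight


def estimate_saw_count(n_steps, n_walks, seed=0):
    """Average weight estimates the number of self-avoiding walks of length n_steps."""
    rng = random.Random(seed)
    return sum(grow_saw(n_steps, rng) for _ in range(n_walks)) / n_walks


estimate = estimate_saw_count(4, 20000)
print(f"estimated number of 4-step self-avoiding walks: {estimate:.2f}")
```

For 4 steps the square lattice has exactly 100 self-avoiding walks, so the average weight should land close to 100. The same weights can be used to estimate other averages over walks, such as the mean squared end-to-end distance.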

Continue reading



Genetic network inference

March 14, 2017 0 Comments

These are my notes on the paper Genetic network inference.

Continue reading



Systems Genetic Approach

March 16, 2017 0 Comments

These are my notes on the paper System Genetic Approach.

Continue reading



MICA

March 17, 2017 0 Comments

These are my notes on the paper Maximal information component analysis.

Continue reading



MINE

March 17, 2017 0 Comments

These are my notes on the paper Detecting Novel Associations in Large Data Sets.

Continue reading



Implementation of MINE

March 17, 2017 0 Comments

This is an implementation of MINE in R.

Continue reading



Ensemble Learning

May 17, 2017 0 Comments

Continue reading



Illustrations of Support Vector Machines

May 18, 2017 0 Comments

Use the e1071 library in R to demonstrate the support vector classifier and the SVM.

Continue reading



One Parameter Models

June 04, 2017 0 Comments

Continue reading



The Normal Model

June 05, 2017 0 Comments

Continue reading



Sequential Monte Carlo samplers

June 11, 2017 0 Comments

This note is for Moral, P. D., Doucet, A., & Jasra, A. (2006). Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3), 411–436.

Continue reading



SMC for Mixture Distribution

June 11, 2017

Continue reading



ARIMA

July 11, 2017 0 Comments

Any time series without a constant mean over time is nonstationary.

Continue reading



Adaptive Importance Sampling

July 16, 2017 0 Comments

Continue reading



Model Specification

July 17, 2017 0 Comments

For a given time series, how do we choose appropriate values for $p, d, q$?

Continue reading



A Bayesian Missing Data Problem

July 18, 2017 0 Comments

Continue reading



Metropolis Algorithm

July 21, 2017 0 Comments

Monte Carlo plays a key role in evaluating integrals and simulating stochastic systems, and the most critical step of a Monte Carlo algorithm is sampling from an appropriate probability distribution $\pi (\mathbf x)$. There are two ways to solve this problem: one is importance sampling, and the other is to produce statistically dependent samples based on the idea of Markov chain Monte Carlo sampling.
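
The second route can be sketched with a random-walk Metropolis sampler. The code below is a minimal illustration under stated assumptions: the target $\pi$ is a one-dimensional standard normal known only up to a constant, the proposal is a symmetric uniform step, and the names `metropolis` and `log_standard_normal` are hypothetical.

```python
import math
import random


def metropolis(log_pi, x0, n_samples, step=2.0, burn_in=1000, seed=0):
    """Random-walk Metropolis with a symmetric uniform proposal.

    log_pi is the log of the target density, up to an additive constant.
    """
    rng = random.Random(seed)
    x = x0
    log_px = log_pi(x)
    samples = []
    for t in range(n_samples + burn_in):
        proposal = x + rng.uniform(-step, step)
        log_pp = log_pi(proposal)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        # 1 - rng.random() lies in (0, 1], so the logarithm is defined.
        if math.log(1.0 - rng.random()) < log_pp - log_px:
            x, log_px = proposal, log_pp
        if t >= burn_in:
            samples.append(x)  # a rejected move repeats the current state
    return samples


def log_standard_normal(x):
    """Unnormalized log density of N(0, 1)."""
    return -0.5 * x * x


draws = metropolis(log_standard_normal, x0=0.0, n_samples=50000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean={mean:.3f}, var={var:.3f}")
```

Only the ratio of target densities enters the acceptance step, which is why the normalizing constant of $\pi$ is never needed.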

Continue reading



SMC in Biological Problems

July 22, 2017 0 Comments

Continue reading



Estimate Parameters in Logistic Regression

July 30, 2017 0 Comments

Continue reading



Poisson Regression

July 31, 2017 0 Comments

Continue reading



Story about P value

August 09, 2017 0 Comments

“The p value was never meant to be used the way it’s used today.” –Goodman

Continue reading



Conjugate Gradient for Regression

August 13, 2017 0 Comments

The conjugate gradient method is an iterative method for solving a linear system of equations, so we can use the conjugate gradient method to estimate the parameters in (linear/ridge) regression.
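To make the connection concrete, the sketch below applies conjugate gradient to the normal equations $(X^\top X + \lambda I)\beta = X^\top y$. It is only an illustration under my own assumptions: the helper names `conjugate_gradient` and `ridge_cg`, the tolerance, and the toy data are hypothetical and do not come from the post.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A, given v -> A v."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the starting point x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter or 10 * n):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def ridge_cg(X, y, lam=0.0):
    """Estimate ridge coefficients; lam = 0 gives ordinary least squares."""
    n, d = len(X), len(X[0])

    def matvec(v):
        # Computes (X^T X + lam I) v without forming X^T X explicitly.
        Xv = [sum(xij * vj for xij, vj in zip(row, v)) for row in X]
        XtXv = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(d)]
        return [a + lam * vj for a, vj in zip(XtXv, v)]

    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(d)]
    return conjugate_gradient(matvec, Xty)


# Toy data generated from y = 1 + 2 x (intercept column included).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(ridge_cg(X, y))           # close to [1.0, 2.0]
print(ridge_cg(X, y, lam=1.0))  # coefficients under a ridge penalty
```

Only matrix-vector products with $X$ and $X^\top$ are needed, which is the main appeal when the number of features is large.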

Continue reading



Restricted Boltzmann Machines

August 26, 2017 0 Comments

Continue reading



Dynamics of Helicobacter pylori colonization

August 31, 2017 0 Comments

This post contains my notes on this paper.

Continue reading



Healthy Human Microbiome

September 01, 2017 0 Comments

This post is for The Human Microbiome Project Consortium, Huttenhower, C., Gevers, D., Knight, R., Abubucker, S., Badger, J. H., … White, O. (2012). Structure, function and diversity of the healthy human microbiome. Nature, 486(7402), 207–214.

Continue reading



Dynamics of Helicobacter pylori Infection

September 01, 2017 0 Comments

The note is for Kirschner, D. E., & Blaser, M. J. (1995). The dynamics of Helicobacter pylori infection of the human stomach. Journal of Theoretical Biology, 176(2), 281–290.

Continue reading



Basic Principles of Monte Carlo

September 07, 2017 0 Comments

Continue reading



Persistence of species in the face of environmental stochasticity

September 18, 2017 0 Comments

Sebastian Schreiber gave a talk titled Persistence of species in the face of environmental stochasticity.

Continue reading



A Faster Algorithm for Repeated Linear Regression

September 21, 2017 0 Comments

Repeated linear regression means fitting a linear regression many times, where the regressions share some common parts.

Continue reading



An R Package: Fit Repeated Linear Regressions

September 26, 2017 0 Comments

Repeated linear regressions refer to a set of linear regressions that share several variables.

Continue reading



Stochastic Epidemic Models

October 11, 2017 0 Comments

This post discusses three different methods for formulating stochastic epidemic models.

Continue reading



Essentials of Survival Time Analysis

October 11, 2017 0 Comments

This post aims to clarify the relationship between rates and probabilities.

Continue reading



Model-Free Scoring System for Risk Prediction

October 17, 2017 0 Comments

Continue reading



Power Analysis

December 27, 2017 0 Comments

Continue reading



ECOC

August 18, 2018

The note is for Dietterich, T. and Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research 2: 263–286.

Continue reading



Gibbs in genetics

August 24, 2018

The note is for Gilks, W. R., Richardson, S., & Spiegelhalter, D. (Eds.). (1995). Markov chain Monte Carlo in practice. CRC press.

Continue reading



Evolutionary Systems Biology

December 30, 2018

The note is for Chapter 1 of Soyer, Orkun S., ed. 2012 Evolutionary Systems Biology. Advances in Experimental Medicine and Biology, 751. New York: Springer.

Continue reading



Small World inside Large Metabolic Networks

January 02, 2019

The note is for Wagner, A., & Fell, D. A. (2001). The small world inside large metabolic networks. Proceedings of the Royal Society of London B: Biological Sciences, 268(1478), 1803-1810.

Continue reading



Counting Process Based Dimension Reduction Methods for Censored Data

January 06, 2019

The note is for Sun, Q., Zhu, R., Wang, T., & Zeng, D. (2017). Counting Process Based Dimension Reduction Methods for Censored Outcomes. ArXiv:1704.05046 [Stat].

Continue reading



Reconstruct Gaussian DAG

January 09, 2019

This note is based on Yuan, Y., Shen, X., Pan, W., & Wang, Z. (2019). Constrained likelihood for reconstructing a directed acyclic Gaussian graph. Biometrika, 106(1), 109–125.

Continue reading



Reversible jump Markov chain Monte Carlo

January 10, 2019

The note is for Green, P.J. (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”. Biometrika. 82 (4): 711–732.

Continue reading



Approximate $\ell_0$-penalized piecewise-constant estimate of graphs

January 13, 2019

This note is for Fan, Z., & Guan, L. (2018). Approximate $\ell_{0}$-penalized estimation of piecewise-constant signals on graphs. The Annals of Statistics, 46(6B), 3217–3245.

Continue reading



PLS in High-Dimensional Regression

January 15, 2019

This note is based on Cook, R. D., & Forzani, L. (2019). Partial least squares prediction in high-dimensional regression. The Annals of Statistics, 47(2), 884–908.

Continue reading



Sequential Monte Carlo Methods

January 19, 2019

This note is for Section 3 of Doucet, A., & Johansen, A. M. (2009). A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of Nonlinear Filtering, 12(656–704), 3. It complements my previous post.

Continue reading



The Kalman Filter and Extended Kalman Filter

January 21, 2019

Continue reading



Annealed SMC for Bayesian Phylogenetics

January 24, 2019

This note is for Wang, L., Wang, S., & Bouchard-Côté, A. (2018). An Annealed Sequential Monte Carlo Method for Bayesian Phylogenetics. ArXiv:1806.08813 [q-Bio, Stat].

Continue reading



Annealed Importance Sampling

January 28, 2019

This is the note for Neal, R. M. (1998). Annealed Importance Sampling. ArXiv:Physics/9803008.

Continue reading



Calculating Marginal likelihood

January 30, 2019

The note is for Fourment, M., Magee, A. F., Whidden, C., Bilge, A., Matsen IV, F. A., & Minin, V. N. (2018). 19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology.

Continue reading



The First Glimpse into Pseudolikelihood

February 12, 2019

This post takes a first glimpse at pseudolikelihood.

Continue reading



Comparisons of Three Likelihood Criteria

February 12, 2019

The note is for Nelder, J. A., & Lee, Y. (1992). Likelihood, Quasi-Likelihood and Pseudolikelihood: Some Comparisons. Journal of the Royal Statistical Society. Series B (Methodological), 54(1), 273–284.

Continue reading



Identification of PE Genes in Cell Cycle

February 13, 2019

This note is based on Fan, X., Pyne, S., & Liu, J. S. (2010). Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle. The Annals of Applied Statistics, 4(2), 988–1013.

Continue reading



Gibbs Sampling for the Multivariate Normal

February 13, 2019

This note is based on Chapter 7 of Hoff PD. A first course in Bayesian statistical methods. Springer Science & Business Media; 2009 Jun 2.

Continue reading



Review of Composite Likelihood

February 13, 2019

This note is based on Varin, C., Reid, N., & Firth, D. (2011). AN OVERVIEW OF COMPOSITE LIKELIHOOD METHODS. Statistica Sinica, 21(1), 5–42., a survey of recent developments in the theory and application of composite likelihood.

Continue reading



Studentized U-statistics

February 15, 2019 0 Comments

In Prof. Shao’s wonderful talk, Wandering around the Asymptotic Theory, he mentioned the Studentized U-statistics. I am interested in the derivation of the variances in the denominator.

Continue reading



Deep Learning

February 16, 2019

This note is based on LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

Continue reading



A Bayesian Perspective of Deep Learning

February 17, 2019

This note is for Polson, N. G., & Sokolov, V. (2017). Deep Learning: A Bayesian Perspective. Bayesian Analysis, 12(4), 1275–1304.

Continue reading



Presistency

February 18, 2019

The paper, Greenshtein and Ritov (2004), is recommended by Larry Wasserman in his post Consistency, Sparsistency and Presistency.

Continue reading



Restricted Isometry Property

February 19, 2019

I encountered the term RIP in Larry Wasserman’s post, RIP RIP (Restricted Isometry Property, Rest In Peace), and also found some material about RIP in Hastie et al.’s book, Statistical Learning with Sparsity.

Continue reading



Continuous Time Markov Chain

February 20, 2019

This note is based on Karl Sigman’s IEOR 6711: Continuous-Time Markov Chains.

Continue reading



Stein's Paradox

February 21, 2019

I learned Stein’s Paradox from Larry Wasserman’s post, STEIN’S PARADOX. Perhaps I had encountered this term before, but I cannot recall anything about it. (I am guilty)

Continue reading



Evaluate Variational Inference

March 07, 2019

A brief summary of the post, Eid ma clack shaw zupoven del ba.

Continue reading



Bernstein Bounds

March 08, 2019

I noticed that papers on matrix/tensor completion always talk about the Bernstein inequality, so I picked up the Bernstein bounds discussed in Wainwright (2019).

Continue reading



The Correlated Topic Model

March 12, 2019

This note is for Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of Science. The Annals of Applied Statistics, 1(1), 17–35.

Continue reading



Distributed inference for quantile regression processes

March 13, 2019

This note is for Volgushev, S., Chao, S.-K., & Cheng, G. (2019). Distributed inference for quantile regression processes. The Annals of Statistics, 47(3), 1634–1662.

Continue reading



Functional Data Analysis

March 14, 2019

Continue reading



Functional Data Analysis by Matrix Completion

March 15, 2019

Continue reading



High Dimensional Covariance Matrix Estimation

March 19, 2019

Continue reading



Convergence rates of least squares

March 25, 2019

This note is for Han, Q., & Wellner, J. A. (2017). Convergence rates of least squares regression estimators with heavy-tailed errors.

Continue reading



Joint Summarized by Marginal or Conditional?

March 25, 2019

I happened to read Yixuan’s blog about a question related to the course Statistical Inference: whether two marginal distributions can determine the joint distribution. The question is adapted from Exercise 4.47 of Casella and Berger (2002).

Continue reading



FARM-Test

March 29, 2019

This note is for Fan, J., Ke, Y., Sun, Q., & Zhou, W.-X. (2017). FarmTest: Factor-Adjusted Robust Multiple Testing with Approximate False Discovery Control. ArXiv:1711.05386 [Stat].

Continue reading



Frequentist Accuracy of Bayesian Estimates

March 31, 2019

This note is for Efron’s slides, Frequentist Accuracy of Bayesian Estimates, which are recommended in Larry’s post, Shaking the Bayesian Machine.

Continue reading



Soft Imputation in Matrix Completion

April 01, 2019

This post is based on Chapter 7 of Statistical Learning with Sparsity: The Lasso and Generalizations, and I wrote an R program to reproduce the simulations to get a better understanding.

Continue reading



Coupled Minimum-Cost Flow Cell Tracking

April 02, 2019

This note is for Padfield, D., Rittscher, J., & Roysam, B. (2011). Coupled minimum-cost flow cell tracking for high-throughput quantitative analysis. Medical Image Analysis, 15(4), 650–668.

Continue reading



Weird Things in Mixture Models

April 04, 2019

This note is based on Larry’s post, Mixture Models: The Twilight Zone of Statistics.

Continue reading



Subgradient

April 08, 2019

This post is mainly based on Hastie et al. (2015), and incorporates some material from Watson (1992).

Continue reading



Tracking Multiple Interacting Targets via MCMC-MRF

April 09, 2019

This note is for Khan, Z., Balch, T., & Dellaert, F. (2004). An MCMC-Based Particle Filter for Tracking Multiple Interacting Targets. In T. Pajdla & J. Matas (Eds.), Computer Vision - ECCV 2004 (pp. 279–290). Springer Berlin Heidelberg.

Continue reading



Methods for Cell Tracking

April 09, 2019

This post is for the survey paper, Meijering, E., Dzyubachyk, O., & Smal, I. (2012). Chapter nine - Methods for Cell and Particle Tracking. In P. M. Conn (Ed.), Methods in Enzymology (pp. 183–200).

Continue reading



Normalizing Constant

April 10, 2019

Larry discussed the normalizing constant paradox in his blog.

Continue reading



Multiple Tracking with Rao-Blackwellized marginal particle filtering

April 10, 2019

This note is for Smal, I., Meijering, E., Draegestein, K., Galjart, N., Grigoriev, I., Akhmanova, A., … Niessen, W. (2008). Multiple object tracking in molecular bioimaging by Rao-Blackwellized marginal particle filtering. Medical Image Analysis, 12(6), 764–777.

Continue reading



Statistical Inference for Lasso

April 15, 2019

This note is based on Chapter 6 of Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical Learning with Sparsity. 362.

Continue reading



Least Squares for SIMs

April 15, 2019

In the last lecture of STAT 5030, Prof. Lin shared one of the results in the paper Neykov, M., Liu, J. S., & Cai, T. (2016). L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs. Journal of Machine Learning Research, 17(87), 1–37., namely the starting point of the paper, the following lemma. The condition and the conclusion seem to be exactly the same as those of Sliced Inverse Regression, except for a direct interpretation in terms of least squares regression.

Continue reading



Identifiability and Estimability

April 20, 2019

Materials from STAT 5030.

Continue reading



Self-normalized Limit Theory and Stein's Method

May 01, 2019

This note consists of the lecture material of STAT 6060 taught by Prof. Shao, four homework assignments (indexed by “Homework”), and several personal comments (indexed by “Note”).

Continue reading



Medicine Meets AI

June 23, 2019

Over the last two days, I attended the conference Medicine Meets AI 2019: East Meets West, which helped me learn more about AI from the industrial and medical perspectives.

Continue reading



Bayesian Conjugate Gradient Method

June 27, 2019

This note is for Cockayne, J., Oates, C. J., Ipsen, I. C. F., & Girolami, M. (2018). A Bayesian Conjugate Gradient Method. Bayesian Analysis.

Continue reading



Global data association for MOT using network flows

July 10, 2019

This note is based on Li Zhang, Yuan Li, & Nevatia, R. (2008). Global data association for multi-object tracking using network flows. 2008 IEEE Conference on Computer Vision and Pattern Recognition, 1–8.

Continue reading



Canonical Variate Analysis

July 16, 2019

This note is based on Campbell, N. A. (1979). CANONICAL VARIATE ANALYSIS: SOME PRACTICAL ASPECTS. 243.

Continue reading



SMC-PHD Filter

July 17, 2019

This post is based on Ristic, B., Clark, D., & Vo, B. (2010). Improved SMC implementation of the PHD filter. 2010 13th International Conference on Information Fusion, 1–8.

Continue reading



Multi-estimate extraction for SMC-PHD

July 17, 2019

This post is based on Li, T., Corchado, J. M., Sun, S., & Fan, H. (2017). Multi-EAP: Extended EAP for multi-estimate extraction for SMC-PHD filter. Chinese Journal of Aeronautics, 30(1), 368–379.

Continue reading



An Optimal Control Approach for Deep Learning

July 19, 2019

This note is based on Li, Q., & Hao, S. (2018). An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks. ArXiv:1803.01299 [Cs].

Continue reading



High-dimensional linear mixed-effect model

July 21, 2019

This post is based on Li, S., Cai, T. T., & Li, H. (2019). Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach. ArXiv:1907.06116 [Stat].

Continue reading



An Adaptive Algorithm for online FDR

July 21, 2019

This post is based on Ramdas, A., Zrnic, T., Wainwright, M., & Jordan, M. (2018). SAFFRON: An adaptive algorithm for online control of the false discovery rate. ArXiv:1802.09098 [Cs, Math, Stat].

Continue reading



The Simplex Method

July 23, 2019

This note is based on Chapter 13 of Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer Science & Business Media.

Continue reading



Reluctant Interaction Modeling

July 23, 2019

This note is based on Yu, G., Bien, J., & Tibshirani, R. (2019). Reluctant Interaction Modeling. ArXiv:1907.08414 [Stat].

Continue reading



Additive Bayesian Variable Selection

August 05, 2019

This post is based on Rossell, D., & Rubio, F. J. (2019). Additive Bayesian variable selection under censoring and misspecification. ArXiv:1907.13563 [Math, Stat].

Continue reading



Interior-point Method

August 16, 2019

Nocedal and Wright (2006) and Boyd and Vandenberghe (2004) present slightly different introductions to the interior-point method. More specifically, the former only considers equality constraints, while the latter incorporates inequality constraints.

Continue reading



Debiased Lasso

September 08, 2019

This post is based on Section 6.4 of Hastie, Trevor, Robert Tibshirani, and Martin Wainwright. “Statistical Learning with Sparsity,” 2016, 362.

Continue reading



Likelihood-free inference by ratio estimation

September 09, 2019 0 Comments

This note is for Thomas, O., Dutta, R., Corander, J., Kaski, S., & Gutmann, M. U. (2016). Likelihood-free inference by ratio estimation. ArXiv:1611.10242 [Stat]. I found this paper via Xi’an’s blog.

Continue reading



Basics of $B$-splines

September 09, 2019 0 Comments

This note is based on de Boor, C. (1978). A Practical Guide to Splines, Springer, New York.

Continue reading



Functional PCA

September 20, 2019

This post is based on Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis (Second edition). New York, NY: Springer.

Continue reading



Multiple human tracking with RGB-D data

September 20, 2019

This note is based on the survey paper Camplani, M., Paiement, A., Mirmehdi, M., Damen, D., Hannuna, S., Burghardt, T., & Tao, L. (2016). Multiple human tracking in RGB-depth data: A survey. IET Computer Vision, 11(4), 265–285.

Continue reading



ABC for Socks

September 24, 2019 0 Comments

This post is based on Prof. Robert’s slides at JSM 2019 and an intuitive blog post by Rasmus Bååth.

Continue reading



Optimality for Sparse Group Lasso

September 29, 2019

This note is based on Cai, T. T., Zhang, A., & Zhou, Y. (2019). Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference. ArXiv:1909.09851 [Cs, Math, Stat].

Continue reading



Kernel Ridgeless Regression Can Generalize

September 30, 2019

This note is based on Liang, T., & Rakhlin, A. (2018). Just Interpolate: Kernel “Ridgeless” Regression Can Generalize. ArXiv:1808.00387 [Cs, Math, Stat].

Continue reading



Sub Gaussian

October 05, 2019

This post is based on Wainwright (2019).

Continue reading



Linear Regression with Partially Shuffled Data

October 08, 2019

This post is based on Slawski, M., Diao, G., & Ben-David, E. (2019). A Pseudo-Likelihood Approach to Linear Regression with Partially Shuffled Data. ArXiv:1910.01623 [Cs, Stat].

Continue reading



Noise Outsourcing

October 10, 2019

I learnt the term Noise Outsourcing in kjytay’s blog, which is based on Teh Yee Whye’s IMS Medallion Lecture at JSM 2019.

Continue reading



Isotropic vs. Anisotropic

October 24, 2019

I came across isotropic and anisotropic covariance functions in kjytay’s blog, and then I found more material, Chapter 4 of the book Gaussian Processes for Machine Learning, via a reference on StackExchange: What is an isotropic (spherical) covariance matrix?

Continue reading



Partial Least Squares for Functional Data

October 31, 2019

This post is based on Delaigle, A., & Hall, P. (2012). Methodology and theory for partial least squares applied to functional data. The Annals of Statistics, 40(1), 322–352.

Continue reading



Model-based Approach for Joint Analysis of Single-cell data

October 31, 2019

This post is based on Lin, Z., Zamanighomi, M., Daley, T., Ma, S., & Wong, W. H. (2020). Model-Based Approach to the Joint Analysis of Single-cell Data on Chromatin Accessibility and Gene Expression. Statistical Science, 35(1), 2–13.

Continue reading



Genetic Relatedness in High-Dimensional Linear Models

October 31, 2019

This post is based on Guo, Z., Wang, W., Cai, T. T., & Li, H. (2019). Optimal Estimation of Genetic Relatedness in High-Dimensional Linear Models. Journal of the American Statistical Association, 114(525), 358–369.

Continue reading



The Cost of Privacy

November 01, 2019

This note is based on Cai, T. T., Wang, Y., & Zhang, L. (2019). The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy. ArXiv:1902.04495 [Cs, Stat].

Continue reading



Active Contours

November 12, 2019

This post is based on Ray, N., & Acton, S. T. (2002). Active contours for cell tracking. Proceedings Fifth IEEE Southwest Symposium on Image Analysis and Interpretation, 274–278.

Continue reading



Combining $p$-values in Meta Analysis

December 04, 2019

I came across the term meta-analysis in the previous post, and I had another question about nominal size while reading the paper from that post, which reminded me of Keith’s notes. By coincidence, I also found the topic of meta-analysis in the same notes. Hence, this post is mainly based on Keith’s notes, and I reproduce the power curves myself.

Continue reading



Fantastic Generalization Measures and Where to Find Them

December 06, 2019

The post is based on Jiang, Y., Neyshabur, B., Mobahi, H., Krishnan, D., & Bengio, S. (2019). Fantastic Generalization Measures and Where to Find Them. ArXiv:1912.02178 [Cs, Stat]., which was shared by one of my friends on WeChat Moments, and then I took a quick look.

Continue reading



Quantile Regression Forests

December 10, 2019

This post is based on Meinshausen, N. (2006). Quantile Regression Forests. 17., since an upcoming seminar is related to this topic.

Continue reading



Conditional Quantile Regression Forests

December 12, 2019

This note is based on the slides of the seminar, Dr. ZHU, Huichen. Conditional Quantile Random Forest.

Continue reading



Lagrange Multiplier Test

December 17, 2019

This post is based on Peter BENTLER’s talk, S.-Y. Lee’s Lagrange Multiplier Test in Structural Modeling: Still Useful? in the International Statistical Conference in Memory of Professor Sik-Yum Lee.

Continue reading



DNA copy number profiling: from bulk tissue to single cells

January 02, 2020

This post is based on the talk given by Yuchao Jiang at the 11th ICSA International Conference on Dec. 20th, 2019.

Continue reading



Concentration Inequality for Machine Learning

January 09, 2020

This post is based on the material of the first lecture of STAT6050 instructed by Prof. Wicker.

Continue reading



Classification with Imperfect Training Labels

January 15, 2020

This post is based on the talk given by Timothy I. Cannings at the 11th ICSA International Conference on Dec. 22nd, 2019. The corresponding paper is Cannings, T. I., Fan, Y., & Samworth, R. J. (2019). Classification with imperfect training labels. ArXiv:1805.11505 [Math, Stat].

Continue reading



Multiple Isotonic Regression

February 20, 2020

The first two sections are based on a good tutorial on the isotonic regression, and the third section consists of the slides for the talk given by Prof. Cun-Hui Zhang at the 11th ICSA International Conference on Dec. 21st, 2019.

Continue reading



Bernstein-von Mises Theorem

February 24, 2020

I came across the Bernstein-von Mises theorem in Yuling Yao’s blog, and I also found a quick definition in the blog hosted by Prof. Andrew Gelman, although this one is not by Gelman. By coincidence, the former is the PhD student of the latter!

Continue reading



Common Principal Components

February 28, 2020

This post is based on Flury (1984).

Continue reading



Bootstrap Sampling Distribution

March 05, 2020

This note is based on Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses (3rd ed). Springer.

Continue reading



Survey on Functional Principal Component Analysis

April 25, 2020

This post is based on Shang, H. L. (2014). A survey of functional principal component analysis. AStA Advances in Statistical Analysis, 98(2), 121–142.

Continue reading



Robust Forecasting by Functional Principal Component Analysis

April 25, 2020

This post is based on Hyndman, R. J., & Shahid Ullah, Md. (2007). Robust forecasting of mortality and fertility rates: A functional data approach. Computational Statistics & Data Analysis, 51(10), 4942–4956.

Continue reading



Internal migration and transmission dynamics of tuberculosis

April 30, 2020

This post is based on Yang, C., Lu, L., Warren, J. L., Wu, J., Jiang, Q., Zuo, T., Gan, M., Liu, M., Liu, Q., DeRiemer, K., Hong, J., Shen, X., Colijn, C., Guo, X., Gao, Q., & Cohen, T. (2018). Internal migration and transmission dynamics of tuberculosis in Shanghai, China: An epidemiological, spatial, genomic analysis. The Lancet Infectious Diseases, 18(7), 788–795.

Continue reading



Survey on Time Series Change Points

May 31, 2020

This note is based on the survey paper, Aminikhanghahi, S., & Cook, D. J. (2017). A Survey of Methods for Time Series Change Point Detection. Knowledge and Information Systems, 51(2), 339–367.

Continue reading



Noisy Adaptive Group Testing

July 30, 2020

This note is for Cuturi, M., Teboul, O., Berthet, Q., Doucet, A., & Vert, J.-P. (2020). Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design.

Continue reading



An objective comparison of cell-tracking algorithms

August 21, 2020

This note is for Ulman, V., Maška, M., Magnusson, K. E. G., Ronneberger, O., Haubold, C., Harder, N., Matula, P., Matula, P., Svoboda, D., Radojevic, M., Smal, I., Rohr, K., Jaldén, J., Blau, H. M., Dzyubachyk, O., Lelieveldt, B., Xiao, P., Li, Y., Cho, S.-Y., … Ortiz-de-Solorzano, C. (2017). An objective comparison of cell-tracking algorithms. Nature Methods, 14(12), 1141–1152.

Continue reading



Global Tracking via the Viterbi Algorithm

August 27, 2020

This post is for Magnusson, K. E. G., Jalden, J., Gilbert, P. M., & Blau, H. M. (2015). Global Linking of Cell Tracks Using the Viterbi Algorithm. IEEE Transactions on Medical Imaging, 34(4), 911–929.

Continue reading



Iterative Closest Point

November 07, 2020

This note is for Besl, P. J., & McKay, N. D. (1992). Method for registration of 3-D shapes. Sensor Fusion IV: Control Paradigms and Data Structures, 1611, 586–606.

Continue reading



Star Formation

March 13, 2021 0 Comments

This note is for Chapter 19 of Astronomy Today, 8th Edition.

Continue reading



Generalized Degrees of Freedom

May 03, 2021 0 Comments

This note is for Ye, J. (1998). On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association, 93(441), 120–131.

Continue reading



Multivariate Adaptive Regression Splines

May 10, 2021 0 Comments

This note is for Friedman, J. H. (1991). Multivariate Adaptive Regression Splines. The Annals of Statistics, 19(1), 1–67.

Continue reading



Hypothesis Testing on A Nuisance Parameter

May 11, 2021 0 Comments

This note is for Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 74(1), 33–43.

Continue reading



Permutation Tests and Randomization Tests

May 22, 2021 0 Comments

This note is for Hemerik, J., & Goeman, J. J. (2020). Another look at the Lady Tasting Tea and differences between permutation tests and randomization tests. International Statistical Review, insr.12431.

Continue reading



End-to-End Instance Segmentation

May 27, 2021 0 Comments

This note is for ISTR: End-to-End Instance Segmentation with Transformers.

Continue reading



Unsupervised Multi-granular Chinese Word Segmentation via Graph Partition

June 03, 2021 0 Comments

This note is for Yuan, Z., Liu, Y., Yin, Q., Li, B., Feng, X., Zhang, G., & Yu, S. (2020). Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition. Journal of Biomedical Informatics, 110, 103542.

Continue reading



Word Segmentation and Medical Concept Recognition for Chinese Medical Texts

June 10, 2021 0 Comments

This note is for Liu, Y., Tian, Y., Chang, T.-H., Wu, S., Wan, X., & Song, Y. (2021). Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts. Proceedings of the 20th Workshop on Biomedical Language Processing, 213–220.

Continue reading



Summarize Medical Conversations

June 10, 2021 0 Comments

This note is for Song, Y., Tian, Y., Wang, N., & Xia, F. (2020). Summarizing Medical Conversations via Identifying Important Utterances. Proceedings of the 28th International Conference on Computational Linguistics, 717–729.

Continue reading



Biomedical Named Entity Recognition

June 10, 2021 0 Comments

This note is for Tian, Y., Shen, W., Song, Y., Xia, F., He, M., & Li, K. (2020). Improving biomedical named entity recognition with syntactic information. BMC Bioinformatics, 21(1), 539.

Continue reading



Self-organizing Maps of Document Collections

June 12, 2021 0 Comments

This note is for Kaski, S., Honkela, T., Lagus, K., & Kohonen, T. (1998). WEBSOM – Self-organizing maps of document collections

Continue reading



Knowledge Graph and Electronic Medical Records

June 29, 2021 0 Comments

This note covers several papers on Knowledge Graph and Electronic Medical Records.

Continue reading



Multiple Object Tracking via Minimizing Energy

July 05, 2021 0 Comments

The note is for Milan, Anton, Stefan Roth, and Konrad Schindler. “Continuous Energy Minimization for Multitarget Tracking.” IEEE Transactions on Pattern Analysis and Machine Intelligence 36, no. 1 (January 2014): 58–72.

Continue reading



Bayesian Sparse Multiple Regression

September 16, 2021 0 Comments

This note is for Chakraborty, A., Bhattacharya, A., & Mallick, B. K. (2020). Bayesian sparse multiple regression for simultaneous rank reduction and variable selection. Biometrika, 107(1), 205–221.

Continue reading



Exploring DNN via Layer-Peeled Model

September 25, 2021 0 Comments

This note is for Fang, C., He, H., Long, Q., & Su, W. J. (2021). Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training. ArXiv:2101.12699 [Cs, Math, Stat].

Continue reading



Multiple Descent of Minimum-Norm Interpolants

October 11, 2021 0 Comments

This note is for Liang, T., Rakhlin, A., & Zhai, X. (2020). On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels. ArXiv:1908.10292 [Cs, Math, Stat].

Continue reading



Benign Overfitting in Linear Regression

October 11, 2021 0 Comments

This note is for Bartlett, P. L., Long, P. M., Lugosi, G., & Tsigler, A. (2020). Benign Overfitting in Linear Regression. ArXiv:1906.11300 [Cs, Math, Stat].

Continue reading



Bayesian Leave-One-Out Cross Validation

October 20, 2021 0 Comments

This note is for Magnusson, M., Andersen, M., Jonasson, J., & Vehtari, A. (2019). Bayesian leave-one-out cross-validation for large data. Proceedings of the 36th International Conference on Machine Learning, 4244–4253.

Continue reading



Asymptotic Properties of High-Dimensional Random Forests

November 09, 2021 0 Comments

This note is for Chi, C.-M., Vossler, P., Fan, Y., & Lv, J. (2021). Asymptotic Properties of High-Dimensional Random Forests. ArXiv:2004.13953 [Math, Stat].

Continue reading



Biclustering on Gene Expression Data

November 10, 2021 0 Comments

The note is based on Padilha, V. A., & Campello, R. J. G. B. (2017). A systematic comparative evaluation of biclustering techniques. BMC Bioinformatics, 18(1), 55.

Continue reading



Debiased ML via NN for GLM

November 16, 2021 0 Comments

This is the note for Chernozhukov, V., Newey, W. K., Quintas-Martinez, V., & Syrgkanis, V. (2021). Automatic Debiased Machine Learning via Neural Nets for Generalized Linear Regression. ArXiv:2104.14737 [Econ, Math, Stat].

Continue reading



Multidimensional Monotone Bayesian Additive Regression Trees

November 17, 2021

This note is for Chipman, H. A., George, E. I., McCulloch, R. E., & Shively, T. S. (2021). mBART: Multidimensional Monotone BART. ArXiv:1612.01619 [Stat].

Continue reading



Causal Inference by Invariant Prediction

November 19, 2021

This note is for Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5), 947–1012.

Continue reading



Invariant Risk Minimization

November 19, 2021

This note is for Arjovsky, M., Bottou, L., Gulrajani, I., & Lopez-Paz, D. (2020). Invariant Risk Minimization. ArXiv:1907.02893 [Cs, Stat].

Continue reading



Regularization-Free Principal Curves

November 21, 2021

The note is for Gerber, S., & Whitaker, R. (2013). Regularization-Free Principal Curve Estimation. 18.

Continue reading



Probabilistic Principal Curves

November 22, 2021

This note is for Chang, K.-Y., & Ghosh, J. (2001). A unified model for probabilistic principal surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(1), 22–41. It covers only the principal curves.

Continue reading



Review on Random Matrix Theory

December 01, 2021

This note is for Paul, D., & Aue, A. (2014). Random matrix theory in statistics: A review. Journal of Statistical Planning and Inference, 150, 1–29.

Continue reading



Asymptotics of Cross Validation

December 03, 2021

This note is for Austern, M., & Zhou, W. (2020). Asymptotics of Cross-Validation. ArXiv:2001.11111 [Math, Stat].

Continue reading



Additive Model with Linear Smoother

December 07, 2021

This note is for Buja, A., Hastie, T., & Tibshirani, R. (1989). Linear Smoothers and Additive Models. The Annals of Statistics, 17(2), 453–510. JSTOR.

Continue reading



Gaussian Processes for Regression

December 13, 2021

This note is for Chapter 4 of Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. MIT Press.

Continue reading



Generalizing Ridge Regression

December 14, 2021

This note is for Chapter 3 of van Wieringen, W. N. (2021). Lecture notes on ridge regression. ArXiv:1509.09169 [Stat].

Continue reading



Empirical Bayes

January 16, 2022

This note is based on Sec. 4.6 of Lehmann, E. L., & Casella, G. (1998). Theory of point estimation (2nd ed). Springer.

Continue reading



Neuronized Priors for Bayesian Sparse Linear Regression

January 16, 2022

This note is for Shin, M., & Liu, J. S. (2021). Neuronized Priors for Bayesian Sparse Linear Regression. Journal of the American Statistical Association, 1–16.

Continue reading



Leave-one-out CV for Lasso

March 14, 2022

This note is for Homrighausen, D., & McDonald, D. J. (2013). Leave-one-out cross-validation is risk consistent for lasso. ArXiv:1206.6128 [Math, Stat].

Continue reading



Applications with Scale Parameters

March 22, 2022

This note covers several papers related to scale parameters.

Continue reading



Equivariance

March 22, 2022

This post is for Chapter 3 of Lehmann, E. L., & Casella, G. (1998). Theory of point estimation (2nd ed). Springer.

Continue reading



Prediction Risk for the Horseshoe Regression

March 24, 2022

The note is for Bhadra, A., Datta, J., Li, Y., Polson, N. G., & Willard, B. (2019). Prediction Risk for the Horseshoe Regression. 39.

Continue reading



Scale Mixture Models

March 25, 2022

This note is for scale mixture models.

Continue reading



Mixture of Location-Scale Families

March 25, 2022

This note is for Chen, J., Li, P., & Liu, G. (2020). Homogeneity testing under finite location-scale mixtures. Canadian Journal of Statistics, 48(4), 670–684.

Continue reading



Adaptive Ridge Estimate

March 30, 2022

This note is for Grandvalet, Y. (1998). Least Absolute Shrinkage is Equivalent to Quadratic Penalization. In L. Niklasson, M. Bodén, & T. Ziemke (Eds.), ICANN 98 (pp. 201–206). Springer London.

Continue reading



Big Data Paradox

April 07, 2022

This note is for Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2).

Continue reading



Test of Monotonicity

April 20, 2022

This note is for Chetverikov, D. (2019). Testing Regression Monotonicity in Econometric Models. Econometric Theory, 35(4), 729–776.

Continue reading



Monotonicity in Asset Returns

April 20, 2022

This note is for Patton, A. J., & Timmermann, A. (2010). Monotonicity in asset returns: New tests with applications to the term structure, the CAPM, and portfolio sorts. Journal of Financial Economics, 98(3), 605–625.

Continue reading



Test of Monotonicity by U-processes

April 23, 2022

This note is for Ghosal, S., Sen, A., & van der Vaart, A. W. (2000). Testing Monotonicity of Regression. The Annals of Statistics, 28(4), 1054–1082.

Continue reading



Test of Monotonicity and Convexity by Splines

April 23, 2022

This note is for Wang, J. C., & Meyer, M. C. (2011). Testing the monotonicity or convexity of a function using regression splines. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 39(1), 89–107.

Continue reading



Monotone Multi-Layer Perceptron

July 04, 2022

This note is for the monotone multi-layer perceptron neural network; the references are from the R package monmlp.

Continue reading



Review on Multi-omics Data

July 14, 2022

This note is based on Subramanian, I., Verma, S., Kumar, S., Jere, A., & Anamika, K. (2020). Multi-omics Data Integration, Interpretation, and Its Application. Bioinformatics and Biology Insights, 14, 1177932219899051.

Continue reading



Fitting to Future Observations

July 21, 2022

This note is for Jiang, Y., & Liu, C. (2022). Estimation of Over-parameterized Models via Fitting to Future Observations (arXiv:2206.01824). arXiv.

Continue reading



Debiased Inverse-Variance Weighted Estimator in Mendelian Randomization

September 20, 2022

This post is for the talk at Yale given by Prof. Ting Ye based on the paper Ye, T., Shao, J., & Kang, H. (2020). Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization (arXiv:1911.09802). arXiv.

Continue reading



Contrastive Learning: A Simple Framework and A Theoretical Analysis

October 06, 2022

This note is based on

Continue reading



Single-cell Graph Neural Network

October 08, 2022

This note is for Prof. Dong Xu’s talk on Wang, J., Ma, A., Chang, Y., Gong, J., Jiang, Y., Qi, R., Wang, C., Fu, H., Ma, Q., & Xu, D. (2021). scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nature Communications, 12(1), Article 1.

Continue reading



ADAM and AMSGrad for Stochastic Optimization

October 09, 2022

This post is based on

Continue reading



scDesign3: A Single-cell Simulator

October 10, 2022

This note is based on Jingyi Jessica Li’s talk on Song, D., Wang, Q., Yan, G., Liu, T., & Li, J. J. (2022). A unified framework of realistic in silico data generation and statistical model inference for single-cell and spatial omics (p. 2022.09.20.508796). bioRxiv.

Continue reading



Simultaneous Estimation of Cell Type Proportions and Cell Type-specific Gene Expressions

October 12, 2022

This note is for Tang, D., Park, S., & Zhao, H. (2022). SCADIE: Simultaneous estimation of cell type proportions and cell type-specific gene expressions using SCAD-based iterative estimating procedure. Genome Biology, 23(1), 129.

Continue reading



Bayesian Hierarchical Varying-Sparsity Regression Models with Application to Cancer Proteogenomics

October 29, 2022

This note is for Ni, Y., Stingo, F. C., Ha, M. J., Akbani, R., & Baladandayuthapani, V. (2019). Bayesian Hierarchical Varying-Sparsity Regression Models with Application to Cancer Proteogenomics. Journal of the American Statistical Association, 114(525), 48–60.

Continue reading



Integrative Bayesian Analysis of High-dimensional Multiplatform Genomics Data

October 30, 2022

This note is for Wang, W., Baladandayuthapani, V., Morris, J. S., Broom, B. M., Manyam, G., & Do, K.-A. (2013). iBAG: Integrative Bayesian analysis of high-dimensional multiplatform genomics data. Bioinformatics, 29(2), 149–159.

Continue reading



Joint Bayesian Variable and DAG Selection

October 31, 2022

This note is for Cao, X., & Lee, K. (2021). Joint Bayesian Variable and DAG Selection Consistency for High-dimensional Regression Models with Network-structured Covariates. Statistica Sinica.

Continue reading



First Glance at KEGGgraph

November 21, 2022

This post is based on

Continue reading



Tutorial on Polygenic Risk Score

January 24, 2023

This note is based on Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9.

Continue reading



Predictive Degrees of Freedom

February 10, 2023

This note is for Luan, B., Lee, Y., & Zhu, Y. (2021). Predictive Model Degrees of Freedom in Linear Regression. ArXiv:2106.15682 [Math].

Continue reading



Model Selection for Cox Models with Time-Varying Coefficients

March 28, 2023

This note is for Yan, J., & Huang, J. (2012). Model Selection for Cox Models with Time-Varying Coefficients. Biometrics, 68(2), 419–428.

Continue reading



Cox Models with Time-Varying Covariates vs Time-Varying Coefficients

March 28, 2023

This note is for Zhang, Z., Reinikainen, J., Adeleke, K. A., Pieterse, M. E., & Groothuis-Oudshoorn, C. G. M. (2018). Time-varying covariates and coefficients in Cox regression models. Annals of Translational Medicine, 6(7), 121.

Continue reading



Age-dependency of PRS for Prostate Cancer

April 21, 2023

This note is for Schaid, D. J., Sinnwell, J. P., Batzler, A., & McDonnell, S. K. (2022). Polygenic risk for prostate cancer: Decreasing relative risk with age but little impact on absolute risk. American Journal of Human Genetics, 109(5), 900–908.

Continue reading



C-index for Time-varying Risk

May 05, 2023

This post is for Gandy, A., & Matcham, T. J. (2022). On concordance indices for models with time-varying risk (arXiv:2208.03213). arXiv.

Continue reading



Deep Generative Modeling for Single-cell Transcriptomics

June 29, 2023

The post is for Lopez, R., Regier, J., Cole, M. B., Jordan, M. I., & Yosef, N. (2018). Deep generative modeling for single-cell transcriptomics. Nature Methods, 15(12), Article 12.

Continue reading



Single Cell Generative Pre-trained Transformer

June 30, 2023

This post is for Cui, H., Wang, C., Maan, H., & Wang, B. (2023). scGPT: Towards Building a Foundation Model for Single-cell Multi-omics Using Generative AI (p. 2023.04.30.538439). bioRxiv.

Continue reading



XGBoost for IPF Biomarker

July 10, 2023

This post is for Fanidis, D., Pezoulas, V. C., Fotiadis, D. I., & Aidinis, V. (2023). An explainable machine learning-driven proposal of pulmonary fibrosis biomarkers. Computational and Structural Biotechnology Journal, 21, 2305–2315.

Continue reading



Cluster Analysis of Transcriptomic Datasets of IPF

July 10, 2023

This post is for Kraven, L. M., Taylor, A. R., Molyneaux, P. L., Maher, T. M., McDonough, J. E., Mura, M., Yang, I. V., Schwartz, D. A., Huang, Y., Noth, I., Ma, S. F., Yeo, A. J., Fahy, W. A., Jenkins, R. G., & Wain, L. V. (2023). Cluster analysis of transcriptomic datasets to identify endotypes of idiopathic pulmonary fibrosis. Thorax, 78(6), 551–558.

Continue reading



Cell type-specific and disease-associated eQTL in the human lung

July 13, 2023

This post is for Natri, H. M., Azodi, C. B. D., Peter, L., Taylor, C. J., Chugh, S., Kendle, R., Chung, M., Flaherty, D. K., Matlock, B. K., Calvi, C. L., Blackwell, T. S., Ware, L. B., Bacchetta, M., Walia, R., Shaver, C. M., Kropski, J. A., McCarthy, D. J., & Banovich, N. E. (2023). Cell type-specific and disease-associated eQTL in the human lung (p. 2023.03.17.533161). bioRxiv.

Continue reading



scMDC: Single-cell Multi-omics Data Clustering Analysis

July 27, 2023

This post is for Lin, X., Tian, T., Wei, Z., & Hakonarson, H. (2022). Clustering of single-cell multi-omics data with a multimodal deep learning method. Nature Communications, 13(1), Article 1.

Continue reading



PseudotimeDE: Differential Gene Expression along Cell Pseudotime

July 27, 2023

The note is for Song, D., & Li, J. J. (2021). PseudotimeDE: Inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data. Genome Biology, 22(1), 124.

Continue reading



tradeSeq: Trajectory-based differential expression analysis for single-cell sequencing data

July 31, 2023

This post is for Van den Berge, K., Roux de Bézieux, H., Street, K., Saelens, W., Cannoodt, R., Saeys, Y., Dudoit, S., & Clement, L. (2020). Trajectory-based differential expression analysis for single-cell sequencing data. Nature Communications, 11(1), Article 1.

Continue reading



Six Statistical Senses

August 28, 2023

This note is for Craiu, R. V., Gong, R., & Meng, X.-L. (2023). Six Statistical Senses. Annual Review of Statistics and Its Application, 10(1), 699–725.

Continue reading



In-Context Learning via Transformers

September 14, 2023

This note is for Garg, S., Tsipras, D., Liang, P., & Valiant, G. (2023). What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (arXiv:2208.01066). arXiv.

Continue reading



condiments: Trajectory Inference across Multiple Conditions

September 14, 2023

The note is for Van den Berge, K., Roux de Bézieux, H., Street, K., Saelens, W., Cannoodt, R., Saeys, Y., Dudoit, S., & Clement, L. (2020). Trajectory-based differential expression analysis for single-cell sequencing data. Nature Communications, 11(1), Article 1.

Continue reading



Confidence Intervals of Smoothed Isotonic Regression

September 21, 2023

This note is for Groeneboom, P., & Jongbloed, G. (2023). Confidence intervals in monotone regression (arXiv:2303.17988). arXiv.

Continue reading



Fast and Flexible methods for monotone polynomial fitting

September 21, 2023

This note is for Murray, K., Müller, S., & Turlach, B. (2016). Fast and flexible methods for monotone polynomial fitting. Journal of Statistical Computation and Simulation, 86, 1–21.

Continue reading



Shape-Constrained Estimation Using Nonnegative Splines

September 21, 2023

This note is for Papp, D., & Alizadeh, F. (2014). Shape-Constrained Estimation Using Nonnegative Splines. Journal of Computational and Graphical Statistics, 23(1), 211–231.

Continue reading



An Iterative Procedure for Shape-constrained Smoothing using Smoothing Splines

September 21, 2023

This note is for Turlach, B. A. (2005). Shape constrained smoothing using smoothing splines. Computational Statistics, 20(1), 81–104.

Continue reading



Constrained Smoothing and Out-of-range Prediction using P-splines

September 22, 2023

This note is for Navarro-García, M., Guerrero, V., & Durban, M. (2023). On constrained smoothing and out-of-range prediction using P-splines: A conic optimization approach. Applied Mathematics and Computation, 441, 127679.

Continue reading



scHOT: Investigate higher-order interactions in single-cell data

October 13, 2023

This note is for Ghazanfar, Shila, Yingxin Lin, Xianbin Su, David Ming Lin, Ellis Patrick, Ze-Guang Han, John C. Marioni, and Jean Yee Hwa Yang. “Investigating Higher-Order Interactions in Single-cell Data with scHOT.” Nature Methods 17, no. 8 (August 2020): 799–806.

Continue reading



Consistent Probabilities along GO Structure

November 16, 2023

This note is for Obozinski, Guillaume, Gert Lanckriet, Charles Grant, Michael I. Jordan, and William Stafford Noble. “Consistent Probabilistic Outputs for Protein Function Prediction.” Genome Biology 9 Suppl 1, no. Suppl 1 (2008): S6.

Continue reading



Hierarchical Multi-Label Classification

November 20, 2023

This post is for two papers on Hierarchical multi-label classification (HMC), which imposes a hierarchy constraint on the classes.

Continue reading



Hierarchical Multi-label Contrastive Learning

November 25, 2023

This post is for Zhang, Shu, Ran Xu, Caiming Xiong, and Chetan Ramaiah. “Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework,” 16660–69, 2022.

Continue reading



Approximation to Log-likelihood of Nonlinear Mixed-effects Model

November 26, 2023

This post is for Pinheiro, José C., and Douglas M. Bates. “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model.” Journal of Computational and Graphical Statistics 4, no. 1 (1995): 12–35.

Continue reading



ClusterDE: a post-clustering DE method

December 04, 2023

This post is for Song, Dongyuan, Kexin Li, Xinzhou Ge, and Jingyi Jessica Li. “ClusterDE: A Post-Clustering Differential Expression (DE) Method Robust to False-Positive Inflation Caused by Double Dipping,” 2023.

Continue reading



Uncertainty of Pseudotime Trajectory

December 04, 2023

This post is for Tenha, Lovemore, and Mingzhou Song. “Statistical Evidence for the Presence of Trajectory in Single-cell Data.” BMC Bioinformatics 23, no. Suppl 8 (August 16, 2022): 340.

Continue reading



Exact Post-Selection Inference for Sequential Regression Procedures

January 19, 2024

This post is for Tibshirani, R. J., Taylor, J., Lockhart, R., & Tibshirani, R. (2016). Exact Post-Selection Inference for Sequential Regression Procedures. Journal of the American Statistical Association, 111(514), 600–620.

Continue reading



Statistical Learning and Selective Inference

January 19, 2024

This post is for Taylor, J., & Tibshirani, R. J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences of the United States of America, 112(25), 7629–7634.

Continue reading



SuSiE: Sum of Single Effects Model

January 22, 2024

This note is for Wang, G., Sarkar, A., Carbonetto, P., & Stephens, M. (2020). A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(5), 1273–1300.

Continue reading



Fine-mapping from Summary Data with SuSiE

January 22, 2024

This post is for Zou, Y., Carbonetto, P., Wang, G., & Stephens, M. (2022). Fine-mapping from summary data with the “Sum of Single Effects” model. PLOS Genetics, 18(7), e1010299.

Continue reading



t-Test for Mixture Normal Data

January 23, 2024

The post is for Lee, A. F. S., & Gurland, J. (1977). One-Sample t-Test When Sampling from a Mixture of Normal Distributions. The Annals of Statistics, 5(4), 803–807.

Continue reading



Edgeworth Expansion

January 24, 2024

This note is based on two sources: Shao, J. (2003). Mathematical statistics (2nd ed). Springer; and Hwang, J. (2019). Note on Edgeworth Expansions and Asymptotic Refinements of Percentile t-Bootstrap Methods. Bootstrap Methods.

Continue reading



Contrasting Genetic Architectures using Fast Variance Components Analysis

February 07, 2024

This note is for Loh, P.-R., Bhatia, G., Gusev, A., Finucane, H. K., Bulik-Sullivan, B. K., Pollack, S. J., de Candia, T. R., Lee, S. H., Wray, N. R., Kendler, K. S., O’Donovan, M. C., Neale, B. M., Patterson, N., & Price, A. L. (2015). Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance components analysis. Nature Genetics, 47(12), 1385–1392.

Continue reading



Selective Inference for Hierarchical Clustering

February 08, 2024

This note is for Gao, L. L., Bien, J., & Witten, D. (2022). Selective Inference for Hierarchical Clustering (arXiv:2012.02936). arXiv.

Continue reading



Bipartite eQTL Network Construction

February 08, 2024

This post is for Gaynor, S. M., Fagny, M., Lin, X., Platig, J., & Quackenbush, J. (2022). Connectivity in eQTL networks dictates reproducibility and genomic properties. Cell Reports Methods, 2(5), 100218.

Continue reading



Post-clustering Inference under Dependency

February 08, 2024

This post is for González-Delgado, J., Cortés, J., & Neuvial, P. (2023). Post-clustering Inference under Dependency (arXiv:2310.11822). arXiv.

Continue reading



BLiP: Bayesian Linear Programming

February 09, 2024

The note is for Spector, A., & Janson, L. (2023). Controlled Discovery and Localization of Signals via Bayesian Linear Programming (arXiv:2203.17208). arXiv.

Continue reading



Comparisons of transformations for single-cell RNA-seq data

March 26, 2024

This post is for Ahlmann-Eltze, C., & Huber, W. (2023). Comparison of transformations for single-cell RNA-seq data. Nature Methods, 20(5), 665–672.

Continue reading



Selective Inference for K-means

April 12, 2024

This note is for Chen, Y. T., & Witten, D. M. (2022). Selective inference for k-means clustering (arXiv:2203.15267). arXiv.

Continue reading



Test Difference for A Single Feature

April 12, 2024

This note is for Chen, Y. T., & Gao, L. L. (2023). Testing for a difference in means of a single feature after clustering (arXiv:2311.16375). arXiv.

Continue reading



Conditional Independence Test in Single-cell Multiomics

April 17, 2024

This note is for Boyeau, P., Bates, S., Ergen, C., Jordan, M. I., & Yosef, N. (2023). Calibrated Identification of Feature Dependencies in Single-cell Multiomics.

Continue reading



Niche DE

April 30, 2024

This note is for Mason, K., Sathe, A., Hess, P. R., Rong, J., Wu, C.-Y., Furth, E., Susztak, K., Levinsohn, J., Ji, H. P., & Zhang, N. (2024). Niche-DE: Niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions. Genome Biology, 25(1), 14.

Continue reading



GhostKnockoffs: Only Summary Statistics

May 23, 2024

This note is for Chen, Z., He, Z., Chu, B. B., Gu, J., Morrison, T., Sabatti, C., & Candès, E. (2024). Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression (arXiv:2402.12724). arXiv.

Continue reading



Conformal Prediction for Single-cell Spatial Transcriptomics

June 07, 2024

This note is for Sun, E. D., Ma, R., Navarro Negredo, P., Brunet, A., & Zou, J. (2024). TISSUE: Uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nature Methods, 21(3), 444–454.

Continue reading



Data Thinning for Convolution-Closed Distributions

August 29, 2024

This note is for Neufeld, A., Dharamshi, A., Gao, L. L., & Witten, D. (2024). Data Thinning for Convolution-Closed Distributions. Journal of Machine Learning Research, 25(57), 1–35.

Continue reading



XBART: Accelerated Bayesian Additive Regression Trees

October 04, 2024

This post is based on He, J., Yalov, S., & Hahn, P. R. (2019). XBART: Accelerated Bayesian Additive Regression Trees. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, 1130–1138. https://proceedings.mlr.press/v89/he19a.html and He, J., & Hahn, P. R. (2023). Stochastic Tree Ensembles for Regularized Nonlinear Regression. Journal of the American Statistical Association, 118(541), 551–570. https://doi.org/10.1080/01621459.2021.1942012

Continue reading



spaCRT: saddlepoint approximation-based conditional randomization test

November 04, 2024

This note is for Niu, Z., Choudhury, J. R., & Katsevich, E. (2024). Computationally efficient and statistically accurate conditional independence testing with spaCRT (No. arXiv:2407.08911; Version 1). arXiv. https://doi.org/10.48550/arXiv.2407.08911

Continue reading


