WeiYa's Work Yard

A dog, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

ClusterDE: a post-clustering DE method

December 04, 2023

This post is for Song, Dongyuan, Kexin Li, Xinzhou Ge, and Jingyi Jessica Li. “ClusterDE: A Post-Clustering Differential Expression (DE) Method Robust to False-Positive Inflation Caused by Double Dipping,” 2023.

Continue reading



Approximation to Log-likelihood of Nonlinear Mixed-effects Model

November 26, 2023

This post is for Pinheiro, José C., and Douglas M. Bates. “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model.” Journal of Computational and Graphical Statistics 4, no. 1 (1995): 12–35.

Continue reading



Hierarchical Multi-label Contrastive Learning

November 25, 2023

This post is for Zhang, Shu, Ran Xu, Caiming Xiong, and Chetan Ramaiah. “Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework,” 16660–69, 2022.

Continue reading



Hierarchical Multi-Label Classification

November 20, 2023

This post is for two papers on Hierarchical multi-label classification (HMC), which imposes a hierarchy constraint on the classes.

Continue reading



Consistent Probabilities along GO Structure

November 16, 2023

This note is for Obozinski, Guillaume, Gert Lanckriet, Charles Grant, Michael I. Jordan, and William Stafford Noble. “Consistent Probabilistic Outputs for Protein Function Prediction.” Genome Biology 9 Suppl 1, no. Suppl 1 (2008): S6.

Continue reading



scHOT: Investigate higher-order interactions in single-cell data

October 13, 2023

This note is for Ghazanfar, Shila, Yingxin Lin, Xianbin Su, David Ming Lin, Ellis Patrick, Ze-Guang Han, John C. Marioni, and Jean Yee Hwa Yang. “Investigating Higher-Order Interactions in Single-Cell Data with scHOT.” Nature Methods 17, no. 8 (August 2020): 799–806.

Continue reading



Constrained Smoothing and Out-of-range Prediction using P-splines

September 22, 2023

This note is for Navarro-García, M., Guerrero, V., & Durban, M. (2023). On constrained smoothing and out-of-range prediction using P-splines: A conic optimization approach. Applied Mathematics and Computation, 441, 127679.

Continue reading



An Iterative Procedure for Shape-constrained Smoothing using Smoothing Splines

September 21, 2023

This note is for Turlach, B. A. (2005). Shape constrained smoothing using smoothing splines. Computational Statistics, 20(1), 81–104.

Continue reading



Shape-Constrained Estimation Using Nonnegative Splines

September 21, 2023

This note is for Papp, D., & Alizadeh, F. (2014). Shape-Constrained Estimation Using Nonnegative Splines. Journal of Computational and Graphical Statistics, 23(1), 211–231.

Continue reading



Fast and Flexible methods for monotone polynomial fitting

September 21, 2023

This note is for Murray, K., Müller, S., & Turlach, B. (2016). Fast and flexible methods for monotone polynomial fitting. Journal of Statistical Computation and Simulation, 86, 1–21.

Continue reading



Confidence Intervals of Smoothed Isotonic Regression

September 21, 2023

This note is for Groeneboom, P., & Jongbloed, G. (2023). Confidence intervals in monotone regression (arXiv:2303.17988). arXiv.

Continue reading



Lamian: Differential Pseudotime Analysis

September 14, 2023

This note is for Hou, W., Ji, Z., Chen, Z., Wherry, E. J., Hicks, S. C., & Ji, H. (2021). A statistical framework for differential pseudotime analysis with multiple single-cell RNA-seq samples (p. 2021.07.10.451910). bioRxiv.

Continue reading



condiments: Trajectory Inference across Multiple Conditions

September 14, 2023

The note is for Van den Berge, K., Roux de Bézieux, H., Street, K., Saelens, W., Cannoodt, R., Saeys, Y., Dudoit, S., & Clement, L. (2020). Trajectory-based differential expression analysis for single-cell sequencing data. Nature Communications, 11(1), Article 1.

Continue reading



In-Context Learning via Transformers

September 14, 2023

This note is for Garg, S., Tsipras, D., Liang, P., & Valiant, G. (2023). What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (arXiv:2208.01066). arXiv.

Continue reading



Six Statistical Senses

August 28, 2023

This note is for Craiu, R. V., Gong, R., & Meng, X.-L. (2023). Six Statistical Senses. Annual Review of Statistics and Its Application, 10(1), 699–725.

Continue reading



tradeSeq: Trajectory-based differential expression analysis for single-cell sequencing data

July 31, 2023

This post is for Van den Berge, K., Roux de Bézieux, H., Street, K., Saelens, W., Cannoodt, R., Saeys, Y., Dudoit, S., & Clement, L. (2020). Trajectory-based differential expression analysis for single-cell sequencing data. Nature Communications, 11(1), Article 1.

Continue reading



PseudotimeDE: Differential Gene Expression along Cell Pseudotime

July 27, 2023

The note is for Song, D., & Li, J. J. (2021). PseudotimeDE: Inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data. Genome Biology, 22(1), 124.

Continue reading



scMDC: Single-Cell Multi-omics Data Clustering Analysis

July 27, 2023

This post is for Lin, X., Tian, T., Wei, Z., & Hakonarson, H. (2022). Clustering of single-cell multi-omics data with a multimodal deep learning method. Nature Communications, 13(1), Article 1.

Continue reading



Benchmarking Algorithms for Gene Regulatory Network Inference

July 14, 2023

This note is for Pratapa, A., Jalihal, A. P., Law, J. N., Bharadwaj, A., & Murali, T. M. (2020). Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nature Methods, 17(2), Article 2.

Continue reading



Cell type-specific and disease-associated eQTL in the human lung

July 13, 2023

This post is for Natri, H. M., Azodi, C. B. D., Peter, L., Taylor, C. J., Chugh, S., Kendle, R., Chung, M., Flaherty, D. K., Matlock, B. K., Calvi, C. L., Blackwell, T. S., Ware, L. B., Bacchetta, M., Walia, R., Shaver, C. M., Kropski, J. A., McCarthy, D. J., & Banovich, N. E. (2023). Cell type-specific and disease-associated eQTL in the human lung (p. 2023.03.17.533161). bioRxiv.

Continue reading



Cluster Analysis of Transcriptomic Datasets of IPF

July 10, 2023

Kraven, L. M., Taylor, A. R., Molyneaux, P. L., Maher, T. M., McDonough, J. E., Mura, M., Yang, I. V., Schwartz, D. A., Huang, Y., Noth, I., Ma, S. F., Yeo, A. J., Fahy, W. A., Jenkins, R. G., & Wain, L. V. (2023). Cluster analysis of transcriptomic datasets to identify endotypes of idiopathic pulmonary fibrosis. Thorax, 78(6), 551–558.

Continue reading



XGBoost for IPF Biomarker

July 10, 2023

This post is for Fanidis, D., Pezoulas, V. C., Fotiadis, D. I., & Aidinis, V. (2023). An explainable machine learning-driven proposal of pulmonary fibrosis biomarkers. Computational and Structural Biotechnology Journal, 21, 2305–2315.

Continue reading



Single Cell Generative Pre-trained Transformer

June 30, 2023

This post is for Cui, H., Wang, C., Maan, H., & Wang, B. (2023). scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI (p. 2023.04.30.538439). bioRxiv.

Continue reading



Deep Generative Modeling for Single-cell Transcriptomics

June 29, 2023

The post is for Lopez, R., Regier, J., Cole, M. B., Jordan, M. I., & Yosef, N. (2018). Deep generative modeling for single-cell transcriptomics. Nature Methods, 15(12), Article 12.

Continue reading



Time-varying Group Sparse Additive Model for GWAS

June 11, 2023

This post is for Marchetti-Bowick, M., Yin, J., Howrylak, J. A., & Xing, E. P. (2016). A time-varying group sparse additive model for genome-wide association studies of dynamic complex traits. Bioinformatics, 32(19), 2903–2910.

Continue reading



fGWAS: Dynamic Model for GWAS

June 11, 2023

The note is for Das, K., Li, J., Wang, Z., Tong, C., Fu, G., Li, Y., Xu, M., Ahn, K., Mauger, D., Li, R., & Wu, R. (2011). A dynamic model for genome-wide association studies. Human Genetics, 129(6), 629–639.

Continue reading



GWAS of Longitudinal Trajectories at Biobank Scale

June 11, 2023

This post is for Ko, S., German, C. A., Jensen, A., Shen, J., Wang, A., Mehrotra, D. V., Sun, Y. V., Sinsheimer, J. S., Zhou, H., & Zhou, J. J. (2022). GWAS of longitudinal trajectories at biobank scale. The American Journal of Human Genetics, 109(3), 433–445.

Continue reading



C-index for Time-varying Risk

May 05, 2023

This post is for Gandy, A., & Matcham, T. J. (2022). On concordance indices for models with time-varying risk (arXiv:2208.03213). arXiv.

Continue reading



Age-dependency of PRS for Prostate Cancer

April 21, 2023

This note is for Schaid, D. J., Sinnwell, J. P., Batzler, A., & McDonnell, S. K. (2022). Polygenic risk for prostate cancer: Decreasing relative risk with age but little impact on absolute risk. American Journal of Human Genetics, 109(5), 900–908.

Continue reading



Cox Models with Time-Varying Covariates vs Time-Varying Coefficients

March 28, 2023

This note is for Zhang, Z., Reinikainen, J., Adeleke, K. A., Pieterse, M. E., & Groothuis-Oudshoorn, C. G. M. (2018). Time-varying covariates and coefficients in Cox regression models. Annals of Translational Medicine, 6(7), 121.

Continue reading



Model Selection for Cox Models with Time-Varying Coefficients

March 28, 2023

This note is for Yan, J., & Huang, J. (2012). Model Selection for Cox Models with Time-Varying Coefficients. Biometrics, 68(2), 419–428.

Continue reading



Predictive Degrees of Freedom

February 10, 2023

This note is for Luan, B., Lee, Y., & Zhu, Y. (2021). Predictive Model Degrees of Freedom in Linear Regression. ArXiv:2106.15682 [Math].

Continue reading



Tutorial on Polygenic Risk Score

January 24, 2023

This note is based on Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9.

Continue reading



Similarity Network Fusion

December 28, 2022

This post is for Wang, B., Mezlini, A. M., Demir, F., Fiume, M., Tu, Z., Brudno, M., Haibe-Kains, B., & Goldenberg, A. (2014). Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3), Article 3. and a related paper Ruan, P., Wang, Y., Shen, R., & Wang, S. (2019). Using association signal annotations to boost similarity network fusion. Bioinformatics, 35(19), 3718–3726.

Continue reading



LD Score Regression

December 15, 2022

This note is for Bulik-Sullivan, B. K., Loh, P.-R., Finucane, H. K., Ripke, S., Yang, J., Patterson, N., Daly, M. J., Price, A. L., & Neale, B. M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47(3), 291–295.

Continue reading



First Glance at KEGGgraph

November 21, 2022

This post is based on

Continue reading



Joint Local False Discovery Rate in GWAS

November 12, 2022

This note is for Jiang, W., & Yu, W. (2017). Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies. Bioinformatics, 33(4), 500–507.

Continue reading



Differentiable Sorting and Ranking

November 04, 2022

This note is for Blondel, M., Teboul, O., Berthet, Q., & Djolonga, J. (2020). Fast Differentiable Sorting and Ranking (arXiv:2002.08871). arXiv.

Continue reading



Joint Bayesian Variable and DAG Selection

October 31, 2022

This note is for Cao, X., & Lee, K. (2021). Joint Bayesian Variable and DAG Selection Consistency for High-dimensional Regression Models with Network-structured Covariates. Statistica Sinica.

Continue reading



Integrative Bayesian Analysis of High-dimensional Multiplatform Genomics Data

October 30, 2022

This note is for Wang, W., Baladandayuthapani, V., Morris, J. S., Broom, B. M., Manyam, G., & Do, K.-A. (2013). iBAG: Integrative Bayesian analysis of high-dimensional multiplatform genomics data. Bioinformatics, 29(2), 149–159.

Continue reading



Bayesian Hierarchical Varying-Sparsity Regression Models with Application to Cancer Proteogenomics

October 29, 2022

This note is for Ni, Y., Stingo, F. C., Ha, M. J., Akbani, R., & Baladandayuthapani, V. (2019). Bayesian Hierarchical Varying-Sparsity Regression Models with Application to Cancer Proteogenomics. Journal of the American Statistical Association, 114(525), 48–60.

Continue reading



Simultaneous Estimation of Cell Type Proportions and Cell Type-specific Gene Expressions

October 12, 2022

This note is for Tang, D., Park, S., & Zhao, H. (2022). SCADIE: Simultaneous estimation of cell type proportions and cell type-specific gene expressions using SCAD-based iterative estimating procedure. Genome Biology, 23(1), 129.

Continue reading



scDesign3: A Single-cell Simulator

October 10, 2022

This note is based on Jingyi Jessica Li’s talk on Song, D., Wang, Q., Yan, G., Liu, T., & Li, J. J. (2022). A unified framework of realistic in silico data generation and statistical model inference for single-cell and spatial omics (p. 2022.09.20.508796). bioRxiv.

Continue reading



ADAM and AMSGrad for Stochastic Optimization

October 09, 2022

This post is based on

Continue reading



Single-cell Graph Neural Network

October 08, 2022

This note is for Prof. Dong Xu’s talk on Wang, J., Ma, A., Chang, Y., Gong, J., Jiang, Y., Qi, R., Wang, C., Fu, H., Ma, Q., & Xu, D. (2021). ScGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nature Communications, 12(1), Article 1.

Continue reading



Contrastive Learning: A Simple Framework and A Theoretical Analysis

October 06, 2022

This note is based on

Continue reading



Joint Model of Longitudinal and Survival Data

October 02, 2022

This post is based on Rizopoulos, D. (2017). An Introduction to the Joint Modeling of Longitudinal and Survival Data, with Applications in R. 235.

Continue reading



Debiased Inverse-Variance Weighted Estimator in Mendelian Randomization

September 20, 2022

This post is for the talk at Yale given by Prof. Ting Ye based on the paper Ye, T., Shao, J., & Kang, H. (2020). Debiased Inverse-Variance Weighted Estimator in Two-Sample Summary-Data Mendelian Randomization (arXiv:1911.09802). arXiv.

Continue reading



Multicenter IPF-PRO Registry Cohort

August 25, 2022

This note is for Todd, J. L., Vinisko, R., Liu, Y., Neely, M. L., Overton, R., Flaherty, K. R., Noth, I., Newby, L. K., Lasky, J. A., Olman, M. A., Hesslinger, C., Leonard, T. B., Palmer, S. M., & Belperio, J. A. (2020). Circulating matrix metalloproteinases and tissue metalloproteinase inhibitors in patients with idiopathic pulmonary fibrosis in the multicenter IPF-PRO Registry cohort. BMC Pulmonary Medicine, 20(1), 64.

Continue reading



Fitting to Future Observations

July 21, 2022

This note is for Jiang, Y., & Liu, C. (2022). Estimation of Over-parameterized Models via Fitting to Future Observations (arXiv:2206.01824). arXiv.

Continue reading



See all posts →