WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

spaCRT: saddlepoint approximation-based conditional randomization test

November 04, 2024

This note is for Niu, Z., Choudhury, J. R., & Katsevich, E. (2024). Computationally efficient and statistically accurate conditional independence testing with spaCRT (No. arXiv:2407.08911; Version 1). arXiv. https://doi.org/10.48550/arXiv.2407.08911

Continue reading



Benchopt: Benchmarks for ML Optimizations

November 01, 2024

This is the note for Moreau, T., Massias, M., Gramfort, A., Ablin, P., Bannier, P.-A., Charlier, B., Dagréou, M., Tour, T. D. la, Durif, G., Dantas, C. F., Klopfenstein, Q., Larsson, J., Lai, E., Lefort, T., Malézieux, B., Moufad, B., Nguyen, B. T., Rakotomamonjy, A., Ramzi, Z., … Vaiter, S. (2022). Benchopt: Reproducible, efficient and collaborative optimization benchmarks (No. arXiv:2206.13424). arXiv. https://doi.org/10.48550/arXiv.2206.13424

Continue reading



XBART: Accelerated Bayesian Additive Regression Trees

October 04, 2024

This post is based on He, J., Yalov, S., & Hahn, P. R. (2019). XBART: Accelerated Bayesian Additive Regression Trees. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, 1130–1138. https://proceedings.mlr.press/v89/he19a.html and He, J., & Hahn, P. R. (2023). Stochastic Tree Ensembles for Regularized Nonlinear Regression. Journal of the American Statistical Association, 118(541), 551–570. https://doi.org/10.1080/01621459.2021.1942012

Continue reading



scDRS: single-cell disease relevance score

September 10, 2024

This note is for Zhang, M. J., Hou, K., Dey, K. K., Sakaue, S., Jagadeesh, K. A., Weinand, K., Taychameekiatchai, A., Rao, P., Pisco, A. O., Zou, J., Wang, B., Gandal, M., Raychaudhuri, S., Pasaniuc, B., & Price, A. L. (2022). Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nature Genetics, 54(10), 1572–1580. https://doi.org/10.1038/s41588-022-01167-z

Continue reading



Guarantees of Lloyd’s Algorithm

September 10, 2024

This note is for Lu, Y., & Zhou, H. H. (2016). Statistical and Computational Guarantees of Lloyd’s Algorithm and its Variants (No. arXiv:1612.02099). arXiv. http://arxiv.org/abs/1612.02099

Continue reading



Data Thinning for Convolution-Closed Distributions

August 29, 2024

This note is for Neufeld, A., Dharamshi, A., Gao, L. L., & Witten, D. (2024). Data Thinning for Convolution-Closed Distributions. Journal of Machine Learning Research, 25(57), 1–35.

Continue reading



Data Fission

August 05, 2024

This note is for the discussion paper Leiner, J., Duan, B., Wasserman, L., & Ramdas, A. (2023). Data fission: Splitting a single data point (arXiv:2112.11079). arXiv. http://arxiv.org/abs/2112.11079 in the JASA invited session at JSM 2024

Continue reading



Watermarks in Large Language Models

August 05, 2024

This is the note for the talk "Statistical Inference in Large Language Models: A Statistical Framework of Watermarks" given by Weijie Su at JSM 2024.

Continue reading



Training in Large Language Models

August 05, 2024

This is the note for the talk "LLMs training" given by Linjun Zhang at JSM 2024.

Continue reading



Preference Matching in RLHF

August 05, 2024

This is the note for the talk "Statistical Inference in Large Language Models: Alignment and Copyright" given by Weijie Su at JSM 2024.

Continue reading



Talagrand Concentration

July 30, 2024

This note is for Wainwright, M. J. (n.d.). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. 604.

Continue reading



Approximating Bayes

July 04, 2024

This is the note for Martin, G. M., Frazier, D. T., & Robert, C. P. (2024). Approximating Bayes in the 21st Century. Statistical Science, 39(1), 20–45. https://doi.org/10.1214/22-STS875

Continue reading



Conformal Prediction for Single-cell Spatial Transcriptomics

June 07, 2024

This note is for Sun, E. D., Ma, R., Navarro Negredo, P., Brunet, A., & Zou, J. (2024). TISSUE: Uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nature Methods, 21(3), 444–454.

Continue reading



GhostKnockoffs: Only Summary Statistics

May 23, 2024

This note is for Chen, Z., He, Z., Chu, B. B., Gu, J., Morrison, T., Sabatti, C., & Candès, E. (2024). Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression (arXiv:2402.12724). arXiv.

Continue reading



Niche DE

April 30, 2024

This note is for Mason, K., Sathe, A., Hess, P. R., Rong, J., Wu, C.-Y., Furth, E., Susztak, K., Levinsohn, J., Ji, H. P., & Zhang, N. (2024). Niche-DE: Niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions. Genome Biology, 25(1), 14.

Continue reading



Model-X Knockoffs

April 20, 2024

This note is for Candes, E., Fan, Y., Janson, L., & Lv, J. (2017). Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection. arXiv:1610.02351 [Math, Stat].

Continue reading



Conditional Independence Test in Single-cell Multiomics

April 17, 2024

This note is for Boyeau, P., Bates, S., Ergen, C., Jordan, M. I., & Yosef, N. (2023). Calibrated Identification of Feature Dependencies in Single-cell Multiomics.

Continue reading



Test Difference for A Single Feature

April 12, 2024

This note is for Chen, Y. T., & Gao, L. L. (2023). Testing for a difference in means of a single feature after clustering (arXiv:2311.16375). arXiv.

Continue reading



Selective Inference for K-means

April 12, 2024

This note is for Chen, Y. T., & Witten, D. M. (2022). Selective inference for k-means clustering (arXiv:2203.15267). arXiv.

Continue reading



Comparisons of transformations for single-cell RNA-seq data

March 26, 2024

This post is for Ahlmann-Eltze, C., & Huber, W. (2023). Comparison of transformations for single-cell RNA-seq data. Nature Methods, 20(5), 665–672.

Continue reading



sctransform: Normalization using Regularized Negative Binomial Regression

February 24, 2024

The note is for Hafemeister, C., & Satija, R. (2019). Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biology, 20(1), 296.

Continue reading



Causal Inference on Distribution Functions

February 20, 2024

This post is for Lin, Z., Kong, D., & Wang, L. (2023). Causal inference on distribution functions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2), 378–398.

Continue reading



BLiP: Bayesian Linear Programming

February 09, 2024

The note is for Spector, A., & Janson, L. (2023). Controlled Discovery and Localization of Signals via Bayesian Linear Programming (arXiv:2203.17208). arXiv.

Continue reading



Post-clustering Inference under Dependency

February 08, 2024

This post is for González-Delgado, J., Cortés, J., & Neuvial, P. (2023). Post-clustering Inference under Dependency (arXiv:2310.11822). arXiv.

Continue reading



Bipartite eQTL Network Construction

February 08, 2024

This post is for Gaynor, S. M., Fagny, M., Lin, X., Platig, J., & Quackenbush, J. (2022). Connectivity in eQTL networks dictates reproducibility and genomic properties. Cell Reports Methods, 2(5), 100218.

Continue reading



Selective Inference for Hierarchical Clustering

February 08, 2024

This note is for Gao, L. L., Bien, J., & Witten, D. (2022). Selective Inference for Hierarchical Clustering (arXiv:2012.02936). arXiv.

Continue reading



Contrasting Genetic Architectures using Fast Variance Components Analysis

February 07, 2024

This note is for Loh, P.-R., Bhatia, G., Gusev, A., Finucane, H. K., Bulik-Sullivan, B. K., Pollack, S. J., de Candia, T. R., Lee, S. H., Wray, N. R., Kendler, K. S., O’Donovan, M. C., Neale, B. M., Patterson, N., & Price, A. L. (2015). Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance components analysis. Nature Genetics, 47(12), 1385–1392.

Continue reading



Joint Model in High Dimension

January 31, 2024

This post is for Liu, M., Sun, J., Herazo-Maya, J. D., Kaminski, N., & Zhao, H. (2019). Joint Models for Time-to-Event Data and Longitudinal Biomarkers of High Dimension. Statistics in Biosciences, 11(3), 614–629.

Continue reading



BAMLSS: Flexible Bayesian Additive Joint Model

January 31, 2024

This post is for Köhler, M., Umlauf, N., Beyerlein, A., Winkler, C., Ziegler, A.-G., & Greven, S. (2017). Flexible Bayesian additive joint models with an application to type 1 diabetes research. Biometrical Journal, 59(6), 1144–1165.

Continue reading



Effective Gene Expression Prediction

January 26, 2024

This note is for Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., Assael, Y., Jumper, J., Kohli, P., & Kelley, D. R. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10), 1196–1203.

Continue reading



Edgeworth Expansion

January 24, 2024

This note is based on Shao, J. (2003). Mathematical statistics (2nd ed). Springer, and on Hwang, J. (2019). Note on Edgeworth Expansions and Asymptotic Refinements of Percentile t-Bootstrap Methods. Bootstrap Methods.

Continue reading



t-Test for Mixture Normal Data

January 23, 2024

The post is for Lee, A. F. S., & Gurland, J. (1977). One-Sample t-Test When Sampling from a Mixture of Normal Distributions. The Annals of Statistics, 5(4), 803–807.

Continue reading



Fine-mapping from Summary Data with SuSiE

January 22, 2024

This post is for Zou, Y., Carbonetto, P., Wang, G., & Stephens, M. (2022). Fine-mapping from summary data with the “Sum of Single Effects” model. PLOS Genetics, 18(7), e1010299.

Continue reading



SuSiE: Sum of Single Effects Model

January 22, 2024

This note is for Wang, G., Sarkar, A., Carbonetto, P., & Stephens, M. (2020). A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(5), 1273–1300.

Continue reading



Statistical Learning and Selective Inference

January 19, 2024

This post is for Taylor, J., & Tibshirani, R. J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences of the United States of America, 112(25), 7629–7634.

Continue reading



Exact Post-Selection Inference for Sequential Regression Procedures

January 19, 2024

This post is for Tibshirani, R. J., Taylor, J., Lockhart, R., & Tibshirani, R. (2016). Exact Post-Selection Inference for Sequential Regression Procedures. Journal of the American Statistical Association, 111(514), 600–620.

Continue reading



FDR Control in GLM

January 15, 2024

This post is for Dai, C., Lin, B., Xing, X., & Liu, J. S. (2023). A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models. Journal of the American Statistical Association, 118(543), 1551–1565.

Continue reading



MMRM: Mixed-Models for Repeated Measures

January 10, 2024

This post is based on the vignettes of the MMRM R package: https://openpharma.github.io/mmrm/main/index.html

Continue reading



One-way Matching with Low Rank

January 06, 2024

This post is for Chen, Shuxiao, Sizun Jiang, Zongming Ma, Garry P. Nolan, and Bokai Zhu. “One-Way Matching of Datasets with Low Rank Signals.” arXiv, October 3, 2022.

Continue reading



CountSplit for scRNA Data

December 08, 2023

The post is for Neufeld, Anna, Lucy L Gao, Joshua Popp, Alexis Battle, and Daniela Witten. “Inference after Latent Variable Estimation for Single-cell RNA Sequencing Data.” Biostatistics, December 13, 2022, kxac047.

Continue reading



Uncertainty of Pseudotime Trajectory

December 04, 2023

This post is for Tenha, Lovemore, and Mingzhou Song. “Statistical Evidence for the Presence of Trajectory in Single-cell Data.” BMC Bioinformatics 23, no. Suppl 8 (August 16, 2022): 340.

Continue reading



ClusterDE: a post-clustering DE method

December 04, 2023

This post is for Song, Dongyuan, Kexin Li, Xinzhou Ge, and Jingyi Jessica Li. “ClusterDE: A Post-Clustering Differential Expression (DE) Method Robust to False-Positive Inflation Caused by Double Dipping,” 2023

Continue reading



Approximation to Log-likelihood of Nonlinear Mixed-effects Model

November 26, 2023

This post is for Pinheiro, José C., and Douglas M. Bates. “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model.” Journal of Computational and Graphical Statistics 4, no. 1 (1995): 12–35.

Continue reading



Hierarchical Multi-label Contrastive Learning

November 25, 2023

This post is for Zhang, Shu, Ran Xu, Caiming Xiong, and Chetan Ramaiah. “Use All the Labels: A Hierarchical Multi-Label Contrastive Learning Framework,” 16660–69, 2022.

Continue reading



Hierarchical Multi-Label Classification

November 20, 2023

This post is for two papers on Hierarchical multi-label classification (HMC), which imposes a hierarchy constraint on the classes.

Continue reading



Consistent Probabilities along GO Structure

November 16, 2023

This note is for Obozinski, Guillaume, Gert Lanckriet, Charles Grant, Michael I. Jordan, and William Stafford Noble. “Consistent Probabilistic Outputs for Protein Function Prediction.” Genome Biology 9 Suppl 1, no. Suppl 1 (2008): S6.

Continue reading



scHOT: Investigate higher-order interactions in single-cell data

October 13, 2023

This note is for Ghazanfar, Shila, Yingxin Lin, Xianbin Su, David Ming Lin, Ellis Patrick, Ze-Guang Han, John C. Marioni, and Jean Yee Hwa Yang. “Investigating Higher-Order Interactions in Single-cell Data with scHOT.” Nature Methods 17, no. 8 (August 2020): 799–806.

Continue reading



Constrained Smoothing and Out-of-range Prediction using P-splines

September 22, 2023

This note is for Navarro-García, M., Guerrero, V., & Durban, M. (2023). On constrained smoothing and out-of-range prediction using P-splines: A conic optimization approach. Applied Mathematics and Computation, 441, 127679.

Continue reading



An Iterative Procedure for Shape-constrained Smoothing using Smoothing Splines

September 21, 2023

This note is for Turlach, B. A. (2005). Shape constrained smoothing using smoothing splines. Computational Statistics, 20(1), 81–104.

Continue reading



Shape-Constrained Estimation Using Nonnegative Splines

September 21, 2023

This note is for Papp, D., & Alizadeh, F. (2014). Shape-Constrained Estimation Using Nonnegative Splines. Journal of Computational and Graphical Statistics, 23(1), 211–231.

Continue reading



See all posts →