WeiYa's Work Yard

A traveler with endless curiosity, who fell into the ocean of statistics, tries to write down his ideas and notes to save himself.

FDR Control for Byzantine Machines

December 13, 2024

This is the note for Qian, C., Wang, M., Ren, H., & Zou, C. (2024). ByMI: Byzantine Machine Identification with False Discovery Rate Control. Proceedings of the 41st International Conference on Machine Learning, 41357–41382. https://proceedings.mlr.press/v235/qian24b.html


FDR Control under General Dependence via Symmetrization

December 13, 2024

This note is for Du, L., Guo, X., Sun, W., & Zou, C. (2023). False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation. Journal of the American Statistical Association, 118(541), 607–621. https://doi.org/10.1080/01621459.2021.1945459


Personalized Federated Learning with Robust and Sparse Regressions

December 10, 2024

This note is for Liu, W., Mao, X., Zhang, X., & Zhang, X. (2024). Robust Personalized Federated Learning with Sparse Penalization. Journal of the American Statistical Association, 0(0), 1–12. https://doi.org/10.1080/01621459.2024.2321652


Biomarker Variability in Joint Model

December 10, 2024

This note is for Wang, C., Shen, J., Charalambous, C., & Pan, J. (2024). Modeling biomarker variability in joint analysis of longitudinal and time-to-event data. Biostatistics, 25(2), 577–596. https://doi.org/10.1093/biostatistics/kxad009 and Wang, C., Shen, J., Charalambous, C., & Pan, J. (2024). Weighted biomarker variability in joint analysis of longitudinal and time-to-event data. The Annals of Applied Statistics, 18(3), 2576–2595. https://doi.org/10.1214/24-AOAS1896


Derandomised Knockoffs from E-values

December 09, 2024

This note is for Ren, Z., & Barber, R. F. (2024). Derandomised knockoffs: Leveraging e-values for false discovery rate control. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(1), 122–154. https://doi.org/10.1093/jrsssb/qkad085


FDR Control with E-values

December 05, 2024

This note is for Wang, R., & Ramdas, A. (2022). False Discovery Rate Control with E-values. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3), 822–852. https://doi.org/10.1111/rssb.12489 and Aaditya’s talk at ISSI on October 25, 2023


Task-Agnostic Machine-Learning-Assisted Inference

November 22, 2024

This note is for Miao, J., & Lu, Q. (2024). Task-Agnostic Machine-Learning-Assisted Inference (No. arXiv:2405.20039). arXiv. https://doi.org/10.48550/arXiv.2405.20039


Review on Normalizing Flows

November 22, 2024

This note is for Kobyzev, I., Prince, S. J. D., & Brubaker, M. A. (2020). Normalizing Flows: An Introduction and Review of Current Methods (No. arXiv:1908.09257). arXiv. http://arxiv.org/abs/1908.09257


C-SIDE for Cell-type-specific Spatial DE

November 12, 2024

This note is for Cable, D. M., Murray, E., Shanmugam, V., Zhang, S., Zou, L. S., Diao, M., Chen, H., Macosko, E. Z., Irizarry, R. A., & Chen, F. (2022). Cell type-specific inference of differential expression in spatial transcriptomics. Nature Methods, 19(9), 1076–1087. https://doi.org/10.1038/s41592-022-01575-3


spaCRT: saddlepoint approximation-based conditional randomization test

November 04, 2024

This note is for Niu, Z., Choudhury, J. R., & Katsevich, E. (2024). Computationally efficient and statistically accurate conditional independence testing with spaCRT (No. arXiv:2407.08911; Version 1). arXiv. https://doi.org/10.48550/arXiv.2407.08911


Benchopt: Benchmarks for ML Optimizations

November 01, 2024

This is the note for Moreau, T., Massias, M., Gramfort, A., Ablin, P., Bannier, P.-A., Charlier, B., Dagréou, M., Tour, T. D. la, Durif, G., Dantas, C. F., Klopfenstein, Q., Larsson, J., Lai, E., Lefort, T., Malézieux, B., Moufad, B., Nguyen, B. T., Rakotomamonjy, A., Ramzi, Z., … Vaiter, S. (2022). Benchopt: Reproducible, efficient and collaborative optimization benchmarks (No. arXiv:2206.13424). arXiv. https://doi.org/10.48550/arXiv.2206.13424


XBART: Accelerated Bayesian Additive Regression Trees

October 04, 2024

This post is based on He, J., Yalov, S., & Hahn, P. R. (2019). XBART: Accelerated Bayesian Additive Regression Trees. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, 1130–1138. https://proceedings.mlr.press/v89/he19a.html and He, J., & Hahn, P. R. (2023). Stochastic Tree Ensembles for Regularized Nonlinear Regression. Journal of the American Statistical Association, 118(541), 551–570. https://doi.org/10.1080/01621459.2021.1942012


scDRS: single-cell disease relevance score

September 10, 2024

This note is for Zhang, M. J., Hou, K., Dey, K. K., Sakaue, S., Jagadeesh, K. A., Weinand, K., Taychameekiatchai, A., Rao, P., Pisco, A. O., Zou, J., Wang, B., Gandal, M., Raychaudhuri, S., Pasaniuc, B., & Price, A. L. (2022). Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nature Genetics, 54(10), 1572–1580. https://doi.org/10.1038/s41588-022-01167-z


Guarantees of Lloyd’s Algorithm

September 10, 2024

This note is for Lu, Y., & Zhou, H. H. (2016). Statistical and Computational Guarantees of Lloyd’s Algorithm and its Variants (No. arXiv:1612.02099). arXiv. http://arxiv.org/abs/1612.02099


Data Thinning for Convolution-Closed Distributions

August 29, 2024

This note is for Neufeld, A., Dharamshi, A., Gao, L. L., & Witten, D. (2024). Data Thinning for Convolution-Closed Distributions. Journal of Machine Learning Research, 25(57), 1–35.


Data Fission

August 05, 2024

This note is for the discussion paper Leiner, J., Duan, B., Wasserman, L., & Ramdas, A. (2023). Data fission: Splitting a single data point (arXiv:2112.11079). arXiv. http://arxiv.org/abs/2112.11079 in the JASA invited session at JSM 2024


Watermarks in Large Language Models

August 05, 2024

This is the note for the talk Statistical Inference in Large Language Models: A Statistical Framework of Watermarks given by Weijie Su at JSM 2024


Training in Large Language Models

August 05, 2024

This is the note for the talk LLMs training given by Linjun Zhang at JSM 2024


Preference Matching in RLHF

August 05, 2024

This is the note for the talk Statistical Inference in Large Language Models: Alignment and Copyright given by Weijie Su at JSM 2024


Talagrand Concentration

July 30, 2024

This note is for Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.


Approximating Bayes

July 04, 2024

This is the note for Martin, G. M., Frazier, D. T., & Robert, C. P. (2024). Approximating Bayes in the 21st Century. Statistical Science, 39(1), 20–45. https://doi.org/10.1214/22-STS875


Conformal Prediction for Single-cell Spatial Transcriptomics

June 07, 2024

This note is for Sun, E. D., Ma, R., Navarro Negredo, P., Brunet, A., & Zou, J. (2024). TISSUE: Uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nature Methods, 21(3), 444–454.


GhostKnockoffs: Only Summary Statistics

May 23, 2024

This note is for Chen, Z., He, Z., Chu, B. B., Gu, J., Morrison, T., Sabatti, C., & Candès, E. (2024). Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression (arXiv:2402.12724). arXiv.


Niche DE

April 30, 2024

This note is for Mason, K., Sathe, A., Hess, P. R., Rong, J., Wu, C.-Y., Furth, E., Susztak, K., Levinsohn, J., Ji, H. P., & Zhang, N. (2024). Niche-DE: Niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions. Genome Biology, 25(1), 14.


Model-X Knockoffs

April 20, 2024

This note is for Candès, E., Fan, Y., Janson, L., & Lv, J. (2017). Panning for Gold: Model-X Knockoffs for High-dimensional Controlled Variable Selection. arXiv:1610.02351 [Math, Stat].


Conditional Independence Test in Single-cell Multiomics

April 17, 2024

This note is for Boyeau, P., Bates, S., Ergen, C., Jordan, M. I., & Yosef, N. (2023). Calibrated Identification of Feature Dependencies in Single-cell Multiomics.


Test Difference for A Single Feature

April 12, 2024

This note is for Chen, Y. T., & Gao, L. L. (2023). Testing for a difference in means of a single feature after clustering (arXiv:2311.16375). arXiv.


Selective Inference for K-means

April 12, 2024

This note is for Chen, Y. T., & Witten, D. M. (2022). Selective inference for k-means clustering (arXiv:2203.15267). arXiv.


Comparisons of transformations for single-cell RNA-seq data

March 26, 2024

This post is for Ahlmann-Eltze, C., & Huber, W. (2023). Comparison of transformations for single-cell RNA-seq data. Nature Methods, 20(5), 665–672.


sctransform: Normalization using Regularized Negative Binomial Regression

February 24, 2024

The note is for Hafemeister, C., & Satija, R. (2019). Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biology, 20(1), 296.


Causal Inference on Distribution Functions

February 20, 2024

This post is for Lin, Z., Kong, D., & Wang, L. (2023). Causal inference on distribution functions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2), 378–398.


BLiP: Bayesian Linear Programming

February 09, 2024

The note is for Spector, A., & Janson, L. (2023). Controlled Discovery and Localization of Signals via Bayesian Linear Programming (arXiv:2203.17208). arXiv.


Post-clustering Inference under Dependency

February 08, 2024

This post is for González-Delgado, J., Cortés, J., & Neuvial, P. (2023). Post-clustering Inference under Dependency (arXiv:2310.11822). arXiv.


Bipartite eQTL Network Construction

February 08, 2024

This post is for Gaynor, S. M., Fagny, M., Lin, X., Platig, J., & Quackenbush, J. (2022). Connectivity in eQTL networks dictates reproducibility and genomic properties. Cell Reports Methods, 2(5), 100218.


Selective Inference for Hierarchical Clustering

February 08, 2024

This note is for Gao, L. L., Bien, J., & Witten, D. (2022). Selective Inference for Hierarchical Clustering (arXiv:2012.02936). arXiv.


Contrasting Genetic Architectures using Fast Variance Components Analysis

February 07, 2024

This note is for Loh, P.-R., Bhatia, G., Gusev, A., Finucane, H. K., Bulik-Sullivan, B. K., Pollack, S. J., de Candia, T. R., Lee, S. H., Wray, N. R., Kendler, K. S., O’Donovan, M. C., Neale, B. M., Patterson, N., & Price, A. L. (2015). Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance components analysis. Nature Genetics, 47(12), 1385–1392.


Joint Model in High Dimension

January 31, 2024

This post is for Liu, M., Sun, J., Herazo-Maya, J. D., Kaminski, N., & Zhao, H. (2019). Joint Models for Time-to-Event Data and Longitudinal Biomarkers of High Dimension. Statistics in Biosciences, 11(3), 614–629.


BAMLSS: Flexible Bayesian Additive Joint Model

January 31, 2024

This post is for Köhler, M., Umlauf, N., Beyerlein, A., Winkler, C., Ziegler, A.-G., & Greven, S. (2017). Flexible Bayesian additive joint models with an application to type 1 diabetes research. Biometrical Journal, 59(6), 1144–1165.


Effective Gene Expression Prediction

January 26, 2024

This note is for Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., Assael, Y., Jumper, J., Kohli, P., & Kelley, D. R. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10), 1196–1203.


Edgeworth Expansion

January 24, 2024

This note is based on Shao, J. (2003). Mathematical statistics (2nd ed). Springer. and Hwang, J. (2019). Note on Edgeworth Expansions and Asymptotic Refinements of Percentile t-Bootstrap Methods. Bootstrap Methods.


t-Test for Mixture Normal Data

January 23, 2024

The post is for Lee, A. F. S., & Gurland, J. (1977). One-Sample t-Test When Sampling from a Mixture of Normal Distributions. The Annals of Statistics, 5(4), 803–807.


Fine-mapping from Summary Data with SuSiE

January 22, 2024

This post is for Zou, Y., Carbonetto, P., Wang, G., & Stephens, M. (2022). Fine-mapping from summary data with the “Sum of Single Effects” model. PLOS Genetics, 18(7), e1010299.


SuSiE: Sum of Single Effects Model

January 22, 2024

This note is for Wang, G., Sarkar, A., Carbonetto, P., & Stephens, M. (2020). A Simple New Approach to Variable Selection in Regression, with Application to Genetic Fine Mapping. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(5), 1273–1300.


Statistical Learning and Selective Inference

January 19, 2024

This post is for Taylor, J., & Tibshirani, R. J. (2015). Statistical learning and selective inference. Proceedings of the National Academy of Sciences of the United States of America, 112(25), 7629–7634.


Exact Post-Selection Inference for Sequential Regression Procedures

January 19, 2024

This post is for Tibshirani, R. J., Taylor, J., Lockhart, R., & Tibshirani, R. (2016). Exact Post-Selection Inference for Sequential Regression Procedures. Journal of the American Statistical Association, 111(514), 600–620.


FDR Control in GLM

January 15, 2024

This post is for Dai, C., Lin, B., Xing, X., & Liu, J. S. (2023). A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models. Journal of the American Statistical Association, 118(543), 1551–1565.


MMRM: Mixed-Models for Repeated Measures

January 10, 2024

This post is based on vignettes of MMRM R package: https://openpharma.github.io/mmrm/main/index.html


One-way Matching with Low Rank

January 06, 2024

This post is for Chen, Shuxiao, Sizun Jiang, Zongming Ma, Garry P. Nolan, and Bokai Zhu. “One-Way Matching of Datasets with Low Rank Signals.” arXiv, October 3, 2022.


CountSplit for scRNA Data

December 08, 2023

The post is for Neufeld, Anna, Lucy L Gao, Joshua Popp, Alexis Battle, and Daniela Witten. “Inference after Latent Variable Estimation for Single-cell RNA Sequencing Data.” Biostatistics, December 13, 2022, kxac047.


Uncertainty of Pseudotime Trajectory

December 04, 2023

This post is for Tenha, Lovemore, and Mingzhou Song. “Statistical Evidence for the Presence of Trajectory in Single-cell Data.” BMC Bioinformatics 23, no. Suppl 8 (August 16, 2022): 340.

