DNA copy number profiling: from bulk tissue to single cells
Copy number variation (CNV)
CNV detection by next-generation DNA-seq
- Whole-genome sequencing (WGS)
- Whole-exome sequencing (WES) & targeted sequencing
- Conventional whole-genome amplification
- 10x Genomics Chromium Single Cell CNV Solution
Use bulk/single-cell DNA sequencing to accurately detect CNV
- Based on depth of coverage, i.e., number of times a genomic region is “read”
- GC content
- Capture and amplification efficiency
- Sequencing bias
- Latent factors
- Batch effect
- Population stratification
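To make "depth of coverage" concrete, here is a minimal sketch that counts how many reads overlap each target region. The function name `depth_per_target`, the fixed read length, and the half-open interval convention are assumptions made for illustration; real pipelines compute this from aligned BAM files with dedicated tools.

```python
import numpy as np

def depth_per_target(read_starts, read_length, targets):
    """Count reads overlapping each target interval [start, end).

    read_starts: 0-based start positions of aligned reads (hypothetical input).
    read_length: common read length, assumed fixed for simplicity.
    targets:     list of (start, end) half-open intervals, e.g. exons or bins.
    """
    starts = np.asarray(read_starts)
    ends = starts + read_length  # exclusive end of each read
    depths = []
    for t_start, t_end in targets:
        # A read overlaps the target if it starts before the target ends
        # and ends after the target starts.
        overlaps = (starts < t_end) & (ends > t_start)
        depths.append(int(overlaps.sum()))
    return depths

reads = [0, 5, 10, 50, 95]
targets = [(0, 10), (10, 20), (90, 100)]
depths = depth_per_target(reads, 10, targets)
```

The raw counts produced this way are what the biases listed above distort, which is why they need normalization before being interpreted as copy number.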
Fromer et al. (2012) describe the problem as follows:
Because exome sequencing takes aim at a sparse (~1%) set of noncontiguous genomic targets (the exons), most CNV breakpoints will not be sequenced, leaving read depth as the predominant indicator of CNVs. However, the quantitative relationship between true copy number and depth is distorted by target- and sample-specific biases in exome hybridization (“capture”), PCR amplification, sequencing efficiency, and in silico read mapping, all of which are in turn affected by GC content of the targets, target size and sequence complexity, proximity to segmental duplications, nucleotide-level variation (SNPs), DNA concentration, hybridization temperature, experimental sample batching, and the complex interplay among these and various indeterminate factors.
They proposed a method called XHMM, which they introduce as follows:
XHMM extracts copy-number signal from noisy read depth by leveraging the large-scale nature of sequencing projects to discern patterns of read-depth biases. Specifically, we ran a principal-component analysis (PCA) on the sample-by-target-depth matrix by “rotating” the high-dimensional data to find the main modes in which depth varies across multiple samples and targets, and we removed the largest of such effects.
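The core idea in the quote, removing the largest principal components from the sample-by-target depth matrix, can be sketched with a singular value decomposition. This is only an illustration of that one step, not XHMM's actual pipeline; the function name `remove_top_components` and the choice of how many components to drop are assumptions.

```python
import numpy as np

def remove_top_components(depth, n_remove):
    """Remove the n_remove largest principal components from a
    sample-by-target depth matrix (rows = samples, columns = targets).

    Returns the mean-centered matrix with those components subtracted.
    """
    depth = np.asarray(depth, dtype=float)
    centered = depth - depth.mean(axis=0)  # center each target across samples
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0  # zero out the strongest modes of variation
    return (u * s_kept) @ vt

# Simulated data: one strong shared bias (e.g. a batch effect) plus noise.
rng = np.random.default_rng(0)
batch = rng.normal(size=(30, 1))      # per-sample bias strength
loading = rng.normal(size=(1, 200))   # per-target sensitivity to that bias
depth = 100 + 20 * batch @ loading + rng.normal(scale=1.0, size=(30, 200))
residual = remove_top_components(depth, n_remove=1)
```

In this simulation the residual is dominated by the small noise term, because the shared bias lives almost entirely in the first component.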
Here is a diagram illustrating some basic concepts of DNA, adapted from 潘高的小站: "The differences and relationships among exons, introns, mRNA, CDS, and ORF" (外显子、内含子、mRNA、CDS、ORF的区别与联系).
GC content bias
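GC content is calculated as (G + C) / (A + T + G + C) × 100%, where each letter denotes the number of occurrences of that base in the region.
In theory, the distribution of GC content across reads would be approximately Normal (see 小L生信学习日记-4: "How to judge raw data quality? (Part 2)", 原始数据质量如何判断？-下).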
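The GC fraction of a region is the share of its bases that are G or C. A minimal sketch follows; the function name `gc_content` and the decision to ignore ambiguous bases such as N are assumptions for illustration.

```python
def gc_content(sequence):
    """Return the GC fraction of a DNA sequence as a value in [0, 1].

    Only A, C, G, and T are counted; other characters (e.g. N) are ignored.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return (seq.count("G") + seq.count("C")) / counted

example = gc_content("ATGC")
```

Computing this value per target or per bin is the usual first step when checking how read depth varies with GC content.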
Poisson latent factor model for normalization
CODEX: COpy number Detection by EXome-seq
Lower sensitivity for common CNVs
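The speaker noted that signals from common CNVs are attenuated by the Poisson latent factors: a CNV shared by many samples looks like a systematic pattern in the data, so the latent factors absorb part of it. He proposed the following solution.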
CODEX2: full-spectrum CNV detection by NGS
Profiling somatic copy number aberrations by scDNA-seq
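As for the difference between somatic and germline mutations: germline mutations are present in the egg or sperm, so they are inherited and carried by every cell of the offspring, whereas somatic mutations arise after conception in individual cells and are passed only to the descendants of those cells, not to the next generation.
scDNA-seq data from breast cancer patients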
How to identify normal cells?
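The Gini coefficient is calculated as two times the area between the Lorenz curve and the diagonal, where the Lorenz curve gives the cumulative fraction of reads as a function of the cumulative fraction of the genome.
A short but clear introduction to the Gini coefficient: Gini coefficient and Lorenz curve explained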
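As a rough sketch, the Gini coefficient of a single cell can be computed from its per-bin read counts by building the Lorenz curve and taking twice the area between that curve and the diagonal. The function name `gini_coefficient` and the trapezoid-rule approximation are assumptions here, not necessarily what SCOPE implements.

```python
import numpy as np

def gini_coefficient(read_counts):
    """Gini coefficient of per-bin read counts for one cell.

    Returns a value near 0 for perfectly even coverage, approaching 1
    when almost all reads fall in a few bins.
    """
    counts = np.sort(np.asarray(read_counts, dtype=float))
    total = counts.sum()
    if total <= 0:
        raise ValueError("read counts must sum to a positive number")
    n = counts.size
    # Lorenz curve: cumulative share of reads vs. cumulative share of bins.
    lorenz = np.concatenate(([0.0], np.cumsum(counts) / total))
    x = np.linspace(0.0, 1.0, n + 1)
    # Area under the Lorenz curve by the trapezoid rule.
    area_under = np.sum((lorenz[1:] + lorenz[:-1]) / 2.0 * np.diff(x))
    # The diagonal encloses area 0.5, so twice the gap is 1 - 2 * area_under.
    return 1.0 - 2.0 * area_under

even_cell = gini_coefficient([5, 5, 5, 5])
uneven_cell = gini_coefficient([0, 0, 0, 10])
```

A cell with even coverage scores close to 0, which is the pattern one would expect from a normal diploid cell; cells with many copy number changes have more uneven coverage and score higher.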
SCOPE: Single-cell COPy number Estimation
10x Genomics single-cell CNV datasets
10x Genomics is a company. See "10x Genomics: Yesterday, Today, and Tomorrow" (10x Genomics: 昨天今天明天) for more details.
Fromer, M., Moran, J. L., Chambert, K., Banks, E., Bergen, S. E., Ruderfer, D. M., Handsaker, R. E., McCarroll, S. A., O’Donovan, M. C., Owen, M. J., Kirov, G., Sullivan, P. F., Hultman, C. M., Sklar, P., & Purcell, S. M. (2012). Discovery and Statistical Genotyping of Copy-Number Variation from Whole-Exome Sequencing Depth. The American Journal of Human Genetics, 91(4), 597–607.