Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data
Overdispersion
Binomial distribution
DOI: 10.3389/fgene.2021.642227
Publication Date: 2021-03-04T06:24:53Z
AUTHORS (6)
ABSTRACT
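Next-generation sequencing has emerged as an essential technology for the quantitative analysis of gene expression. In medical research, RNA sequencing (RNA-seq) data are commonly used to identify which type of disease a patient has. Because of the discrete nature of RNA-seq data, existing statistical methods that have been developed for microarray data cannot be directly applied to RNA-seq data. Existing methods usually model RNA-seq data by a discrete distribution, such as the Poisson, the negative binomial, or a mixture distribution with a point mass at zero and a Poisson distribution to further allow for excess zeros. Consequently, analytic tools corresponding to the above three distributions have been developed: Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). However, it is unclear what the real distribution would be for these classifications when they are applied to a new dataset. Considering that count datasets are frequently characterized by excess zeros and overdispersion, this paper extends the existing distributions to a mixture distribution with a point mass at zero and a negative binomial distribution, and proposes zero-inflated negative binomial logistic discriminant analysis (ZINBLDA) for classification. More importantly, we compare the above four classification methods from the perspective of model parameters, as an understanding of the parameters is necessary for selecting the optimal method for RNA-seq data. Furthermore, we determine that the above four methods could transform into each other in some cases. Using simulation studies, we compare and evaluate the performance of these classification methods in a wide range of settings, and we also present a decision tree model created to help us select the optimal classifier for a new RNA-seq dataset. The results of two real datasets coincide with the theory and simulation results. The methods used in this work are implemented in open-source R scripts, with source code freely available at https://github.com/FocusPaka/ZINBLDA .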
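The claim that the four models can transform into each other can be illustrated with a small sketch of the zero-inflated negative binomial probability mass function. This is not the authors' implementation (their code is in R at the repository above). The parameterization is an assumption chosen for illustration: mean `mu`, dispersion `phi` (so the negative binomial variance is `mu + phi * mu**2`), and zero-inflation probability `pi0`. The function names are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def nb_pmf(y, mu, phi):
    """Negative binomial probability of count y (assumed mean/dispersion form).

    With phi == 0 there is no overdispersion, so we fall back to the Poisson.
    """
    if phi == 0:
        return poisson_pmf(y, mu)
    r = 1.0 / phi  # "size" parameter
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, phi, pi0):
    """Mixture of a point mass at zero (weight pi0) and a negative binomial."""
    point_mass = pi0 if y == 0 else 0.0
    return point_mass + (1.0 - pi0) * nb_pmf(y, mu, phi)


# Special cases under this parameterization:
#   pi0 = 0            -> negative binomial
#   phi = 0            -> zero-inflated Poisson
#   pi0 = 0 and phi = 0 -> Poisson
if __name__ == "__main__":
    for y in range(4):
        print(y, zinb_pmf(y, mu=5.0, phi=0.5, pi0=0.3))
```

Under these assumptions, setting `pi0` or `phi` to zero recovers the simpler distributions, which is one way to read the relationship among the count models underlying PLDA, NBLDA, ZIPLDA, and ZINBLDA.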
REFERENCES (20)
CITATIONS (3)