Transformation and model choice for RNA-seq co-expression analysis
Male
0301 basic medicine
Swine
[SDV]Life Sciences [q-bio]
Neocortex
510
Mice
03 medical and health sciences
data transformation
Fetus
Intestine, Small
Animals
Humans
mixture models
[SDV.GEN]Life Sciences [q-bio]/Genetics
0303 health sciences
Models, Statistical
500
Computational Biology
High-Throughput Nucleotide Sequencing
Embryo, Mammalian
co-expression
[STAT]Statistics [stat]
[SDV.GEN.GA]Life Sciences [q-bio]/Genetics/Animal genetics
Drosophila melanogaster
Female
RNA-seq
DOI: 10.1101/065607
Publication Date: 2016-07-25T05:10:48Z
AUTHORS (2)
ABSTRACT
Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of whether and how such methods may be applied to RNA-seq data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and the number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be quite strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and of the number of clusters, and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four diverse RNA-seq datasets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we provide an R package, coseq (co-expression of RNA-seq data), to facilitate the implementation and visualization of the recommended RNA-seq co-expression analyses.
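The strategy summarized in the abstract (transform the counts, fit Gaussian mixtures with a separate covariance matrix per cluster, and choose the number of clusters with a penalized criterion) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the coseq implementation or its API. The following are all hypothetical choices made only for this example: the normalized-profile definition, the arcsine square-root transform, dropping the redundant last profile coordinate, the diagonal ridge `RIDGE`, and the use of ICL (computed as BIC plus a MAP-based entropy term) as the penalized criterion. The same applies to the five random EM starts, the toy counts, and every function name (`profiles`, `arcsine`, `fit_gmm`, `best_fit`, `icl`, and the helpers).

```python
# Hypothetical sketch; not the coseq API or the authors' exact method.
import math
import random

RIDGE = 1e-4  # assumption: diagonal floor that keeps covariance matrices invertible

def profiles(counts, size_factors):
    # Normalized profiles p_ij = (y_ij / s_j + 1) / sum_l (y_il / s_l + 1); rows sum to 1
    out = []
    for row in counts:
        scaled = [y / s + 1.0 for y, s in zip(row, size_factors)]
        out.append([v / sum(scaled) for v in scaled])
    return out

def arcsine(rows):
    # Arcsine square-root transform applied entry-wise
    return [[math.asin(math.sqrt(p)) for p in row] for row in rows]

def cholesky(a):
    # Lower-triangular L with L L^T = a, for a symmetric positive-definite a
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_gauss(x, mu, low):
    # Multivariate normal log-density from the Cholesky factor of the covariance
    z = []
    for i in range(len(x)):
        z.append((x[i] - mu[i] - sum(low[i][m] * z[m] for m in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(len(x)))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + logdet + sum(v * v for v in z))

def weighted_cov(x, w, mean, nk):
    # Weighted covariance matrix plus RIDGE on the diagonal
    d = len(mean)
    return [[sum(wi * (xi[a] - mean[a]) * (xi[b] - mean[b]) for wi, xi in zip(w, x)) / nk
             + (RIDGE if a == b else 0.0) for b in range(d)] for a in range(d)]

def fit_gmm(x, k, seed, iters=100, tol=1e-6):
    # EM for a K-component Gaussian mixture with a full covariance matrix per cluster
    n, d = len(x), len(x[0])
    means = [list(p) for p in random.Random(seed).sample(x, k)]
    covs = [weighted_cov(x, [1.0] * n, [sum(c) / n for c in zip(*x)], n)] * k
    weights, prev = [1.0 / k] * k, -math.inf
    for _ in range(iters):
        # E-step: posterior probabilities t_ik, computed with log-sum-exp
        lows = [cholesky(c) for c in covs]
        resp, loglik = [], 0.0
        for xi in x:
            logs = [math.log(weights[j]) + log_gauss(xi, means[j], lows[j]) for j in range(k)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += lse
            resp.append([math.exp(v - lse) for v in logs])
        if abs(loglik - prev) < tol:
            break
        prev = loglik
        # M-step: proportions, means, covariances (entries of covs are replaced, not mutated)
        for j in range(k):
            w = [r[j] for r in resp]
            nk = max(sum(w), 1e-12)
            weights[j] = nk / n
            means[j] = [sum(wi * xi[a] for wi, xi in zip(w, x)) / nk for a in range(d)]
            covs[j] = weighted_cov(x, w, means[j], nk)
    return loglik, resp

def best_fit(x, k, starts=5):
    # Keep the highest-likelihood run over several random initializations
    return max((fit_gmm(x, k, seed) for seed in range(starts)), key=lambda fit: fit[0])

def icl(loglik, resp, k, d):
    # BIC plus a classification-entropy term from MAP assignments; lower is better
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    bic = -2.0 * loglik + n_params * math.log(len(resp))
    return bic - 2.0 * sum(math.log(max(r)) for r in resp)

# Hypothetical toy data: 60 genes following two expression patterns over three samples
rng = random.Random(1)
counts = [[round(m * level * math.exp(rng.gauss(0.0, 0.2))) for m in pattern]
          for pattern in ((100, 100, 1000), (1000, 100, 100))
          for level in [rng.uniform(0.5, 5.0) for _ in range(30)]]
size_factors = [1.0, 1.0, 1.0]  # assumption: library sizes already equal
# Profiles sum to one, so this sketch drops the redundant last coordinate (a simplification)
x = [row[:-1] for row in arcsine(profiles(counts, size_factors))]
fits = {k: best_fit(x, k) for k in (1, 2, 3)}
scores = {k: icl(ll, resp, k, len(x[0])) for k, (ll, resp) in fits.items()}
print("ICL by K:", scores, "selected K:", min(scores, key=scores.get))
```

Only the number of clusters is selected here. Because each transformation rescales the data, log-likelihoods (and hence ICL values) computed on differently transformed data are not directly comparable; selecting the transformation as well would require, for example, adding the log-Jacobian of the transformation to the log-likelihood, which this sketch leaves out.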
REFERENCES (46)