Predictability of human differential gene expression

DOI: 10.1073/pnas.1802973116 Publication Date: 2019-03-08T00:25:55Z
ABSTRACT
Differential expression (DE) is commonly used to explore molecular mechanisms of biological conditions. While many studies report significant results between their groups of interest, the degree to which results are specific to the question at hand is not generally assessed, potentially leading to inaccurate interpretation. This could be particularly problematic for metaanalysis where replicability across datasets is taken as strong evidence for the existence of a specific, biologically relevant signal, but which instead may arise from recurrence of generic processes. To address this, we developed an approach to predict DE based on an analysis of over 600 studies. A predictor based on empirical prior probability of DE performs very well at this task (mean area under the receiver operating characteristic curve, ∼0.8), indicating that a large fraction of DE hit lists are nonspecific. In contrast, predictors based on attributes such as gene function, mutation rates, or network features perform poorly. Genes associated with sex, the extracellular matrix, the immune system, and stress responses are prominent within the “DE prior.” In a series of control studies, we show that these patterns reflect shared biology rather than technical artifacts or ascertainment biases. Finally, we demonstrate the application of the DE prior to data interpretation in three use cases: (i) breast cancer subtyping, (ii) single-cell genomics of pancreatic islet cells, and (iii) metaanalysis of lung adenocarcinoma and renal transplant rejection transcriptomics. In all cases, we find hallmarks of generic DE, highlighting the need for nuanced interpretation of gene phenotypic associations.
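The central idea in the abstract, scoring each gene by how often it has been called DE in earlier studies and then checking how well that score alone predicts a new hit list, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `de_prior` and `auroc`, the toy hit lists, and the gene symbols are all assumptions chosen for illustration, and the AUROC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
from typing import Dict, List, Set


def de_prior(hit_lists: List[Set[str]], genes: List[str]) -> Dict[str, float]:
    """Empirical prior: fraction of studies whose DE hit list contains each gene."""
    n_studies = len(hit_lists)
    return {g: sum(g in hits for hits in hit_lists) / n_studies for g in genes}


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """AUROC via the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: gene symbols and hit lists are illustrative only.
genes = ["XIST", "COL1A1", "HLA-DRA", "FOS", "GAPDH", "ACTB"]
training_hit_lists = [
    {"XIST", "COL1A1", "FOS"},
    {"XIST", "HLA-DRA", "FOS"},
    {"COL1A1", "FOS", "GAPDH"},
    {"XIST", "HLA-DRA"},
]
held_out_hits = {"XIST", "FOS", "HLA-DRA"}  # DE genes in a study not used for the prior

prior = de_prior(training_hit_lists, genes)
score = auroc(prior, held_out_hits)
print(f"AUROC of the prior on the held-out study: {score:.3f}")
```

A high AUROC here means the held-out hit list is largely predictable from how often genes were DE elsewhere, which is the sense in which the abstract calls such hit lists nonspecific. In a real analysis the held-out study would have to be excluded from the studies used to build the prior, as it is in this sketch.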