Complementary feature selection from alternative splicing events and gene expression for phenotype prediction

Gene prediction Biomarker Discovery
DOI: 10.1093/bioinformatics/btw430 Publication Date: 2016-09-01T07:53:39Z
ABSTRACT
Motivation: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets utilize machine learning algorithms trained via gene expression. Isoforms, however, generated from alternative splicing, may provide a novel and complementary set of transcripts for phenotype prediction. In contrast to gene expression, the number of isoforms increases significantly due to numerous alternative splicing patterns, resulting in a prioritization problem for many machine learning algorithms. This study identifies the empirically optimal methods of transcript quantification, feature engineering and filtering steps, using phenotype prediction accuracy as a metric. At the same time, the complementary nature of gene and isoform data is analyzed, and the feasibility of identifying isoforms as biomarker candidates is examined.

Results: Isoform features are complementary to gene features, providing non-redundant information and enhanced predictive power when prioritized and filtered. A univariate filtering algorithm, which selects up to the N highest ranking features for phenotype prediction, is described and evaluated in this study. An empirical comparison of pipelines for isoform quantification is reported by performing cross-validation prediction tests with datasets from human non-small cell lung cancer (NSCLC) patients, human patients with chronic obstructive pulmonary disease (COPD) and amyotrophic lateral sclerosis (ALS) transgenic mice, each including samples of diseased and non-diseased phenotypes.

Availability and Implementation: https://github.com/clabuzze/Phenotype-Prediction-Pipeline.git

Contact: clabuzze@iastate.edu, antoniom@bc.edu, watsondk@musc.edu, andersonpe2@cofc.edu
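The univariate filtering step mentioned in the abstract (keeping up to the N highest ranking features) can be illustrated with a minimal sketch. The abstract does not state which ranking statistic the authors use, so the absolute Welch t-statistic below is an assumption, as are the function names `welch_t` and `select_top_n`, the `min_score` cutoff, and the toy feature names and values. This is not the code from the linked repository.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups (assumed ranking score)."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        return 0.0  # constant feature: carries no information
    return abs(mean(a) - mean(b)) / denom


def select_top_n(features, labels, n, min_score=0.0):
    """Return the names of up to n features, best score first.

    features: dict of feature name -> list of values, one per sample
    labels:   list of 0/1 phenotype labels in the same sample order
    """
    scored = []
    for name, values in features.items():
        diseased = [v for v, y in zip(values, labels) if y == 1]
        healthy = [v for v, y in zip(values, labels) if y == 0]
        score = welch_t(diseased, healthy)
        if score > min_score:  # hypothetical cutoff, hence "up to" n
            scored.append((score, name))
    # Sort by descending score; break ties by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:n]]


# Toy data: three diseased (1) and three non-diseased (0) samples.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "geneA": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],   # weakly informative
    "geneB": [9.0, 8.5, 9.4, 2.0, 2.3, 1.8],   # strongly informative
    "isoB.1": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],  # constant, filtered out
    "isoC.2": [1.0, 1.4, 0.8, 3.9, 4.2, 4.4],  # strongly informative
}
selected = select_top_n(features, labels, n=2)
```

Pooling gene and isoform features in one dictionary before ranking is one way to let the two feature types compete for the same N slots. In a cross-validation setting, the selection would need to be repeated inside each training fold so that held-out samples do not influence the ranking.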