Prediction of protein–RNA binding sites by a random forest method with combined features

0301 basic medicine; 03 medical and health sciences; Artificial Intelligence; Sequence Analysis, Protein; Computational Biology; RNA; RNA-Binding Proteins
DOI: 10.1093/bioinformatics/btq253 Publication Date: 2010-05-19T00:47:01Z
ABSTRACT
Motivation: Protein–RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. As a result, reliable identification of the RNA binding sites of a protein is important for functional annotation and site-directed mutagenesis. Accumulated data on experimental protein–RNA interactions reveal that an RNA binding residue with different neighbor amino acids often exhibits different preferences for its RNA partners, which in turn can be assessed by the interacting interdependence of the amino acid fragment and the RNA nucleotide.
Results: In this work, we propose a novel classification method to identify the RNA binding sites in proteins by combining a new interacting feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue to the interacting RNA nucleotide by considering its two-side neighborhood in a protein residue triplet. The sequence- as well as structure-based features of the residues are combined together to discriminate the interaction propensity of amino acids with RNA. We predict RNA interacting residues in proteins by implementing a well-built random forest classifier. The experiments show that our method is able to detect the annotated protein–RNA interaction sites with high accuracy. Our method achieves an accuracy of 84.5%, an F-measure of 0.85 and an AUC of 0.92 in the prediction of RNA binding sites on a dataset containing 205 non-homologous RNA binding proteins. It also outperforms several existing RNA binding residue predictors, such as RNABindR, BindN, RNAProB and PPRint, and some alternative machine learning methods, such as support vector machine, naive Bayes and neural network, in the comparison study. Furthermore, we provide some biological insights into the roles of sequences and structures in protein–RNA interactions by both evaluating the importance of features for their contributions to predictive accuracy and analyzing the binding patterns of interacting residues.
Availability: All source code is available at http://www.aporc.org/doc/wiki/PRNA or http://www.sysbio.ac.cn/datatools.asp
Contact: lnchen@sibs.ac.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
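
The abstract describes the interaction propensity only in words: a score for how specifically a residue, taken together with its two sequence neighbors (a triplet), contacts a given RNA nucleotide. The sketch below is one possible reading of that idea, not the authors' formula. It assumes an observed-over-expected ratio with a pseudocount, and the names `interaction_propensity`, `residue_triplets` and `toy_contacts`, as well as the toy data, are hypothetical.

```python
from collections import Counter


def residue_triplets(sequence):
    """Yield (index, triplet) for every residue that has both neighbors."""
    for i in range(1, len(sequence) - 1):
        yield i, sequence[i - 1:i + 2]


def interaction_propensity(contacts, pseudocount=1.0):
    """Observed/expected ratio for each (triplet, nucleotide) pair.

    contacts: iterable of (triplet, nucleotide) tuples, one per observed
    contact between the central residue of a triplet and an RNA base.
    ASSUMPTION: the ratio form and the pseudocount are illustrative choices,
    not the definition used in the paper.
    """
    contacts = list(contacts)
    total = len(contacts)
    pair_counts = Counter(contacts)
    triplet_counts = Counter(t for t, _ in contacts)
    nucleotide_counts = Counter(n for _, n in contacts)

    propensity = {}
    for triplet, t_count in triplet_counts.items():
        for nucleotide, n_count in nucleotide_counts.items():
            observed = pair_counts[(triplet, nucleotide)] + pseudocount
            # Expected count if triplet and nucleotide were independent.
            expected = t_count * n_count / total + pseudocount
            propensity[(triplet, nucleotide)] = observed / expected
    return propensity


# Made-up contact list, purely for illustration.
toy_contacts = [("GRK", "A"), ("GRK", "A"), ("GRK", "U"), ("LAV", "G")]
scores = interaction_propensity(toy_contacts)
```

Under these assumptions, a score above 1 means the triplet contacts that nucleotide more often than independence would suggest. Such scores could then be supplied as one feature per residue, alongside the sequence- and structure-based features, to a classifier such as a random forest.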