Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce

DOI: 10.1186/1472-6807-13-s1-s3 Publication Date: 2013-11-08T12:02:41Z
ABSTRACT
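Ribonucleic acid (RNA) molecules play important roles in many biological processes, including gene expression and regulation. Their secondary structures are crucial for RNA functionality, and the prediction of these secondary structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting the secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs, which we apply to two datasets of RNA with known secondary structures, including both pseudoknotted and non-pseudoknotted sequences, as well as to a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment. On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance. By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequences. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.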
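The chunk, predict, and reconstruct pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: fixed-length chunking stands in for the paper's chunking methods, `toy_predict` is a hypothetical placeholder for a real thermodynamic prediction program, the built-in `map` stands in for Hadoop mappers, and both the accuracy measure (fraction of known base pairs recovered) and the ratio form of accuracy retention are assumed definitions, not taken from the paper.

```python
# Hypothetical sketch of chunk -> predict (map) -> reconstruct (reduce).
# None of these functions come from the paper or from any real predictor.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def chunk_sequence(seq, chunk_len):
    """Split seq into consecutive, non-overlapping (start, chunk) pieces."""
    return [(start, seq[start:start + chunk_len])
            for start in range(0, len(seq), chunk_len)]


def toy_predict(chunk):
    """Placeholder predictor: pair complementary bases from the ends inward,
    leaving a loop of at least three unpaired bases. Returns dot-bracket."""
    structure = ["."] * len(chunk)
    i, j = 0, len(chunk) - 1
    while j - i > 3 and (chunk[i], chunk[j]) in PAIRS:
        structure[i], structure[j] = "(", ")"
        i, j = i + 1, j - 1
    return "".join(structure)


def map_step(indexed_chunk):
    """Map: predict one chunk independently, keeping its start offset."""
    start, chunk = indexed_chunk
    return start, toy_predict(chunk)


def reduce_step(predicted_chunks):
    """Reduce: order the chunk structures by offset and concatenate them."""
    return "".join(structure for _, structure in sorted(predicted_chunks))


def predict_by_chunks(seq, chunk_len):
    return reduce_step(map(map_step, chunk_sequence(seq, chunk_len)))


def base_pairs(structure):
    """Convert a balanced dot-bracket string into a set of (i, j) pairs."""
    stack, pairs = [], set()
    for idx, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            pairs.add((stack.pop(), idx))
    return pairs


def accuracy(predicted, known):
    """Assumed metric: fraction of known base pairs present in the prediction."""
    known_pairs = base_pairs(known)
    if not known_pairs:
        return 0.0
    return len(base_pairs(predicted) & known_pairs) / len(known_pairs)


def accuracy_retention(chunk_accuracy, whole_accuracy):
    """Assumed definition: chunk-based accuracy divided by whole-sequence
    accuracy, so a value above one favors the chunk-based prediction."""
    return chunk_accuracy / whole_accuracy
```

Because each chunk is predicted independently, the map step is the part a framework such as Hadoop could distribute; the granularity of what one mapper handles is the kind of choice the abstract refers to as coarse-grained versus fine-grained mapping.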