A machine learning approach for viral genome classification

Subtyping
DOI: 10.1186/s12859-017-1602-3 Publication Date: 2017-04-11T03:05:50Z
ABSTRACT
Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for a specific, well-studied family of viruses. Thus, viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families.

Here, we introduce a virus classification platform, CASTOR, based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments. It uses two metrics to construct feature vectors for the machine learning algorithms in the classification step. We benchmark CASTOR for the classification of distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows competitive performance compared to well-known HIV-1 specific classifiers (REGA and COMET) on whole genomes and pol fragments.

The performance of CASTOR, its genericity and its robustness could permit novel and accurate large-scale virus studies. The CASTOR web platform provides open-access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca.
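The in silico digestion idea described in the abstract can be illustrated with a minimal Python sketch. It is not the CASTOR implementation. The abstract does not name the two metrics, so the sketch assumes two illustrative stand-ins per enzyme: the number of cut sites and the root mean square of the fragment lengths. The helper names (`find_sites`, `fragment_lengths`, `feature_vector`) are hypothetical, the enzyme list is a small arbitrary sample of real recognition sequences, and cutting is simplified to the start of each recognition site on a linear sequence.

```python
import math

# Recognition sequences of three real restriction enzymes.
# Assumption: the cut is placed at the start of the site, ignoring
# the enzyme's true cut offset within the recognition sequence.
ENZYMES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
}


def find_sites(seq, motif):
    """Return the start index of every (possibly overlapping) motif occurrence."""
    sites = []
    i = seq.find(motif)
    while i != -1:
        sites.append(i)
        i = seq.find(motif, i + 1)
    return sites


def fragment_lengths(seq, sites):
    """Lengths of the fragments left after cutting a linear sequence at each site."""
    bounds = [0] + list(sites) + [len(seq)]
    # Zero-length fragments (a site at position 0) are dropped.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def feature_vector(seq, enzymes=ENZYMES):
    """Build [cut count, RMS fragment length] for each enzyme, in name order."""
    seq = seq.upper()
    features = []
    for name in sorted(enzymes):
        sites = find_sites(seq, enzymes[name])
        lengths = fragment_lengths(seq, sites)
        if lengths:
            rms = math.sqrt(sum(n * n for n in lengths) / len(lengths))
        else:
            rms = 0.0
        features.extend([len(sites), rms])
    return features


if __name__ == "__main__":
    # Toy sequence containing one EcoRI site and one BamHI site.
    print(feature_vector("AAGAATTCTTGGATCCAA"))
```

Vectors built this way, one per genome, could then be passed with their known labels to any standard supervised classifier. Which enzymes and which classifier work best for a given virus family is an empirical question that the sketch does not address.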
REFERENCES (53)
CITATIONS (48)