PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer

DOI: 10.48550/arxiv.2301.12422 Publication Date: 2023-01-01
ABSTRACT
Motivation: As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages in different microbiomes at low cost. However, compared to the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, the structural proteins such as major tail, baseplate, etc. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand to develop a computational method for accurate phage virion protein (PVP) classification. Results: In this work, we adapted a state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences, and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets with increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use the output of PhaVIP: phage taxonomy classification and phage host prediction. The results show the benefit of using classified proteins rather than all proteins.
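The abstract describes encoding protein sequences as images with chaos game representation. As a rough illustration of the general idea only, the following minimal sketch places the 20 standard amino acids on the vertices of a regular polygon and moves a point part of the way toward the vertex of each successive residue. The vertex layout, the contraction ratio of 0.5, the image resolution, and all function names are assumptions made for this example; the abstract does not specify PhaVIP's actual encoding.

```python
import math

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def polygon_vertices(symbols):
    """Place each symbol on a vertex of a regular polygon on the unit circle."""
    n = len(symbols)
    return {
        s: (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i, s in enumerate(symbols)
    }


def cgr_points(sequence, contraction=0.5):
    """Return the chaos game trajectory of a protein sequence.

    Starting at the origin, each residue moves the current point a fraction
    `contraction` of the way toward that residue's vertex. Residues outside
    the 20-letter alphabet are skipped (an illustrative choice).
    """
    vertices = polygon_vertices(AMINO_ACIDS)
    x, y = 0.0, 0.0
    points = []
    for residue in sequence.upper():
        if residue not in vertices:
            continue
        vx, vy = vertices[residue]
        x = x + contraction * (vx - x)
        y = y + contraction * (vy - y)
        points.append((x, y))
    return points


def cgr_image(sequence, resolution=32, contraction=0.5):
    """Bin the trajectory into a resolution x resolution grid of counts."""
    image = [[0] * resolution for _ in range(resolution)]
    for x, y in cgr_points(sequence, contraction):
        # Map coordinates from [-1, 1] to pixel indices, clamping the upper edge.
        col = min(int((x + 1) / 2 * resolution), resolution - 1)
        row = min(int((y + 1) / 2 * resolution), resolution - 1)
        image[row][col] += 1
    return image


if __name__ == "__main__":
    # Hypothetical short peptide, used only to exercise the functions.
    img = cgr_image("MKTAYIAKQR", resolution=8)
    for row in img:
        print(" ".join(str(v) for v in row))
```

A grid of counts like this could then be treated as a single-channel image and fed to an image classifier. How PhaVIP builds and normalizes its images is not stated in the abstract.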