Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction

Discriminative model, Protein function prediction, Similarity (geometry)
DOI: 10.1371/journal.pcbi.1010793 Publication Date: 2022-12-22T18:55:50Z
ABSTRACT
Accurate identification of protein function is critical to elucidate life mechanisms and design new drugs. We proposed a novel deep-learning method, ATGO, to predict Gene Ontology (GO) attributes of proteins through a triplet neural-network architecture embedded with pre-trained language models from protein sequences. The method was systematically tested on 1068 non-redundant benchmarking proteins and 3328 targets from the third Critical Assessment of Protein Function Annotation (CAFA) challenge. Experimental results showed that ATGO achieved a significant increase in GO prediction accuracy compared to state-of-the-art approaches in all aspects of molecular function, biological process, and cellular component. Detailed data analyses showed that the major advantage of ATGO lies in the utilization of the transformer language model, which can extract discriminative functional patterns from the feature embeddings. Meanwhile, the triplet network helps enhance the association between functional similarity and feature similarity in the sequence embedding space. In addition, it was found that the combination of the network scores with complementary homology-based inferences could further improve the accuracy of the predicted models. These results demonstrated a new avenue for high-accuracy deep-learning function prediction that is applicable to large-scale protein function annotations from sequence alone.
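As a rough illustration of the triplet idea mentioned in the abstract, the sketch below computes a standard triplet margin loss over toy embeddings. It is not taken from ATGO: the function name `triplet_margin_loss`, the Euclidean distance, the margin value, and the 2-D vectors are all assumptions chosen for readability. In this framing, an anchor protein's embedding is pulled toward a "positive" (a protein assumed to share its function) and pushed away from a "negative" (one assumed not to).

```python
import math


def triplet_margin_loss(anchors, positives, negatives, margin=1.0):
    """Mean hinge loss over (anchor, positive, negative) embedding triplets.

    Generic textbook formulation, not the ATGO implementation:
    loss = max(0, d(anchor, positive) - d(anchor, negative) + margin)
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        d_pos = math.dist(a, p)  # distance to the functionally similar example
        d_neg = math.dist(a, n)  # distance to the functionally dissimilar example
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses)


# Toy 2-D embeddings (hypothetical values, for illustration only).
anchors = [(0.0, 0.0), (1.0, 1.0)]
positives = [(0.0, 1.0), (1.0, 1.0)]
negatives = [(3.0, 4.0), (1.0, 2.0)]

loss = triplet_margin_loss(anchors, positives, negatives, margin=2.0)
print(loss)  # 0.5
```

The first triplet contributes zero loss because its negative is already farther away than the positive by more than the margin; the second contributes 1.0 because its negative sits too close to the anchor. Minimizing this quantity is one common way to make distances in an embedding space track a notion of similarity.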
REFERENCES (61)
CITATIONS (43)