Accurate and efficient protein embedding using multi-teacher distillation learning

DOI: 10.1093/bioinformatics/btae567
Publication Date: 2024-09-24T19:00:57Z
ABSTRACT
Motivation: Protein embedding, which represents proteins as numerical vectors, is a crucial step in various learning-based protein annotation/classification problems, including gene ontology prediction, protein-protein interaction prediction, and structure prediction. However, existing embedding methods are often computationally expensive due to their large numbers of parameters, which can reach millions or even billions. The growing availability of large-scale datasets and the need for efficient analysis tools have created a pressing demand for more efficient embedding methods.

Results: We propose a novel approach based on multi-teacher distillation learning, which leverages the knowledge of multiple pre-trained protein embedding models to learn a compact and informative representation of proteins. Our method achieves performance comparable to the state-of-the-art while significantly reducing computational costs and resource requirements. Specifically, our approach reduces computation time by ~70% while maintaining accuracy within ±1.5% of the original models. This makes our method well-suited for large-scale analyses and enables the bioinformatics community to perform protein-related tasks more efficiently.

Availability: The source code of MTDP is available via https://github.com/KennthShang/MTDP
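To make the idea of multi-teacher distillation concrete, below is a minimal, illustrative PyTorch sketch (not the MTDP implementation): a small student encoder with one lightweight projection head per teacher is trained to reproduce the embeddings produced by several large, frozen pre-trained models. The dimensions, the per-teacher mean-squared-error objective, and the equal weighting of teachers are assumptions for illustration only; see the repository linked above for the actual method.

# Illustrative sketch of multi-teacher embedding distillation.
# Assumptions not taken from the paper: dimensions, MSE per teacher,
# and equal weighting of the teacher losses.

import torch
import torch.nn as nn

class StudentEncoder(nn.Module):
    """Compact student that maps a per-protein feature vector to one
    embedding per teacher via a shared backbone and small heads."""
    def __init__(self, in_dim, hidden_dim, teacher_dims):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # One projection head per teacher, so teachers with different
        # embedding sizes can all supervise the same shared backbone.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, d) for d in teacher_dims]
        )

    def forward(self, x):
        h = self.backbone(x)
        return [head(h) for head in self.heads]

def distillation_loss(student_outputs, teacher_embeddings):
    """Sum of per-teacher MSE losses between the student's projections
    and the precomputed embeddings of the frozen teachers."""
    return sum(
        nn.functional.mse_loss(s, t)
        for s, t in zip(student_outputs, teacher_embeddings)
    )

if __name__ == "__main__":
    # Toy training step: random tensors stand in for real protein
    # features and for embeddings from large pre-trained teachers.
    batch, in_dim, hidden_dim = 8, 1024, 256
    teacher_dims = [1280, 1024]  # e.g. two teachers of different sizes
    student = StudentEncoder(in_dim, hidden_dim, teacher_dims)
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

    x = torch.randn(batch, in_dim)                            # protein features
    teachers = [torch.randn(batch, d) for d in teacher_dims]  # frozen teacher outputs

    optimizer.zero_grad()
    loss = distillation_loss(student(x), teachers)
    loss.backward()
    optimizer.step()
    print(f"distillation loss: {loss.item():.4f}")

In such a setup the teacher embeddings are computed once with the large pre-trained models during training, while only the compact student needs to run afterwards, which is what makes the distilled model cheaper to use.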