Accurate and efficient protein embedding using multi-teacher distillation learning

DOI: 10.1093/bioinformatics/btae567
Publication Date: 2024-09-24T19:00:57Z
ABSTRACT
Motivation: Protein embedding, which represents proteins as numerical vectors, is a crucial step in various learning-based protein annotation/classification problems, including gene ontology prediction, protein-protein interaction prediction, and structure prediction. However, existing embedding methods are often computationally expensive due to their large numbers of parameters, which can reach millions or even billions. The growing availability of large-scale datasets and the need for efficient analysis tools have created a pressing demand for more efficient embedding methods.

Results: We propose a novel approach based on multi-teacher distillation learning, which leverages the knowledge of multiple pre-trained protein embedding models to learn a compact and informative representation of proteins. Our method achieves performance comparable to the state-of-the-art while significantly reducing computational costs and resource requirements. Specifically, our approach reduces computation time by ~70% while maintaining accuracy within ±1.5% of the original models. This makes our method well-suited for large-scale analyses and enables the bioinformatics community to perform protein-related tasks more efficiently.

Availability: The source code of MTDP is available via https://github.com/KennthShang/MTDP
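To make the idea of multi-teacher distillation concrete, below is a minimal, illustrative PyTorch sketch (not the MTDP implementation): a small student encoder with one lightweight projection head per teacher is trained to reproduce the embeddings produced by several large, frozen pre-trained models. The dimensions, the per-teacher mean-squared-error objective, and the equal weighting of teachers are assumptions for illustration only; see the repository linked above for the actual method.

# Illustrative sketch of multi-teacher embedding distillation.
# Assumptions not taken from the paper: dimensions, MSE per teacher,
# and equal weighting of the teacher losses.

import torch
import torch.nn as nn

class StudentEncoder(nn.Module):
    """Compact student that maps a per-protein feature vector to one
    embedding per teacher via a shared backbone and small heads."""
    def __init__(self, in_dim, hidden_dim, teacher_dims):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # One projection head per teacher, so teachers with different
        # embedding sizes can all supervise the same shared backbone.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, d) for d in teacher_dims]
        )

    def forward(self, x):
        h = self.backbone(x)
        return [head(h) for head in self.heads]

def distillation_loss(student_outputs, teacher_embeddings):
    """Sum of per-teacher MSE losses between the student's projections
    and the precomputed embeddings of the frozen teachers."""
    return sum(
        nn.functional.mse_loss(s, t)
        for s, t in zip(student_outputs, teacher_embeddings)
    )

if __name__ == "__main__":
    # Toy training step: random tensors stand in for real protein
    # features and for embeddings from large pre-trained teachers.
    batch, in_dim, hidden_dim = 8, 1024, 256
    teacher_dims = [1280, 1024]  # e.g. two teachers of different sizes
    student = StudentEncoder(in_dim, hidden_dim, teacher_dims)
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

    x = torch.randn(batch, in_dim)                            # protein features
    teachers = [torch.randn(batch, d) for d in teacher_dims]  # frozen teacher outputs

    optimizer.zero_grad()
    loss = distillation_loss(student(x), teachers)
    loss.backward()
    optimizer.step()
    print(f"distillation loss: {loss.item():.4f}")

In such a setup the teacher embeddings are computed once with the large pre-trained models during training, while only the compact student needs to run afterwards, which is what makes the distilled model cheaper to use.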