Do Domain-Specific Protein Language Models Outperform General Models on Immunology-Related Tasks?

DOI: 10.1101/2023.10.17.562795
Publication Date: 2023-10-20T17:20:36Z
ABSTRACT
Deciphering the antigen recognition capabilities of T cell and B cell receptors (antibodies) is essential for advancing our understanding of adaptive immune responses. In recent years, the development of protein language models (PLMs) has facilitated bioinformatic pipelines in which complex amino acid sequences are transformed into vectorized embeddings, which are then applied to a range of downstream analytical tasks. With their success, we have witnessed the emergence of domain-specific PLMs tailored to specific proteins, such as immune receptors. Domain-specific models are often assumed to possess enhanced representation power for targeted applications; however, this assumption has not been thoroughly evaluated. In this manuscript, we assess the efficacy of both generalist and domain-specific transformer-based embeddings in characterizing B and T cell receptors. Specifically, we evaluate the accuracy of models that leverage these embeddings to predict antigen specificity and to elucidate the evolutionary changes that B cells undergo during an immune response. We demonstrate that the prevailing notion of domain-specific models outperforming general models requires a more nuanced examination. We also observe remarkable differences between PLMs, not only in terms of performance but also in the manner in which they encode information. Finally, the choice of embedding layer size and model hyperparameters affects different tasks in different ways. Overall, our analyses reveal the promising potential of PLMs in modeling protein function while providing insights into their information-handling capabilities. We also discuss crucial factors that should be taken into account when selecting a PLM for a particular task.
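As context for the embed-then-predict pipeline the abstract describes (amino acid sequences transformed into vectorized embeddings, then fed to downstream tasks such as antigen-specificity prediction), the sketch below illustrates one way such a pipeline can be wired together. It is not the evaluation setup from the manuscript: the PLM checkpoint (facebook/esm2_t6_8M_UR50D), the toy receptor sequences and labels, and the logistic-regression classifier are all illustrative assumptions.

```python
# Minimal sketch of an embed-then-predict pipeline, assuming a general-purpose PLM.
# The checkpoint, toy sequences/labels, and classifier are illustrative, not from the paper.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"  # small general-purpose protein language model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(sequences):
    """Mean-pool the last hidden layer into one fixed-length vector per sequence."""
    batch = tokenizer(sequences, return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)           # mask out padding tokens
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, dim)
    return pooled.numpy()

# Hypothetical receptor sequences with binary specificity labels, for illustration only.
train_seqs = ["CASSLGQAYEQYF", "CASSIRSSYEQYF", "CASSPDRGGYTF", "CASSLAGGNEQFF"]
train_labels = [1, 0, 1, 0]
test_seqs = ["CASSLGGAYEQYF"]

# Downstream task: predict antigen specificity from the vectorized embeddings.
clf = LogisticRegression(max_iter=1000).fit(embed(train_seqs), train_labels)
print(clf.predict(embed(test_seqs)))
```

Because only the embedding model changes, swapping MODEL_NAME for a domain-specific receptor PLM leaves the rest of the pipeline untouched, which is what makes the generalist-versus-specialist comparison discussed in the abstract straightforward to set up.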