CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models

Sequence (biology), Protein sequencing, Code (set theory)

DOI: 10.1093/bioinformatics/btad029

Publication Date: 2023-01-17

AUTHORS (9)

Vamsi Nallapareddy

Nicola Bordin

Ian Sillitoe

Michael Heinzinger

Maria Littmann

Vaishali P Waman

Neeladri Sen

Burkhard Rost

Christine Orengo

ABSTRACT

CATH is a protein domain classification resource that exploits an automated workflow of structure and sequence comparison alongside expert manual curation to construct a hierarchical classification of evolutionary and structural relationships. The aim of this study was to develop algorithms for detecting remote homologues missed by state-of-the-art hidden Markov model (HMM)-based approaches. The method developed (CATHe) combines a neural network with sequence representations obtained from protein language models. It was assessed using a dataset of remote homologues having less than 20% sequence identity to any domain in the training set. The CATHe models trained on the 1773 largest and the 50 largest CATH superfamilies had an accuracy of 85.6 ± 0.4% and 98.2 ± 0.3%, respectively. As a further test of the power of CATHe to detect more remote homologues missed by HMMs derived from CATH domains, we used a dataset consisting of protein domains that had annotations in Pfam, but not in CATH. By using highly reliable CATHe predictions (expected error rate <0.5%), we were able to provide CATH annotations for 4.62 million Pfam domains. For a subset of these domains from Homo sapiens, we structurally validated 90.86% of the predictions by comparing their corresponding AlphaFold2 structures with structures from the CATH superfamilies to which they were assigned. The code for the developed models is available at https://github.com/vam-sin/CATHe, and the datasets developed in this study can be accessed at https://zenodo.org/record/6327572. Supplementary data are available at Bioinformatics online.
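The abstract reports keeping only highly reliable predictions (expected error rate <0.5%). One plausible way to choose such a cutoff is to calibrate a confidence threshold on held-out predictions whose true labels are known, and then apply it to new predictions. The sketch below illustrates that idea only; it is not taken from the CATHe code base, and the function names, the toy confidences, and the example domain names and assignments are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def error_rate_at(threshold: float,
                  confidences: Sequence[float],
                  correct: Sequence[bool]) -> Optional[float]:
    """Error rate among held-out predictions with confidence >= threshold.

    Returns None when no prediction passes the threshold.
    """
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    if not kept:
        return None
    return 1.0 - sum(kept) / len(kept)


def pick_threshold(confidences: Sequence[float],
                   correct: Sequence[bool],
                   max_error: float = 0.005) -> Optional[float]:
    """Lowest observed confidence whose retained set meets the error target.

    Assumption: the held-out set is representative of the data the
    threshold will later be applied to. Returns None if no threshold works.
    """
    for threshold in sorted(set(confidences)):
        err = error_rate_at(threshold, confidences, correct)
        if err is not None and err <= max_error:
            return threshold
    return None


def keep_reliable(predictions: Sequence[Tuple[str, str, float]],
                  threshold: float) -> List[Tuple[str, str, float]]:
    """Keep (domain, superfamily, confidence) triples at or above threshold."""
    return [p for p in predictions if p[2] >= threshold]


# Toy held-out data (hypothetical): model confidence and whether the
# predicted superfamily matched the known label.
held_out_conf = [0.99, 0.97, 0.95, 0.80, 0.60, 0.55]
held_out_ok = [True, True, True, False, True, False]

# Toy new predictions (hypothetical domain names and assignments).
new_predictions = [
    ("domain_A", "3.40.50.300", 0.98),
    ("domain_B", "1.10.10.10", 0.70),
]

cutoff = pick_threshold(held_out_conf, held_out_ok, max_error=0.005)
reliable = keep_reliable(new_predictions, cutoff) if cutoff is not None else []
```

With realistic data the held-out set would need to be large: an error rate of 0.5% cannot be estimated meaningfully from a handful of examples, so the toy lists above only show the mechanics.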

REFERENCES (45)

CITATIONS (25)
