T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors
T4SS
0303 health sciences
03 medical and health sciences
T4SE Prediction
Helicobacter pylori T4SEs
Deep learning
Protein language model
T4SEpp
TP248.13-248.65
Biotechnology
Research Article
DOI:
10.1016/j.csbj.2024.01.015
Publication Date:
2024-01-23T07:36:04Z
AUTHORS (10)
ABSTRACT
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of T4SEs is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used full-length embedding features generated by six pre-trained protein language models to train classifiers for predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 from
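The abstract reports accuracy at a fixed high specificity. As a reminder of how these two metrics relate to a binary confusion matrix, here is a minimal sketch in Python. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the paper or from the T4SEpp software.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions (effector or non-effector) that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


def specificity(tn, fp):
    """True negative rate: fraction of real non-effectors correctly rejected."""
    if tn + fp == 0:
        raise ValueError("no negative examples")
    return tn / (tn + fp)


# Hypothetical counts for illustration only (not results from the paper).
tp, tn, fp, fn = 40, 50, 5, 5
print("accuracy:", accuracy(tp, tn, fp, fn))
print("specificity:", specificity(tn, fp))
```

High specificity matters for genome-wide effector screening because non-effectors vastly outnumber effectors, so even a small false positive rate can swamp the true hits.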
REFERENCES (92)
CITATIONS (8)