T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors

Keywords: T4SS; 0303 health sciences; 03 medical and health sciences; T4SE; Prediction; Helicobacter pylori; T4SEs; Deep learning; Protein language model; T4SEpp; TP248.13-248.65; Biotechnology; Research Article
DOI: 10.1016/j.csbj.2024.01.015 Publication Date: 2024-01-23T07:36:04Z
ABSTRACT
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of T4SEs is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for homologs of known T4SEs, signal sequences, and effector domains; the second fine-tuned a machine learning model using sequence feature data; and the third used the best-performing models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from
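The phrase "accuracy at high specificity" can be made concrete with a small sketch. The code below is not taken from T4SEpp; the function names `threshold_at_specificity` and `evaluate` are hypothetical, and the procedure shown (pick a score cutoff from the negative examples, then score the whole set at that cutoff) is only one common way such a figure could be obtained, assuming each protein receives a single prediction score where higher means more effector-like.

```python
import math


def threshold_at_specificity(neg_scores, target_specificity):
    """Return a cutoff such that calling 'positive' only when score > cutoff
    keeps at least `target_specificity` of the negatives correctly rejected.

    Hypothetical helper for illustration; not part of T4SEpp.
    """
    ranked = sorted(neg_scores)
    # Number of negatives that must score at or below the cutoff.
    needed = math.ceil(target_specificity * len(ranked))
    if needed == 0:
        return float("-inf")
    return ranked[needed - 1]


def evaluate(scores, labels, threshold):
    """Compute (accuracy, specificity, sensitivity) at a fixed cutoff.

    `labels` holds 1 for effectors and 0 for non-effectors.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    return accuracy, specificity, sensitivity
```

Fixing the cutoff on the negatives first and reporting accuracy afterwards keeps the false-positive rate bounded, which matters when a whole proteome is screened and true effectors are rare.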