Utilizing Semi-supervised Method in Predicting BRCA1 Pathogenicity Variants

DOI: 10.1016/j.procs.2023.10.500 Publication Date: 2023-11-25T16:58:10Z
ABSTRACT
Quantifying the effect of mutations in the BRCA1 gene is useful for understanding their clinical consequences for breast cancer. Machine learning models can be applied to predict the landscape of protein variant effects, which is not always accessible by experiments. In this work, we propose a simple semi-supervised method using a Gaussian mixture model to predict the pathogenicity of BRCA1 missense variants collected from the ClinVar database, ∼90% of which are unlabeled. High-quality embeddings, extracted from the latest pre-trained transformer-based language model, are used as sequence features. A statistical test shows that these embeddings are effective and robust for predicting pathogenicity. Lower-dimensional representations of the features are then fed into the Gaussian mixture model. The prediction performance on only the labeled testing data achieves an AUC score of 79.27% and an accuracy of 71.58%. Using our defined pathogenic probability score, we find that ∼94% of the dataset is well-separated into either the benign or the pathogenic class according to the scoring. Our scores obtain a moderate Spearman rank correlation with the results of established unsupervised models. Finally, our approach can potentially be developed further for more accurate and biologically reliable predictions of variant effects.
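The pipeline described in the abstract (embeddings, dimensionality reduction, Gaussian mixture model, evaluation on labeled variants only) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic random vectors in place of real language-model embeddings, PCA as the reduction step (the abstract does not name one), a small labeled subset used only to initialize the two mixture components, and hypothetical 0.1/0.9 cutoffs for calling a variant "well-separated".

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for per-variant embeddings (assumption: real inputs
# would come from a pre-trained transformer-based language model).
n_variants, emb_dim = 1000, 64
y_true = rng.integers(0, 2, size=n_variants)      # 0 = benign, 1 = pathogenic
shift = np.zeros(emb_dim)
shift[:5] = 2.0                                   # class signal in 5 dimensions
X = rng.normal(size=(n_variants, emb_dim)) + np.outer(y_true, shift)

# Keep labels for ~10% of variants; the remaining ~90% are treated as unlabeled.
perm = rng.permutation(n_variants)
seed_idx, test_idx = perm[:50], perm[50:100]      # labeled seeds / labeled test

# Lower-dimensional representation (PCA is an assumption of this sketch).
Z = PCA(n_components=5, random_state=0).fit_transform(X)

# Semi-supervised step: initialize each component at the mean of its labeled
# seeds, then fit the mixture on all variants, labeled and unlabeled alike.
means_init = np.vstack([
    Z[seed_idx][y_true[seed_idx] == 0].mean(axis=0),   # component 0: benign
    Z[seed_idx][y_true[seed_idx] == 1].mean(axis=0),   # component 1: pathogenic
])
gmm = GaussianMixture(n_components=2, means_init=means_init, random_state=0)
gmm.fit(Z)

# Pathogenic probability score: posterior of the pathogenic component.
score = gmm.predict_proba(Z)[:, 1]

# Evaluate on held-out labeled variants only.
auc = roc_auc_score(y_true[test_idx], score[test_idx])
acc = accuracy_score(y_true[test_idx], score[test_idx] >= 0.5)

# Fraction of variants with a confident call (hypothetical 0.1 / 0.9 cutoffs).
well_separated = np.mean((score <= 0.1) | (score >= 0.9))

print(f"AUC={auc:.3f} accuracy={acc:.3f} well-separated={well_separated:.1%}")
```

Because the synthetic classes are deliberately easy to separate, the numbers printed here say nothing about the 79.27% AUC and 71.58% accuracy reported for the real ClinVar data. The sketch only shows how unlabeled variants can shape the mixture while labeled ones anchor and evaluate it.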