Integrative Unsupervised and Supervised Learning Approaches for Breast Cancer Subtype Classification Using Gene Expression Data
Supervised Learning
DOI: 10.20944/preprints202504.2451.v1
Publication Date: 2025-04-30T01:05:33Z
AUTHORS (3)
ABSTRACT
Breast cancer is a heterogeneous disease with distinct molecular subtypes that require precise classification for personalized treatment strategies. This study proposes an integrative methodology that combines unsupervised and supervised learning techniques (hybrid learning) to classify breast cancer subtypes using gene expression data from the Gene Expression Omnibus (GEO) repository. Hierarchical clustering is employed as an exploratory approach, using both Euclidean distance and Pearson correlation, to reveal intrinsic structures in the data. For classification, four machine learning models (Logistic Regression, Support Vector Machine (SVM), Random Forest, and Multilayer Perceptron (MLP)) are applied. These models are further optimized with the Optuna framework, which enhances performance through hyperparameter tuning. SHAP values are used to assess the importance of features, contributing to model interpretability. The results show that the unsupervised and supervised approaches are complementary, offering both classification accuracy and insight into subtype differentiation. Notably, the models optimized by Optuna significantly outperformed their non-optimized counterparts. These findings emphasize the potential of combined methodologies to support early and accurate diagnosis of breast cancer subtypes.
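To illustrate the exploratory step, the sketch below clusters samples hierarchically under the two distances the abstract names. It is not the authors' code: the synthetic 30 x 50 matrix stands in for a GEO expression matrix, and average linkage and a cut into three clusters are assumptions. SciPy's "correlation" metric is 1 minus the Pearson correlation between sample profiles.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_samples(expr, n_clusters, metric):
    """Agglomerative clustering of the samples (rows) of an expression matrix.

    metric="euclidean" compares absolute expression levels; metric="correlation"
    uses 1 - Pearson r, so samples with similar expression *profiles* group
    together even when their overall levels differ.
    """
    # Condensed pairwise distance matrix between samples
    dists = pdist(expr, metric=metric)
    # Average linkage is valid for any distance (Ward would assume Euclidean)
    tree = linkage(dists, method="average")
    # Cut the dendrogram into a fixed number of flat clusters (assumed: 3)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: 30 samples x 50 genes drawn from 3 synthetic "subtypes"
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 3.0, size=(3, 50))
true_labels = np.repeat([0, 1, 2], 10)
expr = centers[true_labels] + rng.normal(0.0, 0.5, size=(30, 50))

labels_euc = cluster_samples(expr, n_clusters=3, metric="euclidean")
labels_cor = cluster_samples(expr, n_clusters=3, metric="correlation")
print("Euclidean:  ", labels_euc)
print("Correlation:", labels_cor)
```

Comparing the two label vectors (or their dendrograms) shows whether the structure depends on expression magnitude or only on profile shape, which is one reason to run both distances.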
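The supervised step can be sketched with scikit-learn's implementations of the four model families, evaluated by stratified 5-fold cross-validation. This is a minimal sketch under stated assumptions: `make_classification` generates a hypothetical stand-in for the samples x genes matrix with four subtype labels, and the models keep library defaults apart from iteration limits and seeds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for an expression matrix (200 samples x 100 genes, 4 subtypes)
X, y = make_classification(
    n_samples=200, n_features=100, n_informative=15,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# The four model families named in the abstract, with (mostly) default settings
models = {
    "LogisticRegression": LogisticRegression(max_iter=2000),
    "SVM": SVC(),
    "RandomForest": RandomForestClassifier(random_state=0),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # Scaling inside the pipeline fits the scaler on each training fold only
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy").mean()

for name, acc in scores.items():
    print(f"{name}: mean CV accuracy {acc:.3f}")
```

Putting the scaler in the pipeline avoids leaking test-fold statistics into training, which matters when comparing models whose sensitivity to feature scale differs (SVM and MLP more than Random Forest).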