The Identification of Breast Cancer Subtypes by Raman Spectroscopy Integrated With Machine Learning Algorithms: Analyzing the Influence of Baseline

DOI: 10.1002/jrs.6799 Publication Date: 2025-03-24T22:24:42Z
ABSTRACT
How the baseline of Raman spectra affects data models has remained unexplored. In this research, we used three spectral datasets (raw, preprocessed, and baseline data) to build identification models for breast cancer molecular subtypes with four machine learning algorithms, and examined how the baseline data influenced the performance of these models. In the subtype identification models, for both normal and breast cancer cells, preprocessed data consistently gave the best model performance, followed by raw data and then baseline data. Although the baseline data gave the worst classification performance, when combined with an artificial neural network it consistently reached a recognition accuracy of approximately 92.50 ± 5.30% in binary classification and 90.60 ± 1.52% in five-class classification. These results suggested that baseline data contributed notably to model performance. In future work, the concept of food by-product processing could be borrowed to make fuller use of baseline data. Furthermore, when combined with feature visualization strategies, the UVE-SPA and ICO approaches, using only 30 and 258 variables, respectively, yielded model results comparable to those of the preprocessed data (858 variables), reaching an accuracy of 96.00 ± 1.87%. This underscored the pivotal role of the selected Raman spectral regions in distinguishing breast cancer molecular subtypes. Beyond the standard protein, lipid, and nucleic acid regions, the selected features included cysteine, phenylalanine, and carotenoid bands, all of which established research has linked to the development and progression of cancer.
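The abstract does not say how the baseline was separated from the raw spectra. A common choice, assumed here purely for illustration, is asymmetric least squares (AsLS) baseline estimation: a smooth curve is fitted under the spectrum, and the baseline-corrected spectrum is raw minus baseline, which yields the three datasets the study compares. This sketch assumes that "preprocessed" means baseline removal only (the authors' pipeline may include more steps), and the names `asls_baseline` and `split_spectra` and the values of `lam`, `p`, and `n_iter` are hypothetical, not the authors' settings.

```python
import numpy as np


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a smooth baseline under spectrum y by asymmetric least squares.

    Repeatedly solves (W + lam * D^T D) z = W y, where D is the second-order
    difference matrix and W weights points above the current baseline by p
    and points below it by 1 - p. Dense solve: fine for ~1000 points.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)  # second differences, shape (n - 2, n)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Raman peaks sit above the baseline, so they get the small weight p
        w = np.where(y > z, p, 1.0 - p)
    return z


def split_spectra(raw, **asls_kwargs):
    """Split a stack of spectra (n_spectra, n_points) into raw/preprocessed/baseline.

    Hypothetical: 'preprocessed' here is baseline removal only.
    """
    raw = np.atleast_2d(np.asarray(raw, dtype=float))
    baseline = np.vstack([asls_baseline(s, **asls_kwargs) for s in raw])
    preprocessed = raw - baseline
    return raw, preprocessed, baseline


# Synthetic example: two Raman-like peaks on a sloping background
x = np.arange(500, dtype=float)
background = 0.5 + 1e-3 * x
peaks = (2.0 * np.exp(-0.5 * ((x - 150) / 5) ** 2)
         + 2.0 * np.exp(-0.5 * ((x - 350) / 5) ** 2))
raw, preprocessed, baseline = split_spectra(
    np.vstack([background + peaks, background + 0.5 * peaks]))
print(raw.shape, preprocessed.shape, baseline.shape)
```

By construction raw = preprocessed + baseline, so the baseline dataset is exactly the part of the signal that preprocessing discards; a classifier trained on it can only succeed if that discarded part still carries class information.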
This project examined the impact of the Raman baseline on model outcomes, providing data to improve future Raman spectroscopy modeling techniques and opening discussion of the untapped potential of baseline data in future work.
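The abstract names UVE-SPA as one route to a compact set of 30 variables but gives no details. The sketch below shows only the successive projections algorithm (SPA) step in its textbook form; the UVE prefilter, the choice of starting variable, and the validation that decides how many variables to keep are all omitted. `spa_select` and the random stand-in data are hypothetical.

```python
import numpy as np


def spa_select(X, n_select, start=0):
    """Successive projections algorithm (SPA) for variable selection.

    X: (n_samples, n_vars) spectra; each column is one Raman shift.
    Starting from column `start` (must be nonzero), each step projects every
    column onto the orthogonal complement of the last selected column and
    picks the column with the largest remaining norm.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Remove the component along v from every column
        Xp = Xp - np.outer(v, (v @ Xp) / (v @ v))
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-pick a chosen variable
        best = int(np.argmax(norms))
        if norms[best] <= 1e-12:
            break  # rank exhausted: the rest are combinations of chosen columns
        selected.append(best)
    return selected


# Stand-in data: 40 random "spectra" with 858 variables (not real Raman data)
rng = np.random.default_rng(0)
spectra = rng.normal(size=(40, 858))
print(spa_select(spectra, 30))
```

Because each pick is made after projecting out the previous ones, SPA favors variables with minimal collinearity, which suits Raman spectra where neighboring shifts are strongly correlated. The number of variables it can meaningfully select is bounded by the rank of X, so by the number of spectra.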