Improving Machine Learning–Based Bacterial Discrimination by Learning Single‐Cell Raman Data From Multiple Growth Phases

DOI: 10.1002/jrs.6804
Publication Date: 2025-03-23T17:00:33Z
ABSTRACT
Bacterial discrimination using single‐cell Raman spectroscopy and machine/deep learning techniques has been widely explored for promising applications in medical, environmental, and food sciences. To construct a machine‐learning model that can discriminate bacteria accurately and robustly in real‐world samples, training data consisting of Raman spectra of bacterial cells acquired under various physiological conditions are essential. Despite much effort to study the effects of growth phase on bacterial discrimination, it is not yet fully elucidated which growth phase(s) need to be included in the training data to improve discrimination accuracy efficiently, or which growth phase‐dependent changes in cellular components underlie accurate discrimination. Here, we used random forest (RF), an ensemble machine learning method, to discriminate six model bacterial species, including both Gram‐positive and Gram‐negative bacteria, at five different growth phases ranging from lag to late stationary phase. We compared four RF classification models trained on Raman data from one growth phase (either midexponential or late stationary), two growth phases (midexponential and late stationary), and all five growth phases. The model trained on the two distinctly different growth phases achieved a species discrimination accuracy exceeding 80%, a marked increase of 24% and 32.5% relative to the two models trained on data from a single growth phase. This increase was greater than the gain obtained by going from two growth phases to all five (13%). We also showed that both Raman bands that are relatively invariant across growth phases (e.g., those of proteins) and bands specific to the growth phase (e.g., those of DNA/RNA and intracellular storage materials) are important for attaining accurate bacterial discrimination.
The present study provides a simple yet effective way to construct training data for good discrimination performance, which could be extended to discriminate bacterial cells under other physiological conditions such as nutrient, temperature, and pH.
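The training-data comparison described in the abstract can be sketched in a few lines. The example below is a minimal, self-contained illustration under stated assumptions, not the authors' code, data, or pipeline. It hand-rolls a small random forest in plain Python: bootstrap-resampled CART-style trees, a random feature subset at each split, and majority voting. It trains this forest on synthetic "spectra" with four hypothetical bands (`protein_a`, `protein_b`, `nucleic_acid`, `storage`), three hypothetical species, and three hypothetical growth phases. All profiles, phase shifts, noise levels, and hyperparameters (`n_trees`, `max_depth`) are invented for illustration. Models trained on one, two, and all phases are scored on held-out cells drawn from every phase. The normalized impurity-decrease importances then indicate which bands the forest relies on.

```python
import random
from collections import Counter

# HYPOTHETICAL bands: two growth-phase-invariant "protein" bands and two
# phase-dependent bands, loosely echoing the categories named in the abstract.
BANDS = ["protein_a", "protein_b", "nucleic_acid", "storage"]
# Synthetic species profiles and per-phase shifts (invented, not measured).
BASE = {"sp_A": [1.0, 0.7, 1.2, 0.2],
        "sp_B": [1.0, 1.0, 0.8, 0.6],
        "sp_C": [0.7, 0.85, 1.0, 0.4]}
SHIFT = {"mid_exp":    [0.0, 0.0, 0.4, 0.0],
         "early_stat": [0.0, 0.0, 0.1, 0.4],
         "late_stat":  [0.0, 0.0, -0.2, 0.8]}

def make_cells(phases, n_per_group, rng):
    """Noisy synthetic 'spectra' for every species x requested phase."""
    X, y = [], []
    for species, base in BASE.items():
        for phase in phases:
            for _ in range(n_per_group):
                X.append([b + s + rng.gauss(0, 0.1) for b, s in zip(base, SHIFT[phase])])
                y.append(species)
    return X, y

def gini(counts, n):
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X, y, features):
    """Best (feature, threshold, Gini decrease) among the candidate features."""
    n, total = len(y), Counter(y)
    parent, best = gini(total, n), None
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left = Counter()
        for k in range(n - 1):
            left[y[order[k]]] += 1
            a, b = X[order[k]][f], X[order[k + 1]][f]
            if a == b:
                continue
            nl, nr = k + 1, n - k - 1
            gain = parent - (nl * gini(left, nl) + nr * gini(total - left, nr)) / n
            if best is None or gain > best[2]:
                best = (f, (a + b) / 2, gain)
    return best

def grow(X, y, depth, max_depth, n_sub, rng, imp):
    """Recursively grow one tree; leaves are species labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None or split[2] <= 0:
        return majority
    f, t, gain = split
    imp[f] += gain * len(y)  # impurity decrease weighted by node size
    def child(idx):
        if not idx:
            return majority
        return grow([X[i] for i in idx], [y[i] for i in idx],
                    depth + 1, max_depth, n_sub, rng, imp)
    left = [i for i, x in enumerate(X) if x[f] <= t]
    right = [i for i, x in enumerate(X) if x[f] > t]
    return (f, t, child(left), child(right))

def fit_forest(X, y, n_trees=25, max_depth=6, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(n_features) per split
    imp, trees = [0.0] * len(X[0]), []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]  # bootstrap resample
        trees.append(grow([X[i] for i in idx], [y[i] for i in idx],
                          0, max_depth, n_sub, rng, imp))
    total = sum(imp) or 1.0
    return trees, [v / total for v in imp]  # importances sum to 1

def predict(trees, x):
    votes = Counter()
    for node in trees:
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = node
            node = left if x[f] <= t else right
        votes[node] += 1
    return votes.most_common(1)[0][0]  # majority vote across trees

def accuracy(trees, X, y):
    return sum(predict(trees, x) == c for x, c in zip(X, y)) / len(y)

rng = random.Random(42)
ALL = list(SHIFT)
X_test, y_test = make_cells(ALL, 10, rng)  # held-out cells from every phase
results = {}
for name, phases in [("one", ["mid_exp"]), ("two", ["mid_exp", "late_stat"]), ("all", ALL)]:
    X_tr, y_tr = make_cells(phases, 20, rng)
    trees, importances = fit_forest(X_tr, y_tr)
    results[name] = accuracy(trees, X_test, y_test)
    print(name, f"accuracy={results[name]:.2f}",
          {b: round(v, 2) for b, v in zip(BANDS, importances)})
```

Because the data are synthetic, the printed accuracies and importances say nothing about the study's actual results. The point is the experimental design: the training set varies in growth-phase composition while the test set always spans all phases. In practice, a library implementation such as scikit-learn's `RandomForestClassifier` (with its `feature_importances_` attribute) would replace the hand-written forest.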