CSI-GEP: A GPU-based unsupervised machine learning approach for recovering gene expression programs in atlas-scale single-cell RNA-seq data

RNA-Seq
DOI: 10.1016/j.xgen.2024.100739 Publication Date: 2025-01-08T15:39:27Z
ABSTRACT
Exploratory analysis of single-cell RNA sequencing (scRNA-seq) typically relies on hard clustering over two-dimensional projections like uniform manifold approximation and projection (UMAP). However, such methods can severely distort the data and have many arbitrary parameter choices. Methods that model scRNA-seq data as non-discrete "gene expression programs" (GEPs) better preserve the data's structure, but currently, they are often not scalable, not consistent across repeated runs, and lack an established method for choosing key parameters. Here, we developed a GPU-based unsupervised learning approach, "consensus and scalable inference of gene expression programs" (CSI-GEP). We show that CSI-GEP can recover ground truth GEPs in real and simulated atlas-scale scRNA-seq datasets, significantly outperforming cutting-edge methods, including GPT-based neural networks. We applied CSI-GEP to a whole mouse brain atlas of 2.2 million cells, disentangling endothelial cell types missed by other methods, and to an integrated scRNA-seq atlas of human tumors and cell lines, discovering mesenchymal-like GEPs unique to cancer cells growing in culture.
REFERENCES (83)