Gene set proximity analysis: expanding gene set enrichment analysis through learned geometric embeddings, with drug-repurposing applications in COVID-19

Article type: Original Paper
Keywords: SARS-CoV-2; Drug Repositioning; Humans; COVID-19; Reproducibility of Results; Retrospective Studies
Subject categories: 0301 basic medicine; 03 medical and health sciences; 3. Good health
DOI: 10.1093/bioinformatics/btac735 Publication Date: 2022-11-17T12:45:04Z
ABSTRACT
Motivation: Gene set analysis methods rely on knowledge-based representations of genetic interactions in the form of both gene set collections and protein–protein interaction (PPI) networks. However, explicit representations of genetic interactions often fail to capture complex interdependencies among genes, limiting the analytic power of such methods.

Results: We propose an extension of gene set enrichment analysis to a latent embedding space reflecting PPI network topology, called gene set proximity analysis (GSPA). Compared with existing methods, GSPA provides improved ability to identify disease-associated pathways in disease-matched gene expression datasets, while improving reproducibility of enrichment statistics for similar gene sets. GSPA is statistically straightforward, reducing to a version of traditional gene set enrichment analysis through a single user-defined parameter. We apply our method to identify novel drug associations with SARS-CoV-2 viral entry. Finally, we validate our drug association predictions through retrospective clinical analysis of claims data from 8 million patients, supporting a role for gabapentin as a risk factor and metformin as a protective factor for severe COVID-19.

Availability and implementation: GSPA is available for download as a command-line Python package at https://github.com/henrycousins/gspa.

Supplementary information: Supplementary data are available at Bioinformatics online.
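
The abstract does not spell out the GSPA statistic, so the following is only an illustrative sketch of the general idea it describes: a GSEA-style running-sum score computed on a gene set that has been expanded in an embedding space, with a single parameter that recovers the classical score. Everything here is an assumption made for illustration: the function names (`enrichment_score`, `expand_gene_set`, `proximity_score`), the Euclidean `radius` parameter, and the toy embeddings are hypothetical and are not taken from the GSPA package or paper.

```python
import numpy as np


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style weighted running-sum statistic (signed maximum deviation)."""
    in_set = np.array([g in gene_set for g in ranked_genes])
    weights = np.abs(np.asarray(scores, dtype=float)) ** p
    n_miss = len(ranked_genes) - int(in_set.sum())
    hit_step = np.where(in_set, weights, 0.0)
    hit_total = hit_step.sum()
    if hit_total == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    hit_step = hit_step / hit_total
    miss_step = np.where(in_set, 0.0, 1.0 / n_miss)
    running = np.cumsum(hit_step - miss_step)
    return float(running[np.argmax(np.abs(running))])


def expand_gene_set(gene_set, embeddings, radius):
    """Hypothetical expansion: add genes embedded within `radius` of a member."""
    expanded = set(gene_set)
    members = [g for g in gene_set if g in embeddings]
    if radius <= 0 or not members:
        return expanded  # radius 0 leaves the original set unchanged
    member_vecs = np.stack([embeddings[g] for g in members])
    for gene, vec in embeddings.items():
        if np.linalg.norm(member_vecs - vec, axis=1).min() <= radius:
            expanded.add(gene)
    return expanded


def proximity_score(ranked_genes, scores, gene_set, embeddings, radius):
    """Running-sum score on the expanded set (assumed form, not the paper's)."""
    expanded = expand_gene_set(gene_set, embeddings, radius)
    return enrichment_score(ranked_genes, scores, expanded)


# Toy data (made up): genes ranked by a differential-expression score.
ranked = ["A", "B", "C", "D"]
scores = [2.0, 1.5, -0.5, -1.0]
embeddings = {
    "A": np.array([0.0, 0.0]),
    "B": np.array([0.1, 0.0]),
    "C": np.array([5.0, 5.0]),
    "D": np.array([6.0, 6.0]),
}
gene_set = {"A"}

classical = enrichment_score(ranked, scores, gene_set)
at_zero = proximity_score(ranked, scores, gene_set, embeddings, radius=0.0)
neighbors = expand_gene_set(gene_set, embeddings, radius=0.5)
```

In this sketch, setting `radius=0.0` makes `proximity_score` identical to `enrichment_score`, which mirrors the abstract's statement that GSPA reduces to a version of traditional gene set enrichment analysis through one user-defined parameter. The actual package may define proximity and the reduction differently.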