Topic modeling for untargeted substructure exploration in metabolomics

Keywords: Substructure; Identification; Fragmentation
DOI: 10.1073/pnas.1608041113 Publication Date: 2016-11-18T01:21:05Z
ABSTRACT
Significance: Tandem MS is a technique for compound identification in untargeted metabolomics experiments. Because of a lack of reference spectra, most molecules cannot be identified, and many spectra cannot be used. We present MS2LDA, an unsupervised method (inspired by text-mining) that extracts common patterns of mass fragments and neutral losses, called Mass2Motifs, from collections of fragmentation spectra. Structurally characterized Mass2Motifs can be used to annotate molecules for which no reference spectra exist and to expose biochemical relationships between molecules. For four beer extracts, without training data, we show that, with 30 structurally characterized Mass2Motifs, we can annotate approximately three times as many molecules as with library matching. These annotations were validated with reference spectra from Global Natural Products Social Molecular Networking (GNPS) and MassBank.
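The text-mining analogy in the abstract can be made concrete with a small sketch: each fragmentation spectrum is treated as a "document" whose "words" are binned fragment masses and neutral losses, and the resulting word counts are what a topic model such as LDA would take as input. This is not the MS2LDA implementation. The function names, the 0.01 bin width, the minimum loss of 10, the use of rounded intensities as word counts, and the peak values are all assumptions chosen for illustration.

```python
from collections import Counter


def spectrum_to_document(precursor_mz, peaks, bin_width=0.01, min_loss=10.0):
    """Turn one MS2 spectrum into a bag of 'words' (hypothetical helper).

    peaks is a list of (fragment_mz, intensity) pairs. Each peak yields a
    fragment word and, if the loss is large enough, a neutral-loss word.
    """
    doc = Counter()
    for mz, intensity in peaks:
        count = int(round(intensity))  # assumption: intensity acts as word count
        if count <= 0:
            continue
        frag_bin = round(mz / bin_width) * bin_width
        doc[f"fragment_{frag_bin:.2f}"] += count
        loss = precursor_mz - mz  # neutral loss relative to the precursor
        if loss >= min_loss:
            loss_bin = round(loss / bin_width) * bin_width
            doc[f"loss_{loss_bin:.2f}"] += count
    return doc


def build_count_matrix(documents):
    """Build a shared vocabulary and a spectra-by-words count matrix."""
    vocab = sorted({word for doc in documents for word in doc})
    matrix = [[doc.get(word, 0) for word in vocab] for doc in documents]
    return vocab, matrix


# Toy spectra with made-up peak lists, for illustration only.
spectra = [
    (268.10, [(136.06, 100), (119.04, 20)]),
    (284.10, [(152.06, 100), (135.03, 15)]),
]
documents = [spectrum_to_document(mz, peaks) for mz, peaks in spectra]
vocab, matrix = build_count_matrix(documents)

for word, counts in zip(vocab, zip(*matrix)):
    print(word, counts)
```

In this toy input both spectra share the word `loss_132.04` even though their fragments differ, which is the kind of co-occurring pattern a topic model could group into a motif. A count matrix of this shape could be passed to a general-purpose LDA implementation (for example scikit-learn's `LatentDirichletAllocation`), although the settings needed to reproduce the results reported here are not given in this section.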