- Metabolomics and Mass Spectrometry Studies
- Analytical Chemistry and Chromatography
- Computational Drug Discovery Methods
- Mass Spectrometry Techniques and Applications
- Machine Learning in Materials Science
- Genomics and Phylogenetic Studies
- Genetic diversity and population structure
- Innovative Microfluidic and Catalytic Techniques Innovation
- Protein Structure and Dynamics
- Traditional Chinese Medicine Studies
- Identification and Quantification in Food
- Machine Learning in Bioinformatics
- Analytical Methods in Pharmaceuticals
- Advanced Chemical Sensor Technologies
Friedrich Schiller University Jena
2021-2025
Taxonomic classification, that is, the assignment to biological clades with shared ancestry, is a common task in genetics, mainly based on genome similarity search of large databases. The classification quality depends heavily on the database, since representative relatives must be present. Many genomic sequences cannot be classified at all or only with a high misclassification rate. Here we present BERTax, a deep neural network program based on natural language processing to precisely classify the superkingdom and phylum of DNA...
Small molecule machine learning aims to predict chemical, biochemical, or biological properties from molecular structures, with applications such as toxicity prediction, ligand binding, and pharmacokinetics. A recent trend is developing end-to-end models that avoid explicit domain knowledge. These models assume no coverage bias in the training and evaluation data, meaning the data are representative of the true distribution. However, the domain of applicability is rarely considered in such models. Here, we investigate how...
Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after...
Small molecule machine learning tries to predict chemical, biochemical or biological properties from the structure of a molecule. Applications include the prediction of toxicity, ligand binding or retention time. A recent trend is to develop end-to-end models that avoid the explicit integration of domain knowledge via inductive bias. A central assumption in doing so is that there is no coverage bias in the training and evaluation data, meaning that these data are a representative subset of the true distribution we want to learn. Usually,...
Taxonomic classification, i.e., the identification and assignment to groups of biological organisms with the same origin and characteristics, is a common task in genetics. Nowadays, taxonomic classification is mainly based on genome similarity search in large databases. In this process, the classification quality depends heavily on the database, since representative relatives have to be known already. Many genomic sequences cannot be classified at all or only with a high misclassification rate. Here we present BERTax, a program that...
The structural identification of metabolites represents one of the current bottlenecks in non-targeted liquid chromatography-mass spectrometry (LC-MS) based metabolomics. The Metabolomics Standard Initiative has developed a multilevel system to report the confidence of metabolite identification, which involves the use of MS, MS/MS and orthogonal data. Limitations due to similar or same fragmentation pattern (e.g. isomeric compounds) can be overcome by the additional information of the retention time (RT), since it is a property...
Thousands of publications on the prediction of small molecule retention times were published during the last decades. The ultimate goal is, without doubt, transferable prediction of retention times: We want to train a model on a certain set of compounds from one dataset and then use it to predict retention times for different compounds in another dataset. Unfortunately, retention times may change massively, even for nominally identical chromatographic conditions. Retention order is much better retained, yet may change if chromatographic conditions vary. Here, we systematically study what result in notable...
Liquid chromatography is frequently employed for the separation of metabolites and other small molecules. Prediction of retention times via machine learning methods can assist compound annotation. Yet, transferable predictions are intrinsically complicated for novel compounds and chromatographic conditions because retention times depend both on the compound structure and on the chromatographic system. We present RepoRT, the first repository for retention time data. RepoRT presently contains 373 datasets, 8809 unique compounds, and 88,325 retention time entries measured on 49 different chromatographic columns...
Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet,...