- Computational Drug Discovery Methods
- Machine Learning in Materials Science
- Advanced Combustion Engine Technologies
- Combustion and flame dynamics
- Chemical Synthesis and Analysis
- Machine Learning in Bioinformatics
- Vehicle emissions and performance
- Analytical Chemistry and Chromatography
- Protein Structure and Dynamics
- Bioinformatics and Genomic Networks
- Data Visualization and Analytics
- Biomedical Text Mining and Ontologies
- Advanced Text Analysis Techniques
- Microbial Metabolic Engineering and Bioproduction
- Metabolomics and Mass Spectrometry Studies
- Microbial Natural Products and Biosynthesis
- Various Chemistry Research Topics
- Turbomachinery Performance and Optimization
- Biodiesel Production and Applications
- Refrigeration and Air Conditioning Technologies
- Scientific Computing and Data Management
- RNA and protein synthesis mechanisms
- Genomics and Phylogenetic Studies
- Topological and Geometric Data Analysis
- Amino Acid Enzymes and Metabolism
Wageningen University & Research
2025
IBM Research - Zurich
2019-2024
École Polytechnique Fédérale de Lausanne
2022-2024
Convergent Science (United States)
2014-2024
Laboratoire d'Informatique Fondamentale de Lille
2023
Signal Processing (United States)
2023
University of Vienna
2023
University of Bern
1970-2022
Convergence
2018-2019
Argonne National Laboratory
2018
Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules.
The chemical sciences are producing an unprecedented amount of large, high-dimensional data sets containing chemical structures and associated properties. However, there are currently no algorithms to visualize such data while preserving both global and local features with a sufficient level of detail to allow for human inspection and interpretation. Here, we propose a solution to this problem with a new data visualization method, TMAP, capable of representing data sets of up to millions of data points and arbitrary high dimensionality as a two-dimensional tree (...
Predicting the nature and outcome of reactions using computational methods is a crucial tool to accelerate chemical research. The recent application of deep learning-based learned fingerprints to reaction classification and reaction yield prediction has shown an impressive increase in performance compared to previous methods such as DFT- and structure-based fingerprints. However, learned fingerprints require large training data sets, are inherently biased, and are based on complex deep learning architectures. Here we present the differential reaction fingerprint DRFP....
Enzyme catalysts are an integral part of green chemistry strategies towards a more sustainable and resource-efficient chemical synthesis. However, the use of biocatalysed reactions in retrosynthetic planning clashes with the difficulties in predicting the enzymatic activity on unreported substrates and enzyme-specific stereo- and regioselectivity. As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, we extend the data-driven forward reaction as...
Among the various molecular fingerprints available to describe small organic molecules, the extended connectivity fingerprint, up to four bonds (ECFP4), performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥ 1024D) to perform well, resulting in ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC performing very slowly due to the curse of dimensionality. Herein we report a new fingerprint, called MinHash...
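The abstract above names MinHash but is cut off before describing the fingerprint itself. As a rough illustration of the general MinHash idea only (not the authors' fingerprint), the sketch below estimates the Jaccard similarity of two sets of substructure strings from fixed-length signatures. The fragment strings, the function names, and the choice of `blake2b` as hash function are assumptions made for this example.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Hash a token with a seed to a 64-bit integer (hypothetical helper)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=512):
    """Return one minimum hash value per seeded hash function."""
    unique = set(tokens)
    if not unique:
        raise ValueError("cannot build a signature from an empty set")
    return [min(_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions approximates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, used here only to check the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Placeholder "substructure" tokens; real input would be fragment strings.
set_a = {f"frag{i}" for i in range(100)}
set_b = {f"frag{i}" for i in range(50, 150)}

sig_a = minhash_signature(set_a)
sig_b = minhash_signature(set_b)
estimate = estimate_jaccard(sig_a, sig_b)
exact = exact_jaccard(set_a, set_b)
```

The signature length stays fixed regardless of how many tokens a molecule produces, which is what makes such sketches usable with approximate nearest neighbor indexes.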
Here we present SmilesDrawer, a dependency-free JavaScript component capable of both parsing and drawing SMILES-encoded molecular structures client-side, developed to be easily integrated into web projects and to display organic molecules in large numbers and fast succession. SmilesDrawer can draw structurally and stereochemically complex structures such as maitotoxin and C60 without using templates, yet has an exceptionally small computational footprint and low memory usage without the requirement for loading images or any other...
During the past decade, big data have become a major tool in scientific endeavors. Although statistical methods and algorithms are well-suited for analyzing and summarizing enormous amounts of data, the results do not allow for a visual inspection of the entire data. Current scientific software, including R packages and Python libraries such as ggplot2, matplotlib and plot.ly, do not support interactive visualizations of datasets exceeding 100 000 data points on the web. Other solutions enable the web-based visualization of big data only through data reduction or...
Organic reactions are usually assigned to classes grouping reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, the classification process is a tedious task, requiring the identification of the corresponding reaction class template via annotation of the number of molecules in the reactions, the reaction center, and the distinction between reactants and reagents. In this work, we show that transformer-based models can infer reaction classes from non-annotated, simple...
A computational fluid dynamics (CFD) guided combustion system optimization was conducted for a heavy-duty diesel engine running with a gasoline fuel that has a research octane number (RON) of 80. The goal was to optimize the gasoline compression ignition (GCI) combustion recipe (piston bowl geometry, injector spray pattern, in-cylinder swirl motion, and thermal boundary conditions) for improved fuel efficiency while maintaining engine-out...
In recent years, the use of machine learning-based surrogate models for computational fluid dynamics (CFD) simulations has emerged as a promising technique for reducing the computational cost associated with engine design optimization. However, such methods still suffer from drawbacks. One main disadvantage is that the default machine learning (ML) hyperparameters are often severely suboptimal for a given problem. This has often been addressed by manually trying out different hyperparameter settings, but this solution is ineffective in case...
Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and have since gained prominence in modeling proteins and chemical processes, elucidating structural relationships from textual/sequential data. Surprisingly, some of these relationships refer to three-dimensional structural features, raising important questions on the dimensionality of the information encoded within sequential data. Here, we demonstrate that...
Members of the classical transient receptor potential protein (TRPC) family are considered key components of phospholipase C (PLC)-dependent Ca2+ signaling. Previous results obtained in the HEK 293 expression system suggested a physical and functional coupling of TRPC3 to the cardiac-type Na+/Ca2+ exchanger, NCX1 (sodium calcium exchanger 1). This study was designed to test for expression of TRPC3 (transient receptor potential channel 3) and for the existence of a native TRPC3/NCX1 signaling complex in rat cardiac myocytes. Protein expression and cellular distribution were...
Herein we report the discovery of antimicrobial bridged bicyclic peptides (AMBPs) active against Pseudomonas aeruginosa, a highly problematic Gram negative bacterium in the hospital environment. Two of these AMBPs show strong biofilm inhibition and dispersal activity and enhance the activity of polymyxin, currently a last resort antibiotic against which resistance is emerging. To discover our AMBPs we used the concept of chemical space, which is well known in the area of small molecule drug discovery, to define a small number of test compounds for synthesis and experimental...
The recent general availability of low-cost virtual reality headsets and accompanying three-dimensional (3D) engine support presents an opportunity to bring the concept of chemical space into virtual environments. While virtual reality applications represent a category of widespread tools in other fields, their use in the visualization and exploration of abstract data such as chemical spaces has been experimental. In our previous work, we established interactive two-dimensional (2D) maps followed by web-based interactive 3D visualizations, culminating...
Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we...
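The "at least one of four Lipinski constraints" criterion in the abstract above can be sketched as a simple filter. This is a minimal illustration that assumes the commonly cited rule-of-five thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and precomputed descriptors. The class, the function names, and the two example descriptor sets are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (values supplied by the caller)."""
    molecular_weight: float  # in Da
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> list:
    """Return the names of the rule-of-five constraints that are broken."""
    checks = {
        "molecular_weight": d.molecular_weight > 500,
        "logp": d.logp > 5,
        "h_bond_donors": d.h_bond_donors > 5,
        "h_bond_acceptors": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def is_non_lipinski(d: Descriptors) -> bool:
    """True if at least one of the four constraints is broken."""
    return len(lipinski_violations(d)) >= 1


# Hypothetical descriptor values, chosen only to exercise both branches.
small_molecule = Descriptors(320.4, 2.1, 2, 5)
peptide_like = Descriptors(1450.7, -3.2, 18, 22)

small_result = lipinski_violations(small_molecule)
peptide_result = lipinski_violations(peptide_like)
```

Applied to a database export, a filter of this kind would select the subset of entries the abstract calls non-Lipinski; how the descriptors are computed is left to the caller.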
Background: Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules. Results: Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. Our...
We demonstrate and discuss the feasibility of autonomous first-principles mechanistic explorations for providing quantum chemical data to enhance the confidence of data-driven retrosynthetic and synthesis design based on molecular transformers.
Enzymatic reactions are an ecofriendly, selective, and versatile addition, sometimes even an alternative, to organic reactions for the synthesis of chemical compounds such as pharmaceuticals or fine chemicals. To identify suitable reactions, computational models to predict the activity of enzymes on non-native substrates, to perform retrosynthetic pathway searches, or to predict the outcomes of reactions including regio- and stereoselectivity are becoming increasingly important. However, current approaches are substantially hindered by the limited amount of available...