- Computational Drug Discovery Methods
- Machine Learning in Materials Science
- Protein Structure and Dynamics
- Metabolomics and Mass Spectrometry Studies
- Chemistry and Chemical Engineering
- Chemical Synthesis and Analysis
- Genetics, Bioinformatics, and Biomedical Research
- Pharmacogenetics and Drug Metabolism
- Microbial Natural Products and Biosynthesis
- Analytical Chemistry and Chromatography
- Bioinformatics and Genomic Networks
- Cell Image Analysis Techniques
- Various Chemistry Research Topics
- Process Optimization and Integration
- Synthesis and biological activity
- Receptor Mechanisms and Signaling
- Quality Function Deployment in Product Design
- History and advancements in chemistry
- 14-3-3 protein interactions
- Microbial Metabolic Engineering and Bioproduction
- Plant biochemistry and biosynthesis
- Biofuel production and bioconversion
- Ubiquitin and proteasome pathways
- HIV/AIDS drug development and treatment
- Evaluation Methods in Various Fields
Guangzhou Experimental Station
2022-2025
Guangzhou Regenerative Medicine and Health Guangdong Laboratory
2019-2024
Guangzhou Medical University
2024
Peking University
2024
Huazhong University of Science and Technology
2023-2024
Imperial College London
2022-2023
State Key Laboratory of Respiratory Disease
2020-2021
Guangzhou Institutes of Biomedicine and Health
2020-2021
Chinese Academy of Sciences
1998-2021
Guangdong Laboratory Animals Monitoring Institute
2020-2021
This work introduces a method to tune a sequence-based generative model for molecular de novo design that through augmented episodic likelihood can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could...
A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of the autoencoder, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecule structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be...
In the past few years, we have witnessed a renaissance of the field of molecular de novo drug design. The advancements in deep learning and artificial intelligence (AI) have triggered an avalanche of ideas on how to translate such techniques to a variety of domains, including the field of drug design. A range of architectures have been devised to find the optimal way of generating chemical compounds by using either graph- or string (SMILES)-based representations. With this application note, we aim to offer the community a production-ready tool for de novo design, called...
Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model...
Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial...
Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high-quality data set is a vital step in realizing this goal, and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem...
Drug repurposing has become an important branch of drug discovery. Several computational approaches that help to uncover new repurposing opportunities and aid the discovery process have been put forward, or adapted from previous applications. A number of successful examples are now available. Overall, future developments will greatly benefit from the integration of different methods, approaches, and disciplines. Steps forward in this direction are expected to clarify, and therefore rationally predict, drug-target, target-disease, and ultimately...
Recent applications of recurrent neural networks (RNN) enable training models that sample the chemical space. In this study we train an RNN with molecular string representations (SMILES) with a subset of the enumerated database GDB-13 (975 million molecules). We show that a model trained with 1 million structures (0.1% of the database) reproduces 68.9% of the entire database after training, when sampling 2 billion molecules. We also developed a method to assess the quality of the training process using negative log-likelihood plots. Furthermore, we use a mathematical model based on...
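As a rough point of comparison for the 68.9% figure, the sketch below computes the coverage an idealized sampler would reach if it drew uniformly, with replacement, from the 975 million molecules of GDB-13. The uniform-sampler baseline and the function name `expected_uniform_coverage` are assumptions made here for illustration; the abstract is truncated before it names the mathematical model the authors actually use.

```python
import math


def expected_uniform_coverage(database_size: int, num_samples: int) -> float:
    """Expected fraction of distinct items seen after uniform sampling
    with replacement (an idealized baseline, assumed for illustration).

    A given item is missed by one draw with probability (1 - 1/N), so it
    is missed by all k draws with probability (1 - 1/N)**k.
    """
    # log1p / expm1 keep precision when 1/N is tiny.
    log_miss = num_samples * math.log1p(-1.0 / database_size)
    return -math.expm1(log_miss)


# Numbers taken from the abstract: 975 million molecules, 2 billion samples.
coverage = expected_uniform_coverage(975_000_000, 2_000_000_000)
print(f"{coverage:.1%}")  # about 87%
```

Under this assumption even a perfectly uniform sampler would leave roughly 13% of the database unseen after 2 billion draws, which gives some context for judging the reported 68.9%.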
Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all combinations...
Four of the most well-known, commercially available docking programs, FlexX, GOLD, GLIDE, and ICM, have been examined for their ligand-docking and virtual-screening capabilities. The relative performance of the programs in reproducing the native ligand conformation from starting SMILES strings for 164 high-resolution protein-ligand complexes is presented and compared. Applying only scoring functions, the latest versions of these four programs were also used to conduct virtual screening for 12 protein targets of therapeutic interest,...
The absorption, distribution, metabolism, excretion, and toxicity (ADMET) of a compound is dependent on physicochemical properties such as molecular size, lipophilicity, and ionization state. However, much less is known regarding the relationship between ADMET and molecular topology. In this study two descriptors related to molecular topology have been investigated, the fraction of the molecular framework (fMF) and the fraction of sp3-hybridized carbon atoms (Fsp3). fMF and Fsp3, together with standard physicochemical properties (molecular size, ionization state, and lipophilicity), were analyzed for a set of ADMET assays....
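The two topology descriptors can be illustrated with a minimal sketch that works from atom counts alone. It assumes that Fsp3 is the number of sp3-hybridized carbons divided by the total number of carbons, and that fMF is the number of heavy atoms in the molecular framework (ring systems plus linkers) divided by the total number of heavy atoms. Both function names are hypothetical, and obtaining the counts from a structure would need a cheminformatics toolkit, which is not shown.

```python
def fraction_sp3(num_sp3_carbons: int, num_carbons: int) -> float:
    """Fsp3: share of carbon atoms that are sp3-hybridized (assumed definition)."""
    if num_carbons == 0:
        return 0.0  # no carbons: treat the descriptor as zero
    return num_sp3_carbons / num_carbons


def fraction_framework(num_framework_heavy_atoms: int, num_heavy_atoms: int) -> float:
    """fMF: share of heavy atoms that belong to the molecular framework
    (assumed definition: framework = ring systems plus linkers)."""
    if num_heavy_atoms == 0:
        return 0.0
    return num_framework_heavy_atoms / num_heavy_atoms


# Toluene: 7 carbons, of which only the methyl carbon is sp3;
# its framework is the benzene ring (6 of 7 heavy atoms).
toluene_fsp3 = fraction_sp3(num_sp3_carbons=1, num_carbons=7)
toluene_fmf = fraction_framework(num_framework_heavy_atoms=6, num_heavy_atoms=7)
print(round(toluene_fsp3, 3), round(toluene_fmf, 3))
```

Both values lie between 0 and 1, so they can be compared across compounds of different size alongside the standard physicochemical properties.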
Deep learning methods applied to chemistry can be used to accelerate the discovery of new molecules. This work introduces GraphINVENT, a platform developed for graph-based molecular design using graph neural networks (GNNs). GraphINVENT uses a tiered deep neural network architecture to probabilistically generate new molecules a single bond at a time. All models implemented in GraphINVENT can quickly learn to build molecules resembling the training set molecules without any explicit programming of chemical rules. The models have been benchmarked using the MOSES...
The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast-growing area of research, with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for "Big Data" in chemistry and a discussion of the importance of data quality. We then discuss the visualization of millions of compounds by combining chemical and biological data, the expectations from mining the "Big Data" using...
The human bile salt export pump (BSEP) is a membrane protein expressed on the canalicular plasma membrane domain of hepatocytes, which mediates active transport of unconjugated and conjugated bile salts from liver cells into bile. BSEP activity therefore plays an important role in bile flow. In humans, genetically inherited defects in BSEP expression or activity cause cholestatic liver injury, and many drugs that cause cholestatic drug-induced liver injury (DILI) in humans have been shown to inhibit BSEP activity in vitro and in vivo. These findings suggest that inhibition of BSEP activity by drugs could be one...
A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of the autoencoder, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecule structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and thus can be used for analogue...
With this application note we aim to offer the community a production-ready tool for de novo design. It can be effectively applied on drug discovery projects that are striving to resolve either exploration or exploitation problems while navigating the chemical space. By releasing the code we are aiming to facilitate research on using generative methods and to promote collaborative efforts in this area, so that it can be used as an interaction point for future scientific collaborations.
In recent years, deep molecular generative models have emerged as promising methods for de novo molecular design. Thanks to the rapid advance of deep learning techniques, architectures such as recurrent neural networks, variational autoencoders, and generative adversarial networks have been successfully employed for constructing generative models. Recently, quite a few metrics have been proposed to evaluate these models. However, many of these metrics cannot evaluate the chemical space coverage of sampled molecules. This work presents a novel and complementary metric for evaluating deep molecular generative models. The metric is based on...
Conformal prediction has been proposed as a more rigorous way to define prediction confidence compared to other application domain concepts that have earlier been used for QSAR modeling. One main advantage of such a method is that it provides a prediction region potentially with multiple predicted labels, which contrasts with the single-valued (regression) or single-label (classification) output predictions by standard QSAR modeling algorithms. Standard conformal prediction might not be suitable for imbalanced data sets. Therefore, Mondrian cross-conformal...
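To make the idea of a prediction region with multiple labels concrete, here is a minimal sketch of class-conditional (Mondrian) conformal classification. It is a simplification, not the procedure of the paper: it uses a single calibration split rather than cross-conformal folds, takes precomputed class probabilities from an unspecified model, and uses `1 - probability` as the nonconformity score. All names and the toy numbers are hypothetical.

```python
from collections import defaultdict


def calibrate(cal_probs, cal_labels):
    """Collect nonconformity scores separately for each true class.

    Keeping one score list per class is the Mondrian condition: each
    class is calibrated only against its own examples, which matters
    when the data set is imbalanced.
    """
    scores = defaultdict(list)
    for probs, label in zip(cal_probs, cal_labels):
        scores[label].append(1.0 - probs[label])
    return scores


def p_values(scores_by_class, test_probs):
    """p-value of each candidate label for one test compound."""
    pvals = {}
    for label, cal_scores in scores_by_class.items():
        alpha = 1.0 - test_probs[label]
        at_least_as_strange = sum(1 for s in cal_scores if s >= alpha)
        pvals[label] = (at_least_as_strange + 1) / (len(cal_scores) + 1)
    return pvals


def prediction_set(pvals, significance):
    """Labels that cannot be rejected at the given significance level."""
    return {label for label, p in pvals.items() if p > significance}


# Toy, imbalanced calibration set: four inactives, two actives.
cal_probs = [
    {"active": 0.1, "inactive": 0.9},
    {"active": 0.2, "inactive": 0.8},
    {"active": 0.3, "inactive": 0.7},
    {"active": 0.4, "inactive": 0.6},
    {"active": 0.6, "inactive": 0.4},
    {"active": 0.3, "inactive": 0.7},
]
cal_labels = ["inactive", "inactive", "inactive", "inactive", "active", "active"]

scores = calibrate(cal_probs, cal_labels)
pvals = p_values(scores, {"active": 0.45, "inactive": 0.55})
print(prediction_set(pvals, significance=0.25))
print(prediction_set(pvals, significance=0.10))
```

With these toy numbers the stricter significance level of 0.25 keeps only one label, while 0.10 keeps both, which is the kind of multi-label region the abstract contrasts with single-label classifier output.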