- Computational Drug Discovery Methods
- Machine Learning in Materials Science
- Protein Structure and Dynamics
- Chemical Synthesis and Analysis
- Scientific Computing and Data Management
- Innovative Microfluidic and Catalytic Techniques Innovation
- Cell Image Analysis Techniques
- HIV Research and Treatment
- Chemistry and Chemical Engineering
- RNA and protein synthesis mechanisms
- Various Chemistry Research Topics
- Malaria Research and Control
- Metabolomics and Mass Spectrometry Studies
- Mosquito-borne diseases and control
- Click Chemistry and Applications
- Crystallography and molecular interactions
- Image Processing and 3D Reconstruction
- Signaling Pathways in Disease
- Scientific Research and Discoveries
- Advanced Proteomics Techniques and Applications
- Machine Learning and Data Classification
- Surface Chemistry and Catalysis
- AI in cancer detection
- Environmental Impact and Sustainability
- Advanced Computational Techniques and Applications
Google (United States)
2017-2024
Relay Therapeutics (United States)
2021-2024
Stanford University
2013-2021
We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of thirteen electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves which report out-of-sample errors as a function of training set size with up to $\sim$117k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory used for training and testing come from the QM9 database...
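The learning-curve protocol described in this abstract (out-of-sample error as a function of training set size) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic one-dimensional data, the least-squares line used as a stand-in regressor, and the helper names `fit_line`, `mean_abs_error`, and `learning_curve`. None of it comes from the paper, which uses molecular representations and the QM9 database.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a stand-in regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x > 0 else 0.0
    return a, mean_y - a * mean_x

def mean_abs_error(model, xs, ys):
    """Mean absolute error of the fitted line on the given points."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)

def learning_curve(train, test, sizes):
    """Return (training set size, out-of-sample MAE) pairs."""
    test_x, test_y = zip(*test)
    curve = []
    for n in sizes:
        xs, ys = zip(*train[:n])          # train on the first n points only
        model = fit_line(xs, ys)
        curve.append((n, mean_abs_error(model, test_x, test_y)))
    return curve

# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(1200):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))

train, test = data[:1000], data[1000:]    # the test set is never used for fitting
sizes = [10, 100, 1000]
curve = learning_curve(train, test, sizes)
for n, err in curve:
    print(f"N={n:5d}  out-of-sample MAE={err:.3f}")
```

Plotting these pairs on log-log axes gives the usual learning-curve view, in which different regressor and representation choices can be compared at equal training set size.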
We present a framework, which we call Molecule Deep $Q$-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry and state-of-the-art reinforcement learning techniques (double $Q$-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100\% chemical validity. Further, we operate without pre-training on any dataset to avoid possible bias from the choice of that set. Inspired by problems faced during medicinal chemistry lead optimization,...
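The double $Q$-learning technique named in this abstract can be sketched in a few lines: the online value estimates select the next action, and a separate target estimate evaluates it. This is a generic illustration of the standard double $Q$-learning target, not MolDQN's implementation; the function name `double_q_target` and the example numbers are assumptions.

```python
def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double Q-learning target for one transition.

    q_online_next and q_target_next are lists of action values for the
    next state, from the online and target estimators respectively.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Select the action with the online estimator...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but evaluate it with the target estimator.
    return reward + gamma * q_target_next[best_action]

# Hypothetical values: the online estimator prefers action 1.
target = double_q_target(
    reward=1.0,
    gamma=0.9,
    q_online_next=[0.2, 0.8, 0.5],
    q_target_next=[1.0, 0.3, 2.0],
    done=False,
)
print(target)
```

Separating selection from evaluation is what reduces the overestimation that arises when a single estimator both picks and scores the maximizing action.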
We introduce tensor field neural networks, which are locally equivariant to 3D rotations, translations, and permutations of points at every layer. 3D rotation equivariance removes the need for data augmentation to identify features in arbitrary orientations. Our network uses filters built from spherical harmonics; due to the mathematical consequences of this filter choice, each layer accepts as input (and guarantees as output) scalars, vectors, and higher-order tensors, in the geometric sense of these terms. We demonstrate...
Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the massively multitask framework by performing a series of empirical studies and obtain some interesting results: (1) massively multitask networks obtain predictive accuracies significantly better than single-task...
Chemical reaction data in journal articles, patents, and even electronic laboratory notebooks are currently stored in various formats, often unstructured, which presents a significant barrier to downstream applications, including the training of machine-learning models. We present the Open Reaction Database (ORD), an open-access schema and infrastructure for structuring and sharing organic reaction data, including a centralized data repository. The ORD schema supports conventional and emerging technologies, from benchtop reactions to automated...
DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value through screening of libraries with up to billions of unique small molecules. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from a large commercial collection and a virtual library of easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters with chemist review restricted to the removal...
Target class-focused drug discovery has a strong track record in pharmaceutical research, yet public domain data indicate that many members of protein families remain unliganded. Here we present a systematic approach to scale up the discovery and characterization of small molecule ligands for the WD40 repeat (WDR) protein family. We developed a comprehensive suite of protocols for protein production, crystallography, and biophysical, biochemical, and cellular assays. A pilot hit-finding campaign using DNA-encoded chemical library selection...
Deep learning methods such as multitask neural networks have recently been applied to ligand-based virtual screening and other drug discovery applications. Using a set of industrial ADMET datasets, we compare neural networks to standard baseline models and analyze multitask learning effects with both random cross-validation and a more relevant temporal validation scheme. We confirm that multitask learning can provide modest benefits over single-task models and show that smaller datasets tend to benefit more than larger datasets from multitask learning. Additionally, we find that adding massive amounts of side...
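The contrast this abstract draws between random and temporal validation can be made concrete with a small sketch. It assumes each record carries a measurement date; the field names `compound_id` and `measured_on`, the helper names, and the synthetic records are hypothetical and are not taken from the paper's datasets.

```python
import random
from datetime import date

def random_split(records, test_fraction, seed=0):
    """Shuffle the records, then hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def temporal_split(records, test_fraction):
    """Train on the earliest records and test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r["measured_on"])
    n_test = int(round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Hypothetical records with strictly increasing measurement dates.
records = [
    {"compound_id": f"cpd-{i}", "measured_on": date(2010 + i // 4, 1 + i % 12, 1)}
    for i in range(20)
]

rand_train, rand_test = random_split(records, test_fraction=0.25)
time_train, time_test = temporal_split(records, test_fraction=0.25)

# In the temporal split every test record is newer than every training record,
# which mimics predicting compounds made after the model was built.
print(max(r["measured_on"] for r in time_train), min(r["measured_on"] for r in time_test))
```

A random split lets the model see compounds from the same period as the test set, so it tends to give more optimistic error estimates than the temporal split for prospective use.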
Massively-Multitask Regression Models (MMRMs) trained on millions of compounds and many thousands of assays can predict bioactivity with accuracy comparable to 4-concentration IC50 experiments. Recent advances in hardware and algorithms have produced a variety of methods for multitask modeling. This report compares the performance of six MMRM algorithms: Profile-QSAR (pQSAR), Alchemite, a meta learner (MetaNN), a feed-forward neural network (MT-DNN), Bayesian factorization with side information (Macau) and Inductive...
We present RL-VAE, a graph-to-graph variational autoencoder that uses reinforcement learning to decode molecular graphs from latent embeddings. Methods have been described previously for graph-to-graph autoencoding, but these approaches require sophisticated decoders that increase the complexity of training and evaluation (such as requiring parallel encoders and decoders or non-trivial graph matching). Here, we repurpose a simple graph generator to enable efficient decoding and generation of molecular graphs.
Retrosynthesis -- the process of identifying a set of reactants to synthesize a target molecule -- is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified perspective provides critical insights about EBM variants through a comprehensive...
Affordable and effective antiviral therapies are needed worldwide, especially against agents such as dengue virus that are endemic in underserved regions. Many antiviral compounds have been studied in cultured cells but are unsuitable for clinical applications due to pharmacokinetic profiles, side effects, or inconsistent efficacy across dengue serotypes. Such tool compounds can, however, aid in identifying clinically useful treatments. Here, computational screening (Rapid Overlay of Chemical Structures) was used to identify entries...