- Machine Learning in Materials Science
- Protein Structure and Dynamics
- Computational Drug Discovery Methods
- Machine Learning in Bioinformatics
- Embedded Systems Design Techniques
- Parallel Computing and Optimization Techniques
- Scientific Computing and Data Management
- Bioinformatics and Genomic Networks
- Real-Time Systems Scheduling
- Interconnection Networks and Systems
- Genetics, Bioinformatics, and Biomedical Research
- Enzyme Structure and Function
- RNA and protein synthesis mechanisms
- RNA modifications and cancer
- Biomedical Text Mining and Ontologies
- Forensic and Genetic Research
- Microbial Natural Products and Biosynthesis
- Adversarial Robustness in Machine Learning
- Machine Learning and Data Classification
- Cancer-related molecular mechanisms research
- Cell Image Analysis Techniques
- Software Engineering Research
- Various Chemistry Research Topics
- Advanced Optical Network Technologies
- Synthesis and biological activity
Stanford University
2018-2023
Stanford Medicine
2018-2019
Google (United States)
2017-2019
This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel commands to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations....
Learning on 3D structures of large biomolecules is emerging as a distinct area in machine learning, but there has yet to emerge a unifying network architecture that simultaneously leverages the graph-structured and geometric aspects of the problem domain. To address this gap, we introduce geometric vector perceptrons, which extend standard dense layers to operate on collections of Euclidean vectors. Graph neural networks equipped with such layers are able to perform both geometric and relational reasoning on efficient and natural representations...
This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control...
A pervasive challenge in drug design is determining how to expand a ligand (a small molecule that binds to a target biomolecule) in order to improve various properties of the ligand. Adding single chemical groups, known as fragments, is important for lead optimization tasks, and adding multiple fragments is critical for fragment-based drug design. We have developed a comprehensive framework that uses machine learning and three-dimensional protein–ligand structures to address this challenge. Our method, FRAME, iteratively...
Computationally-aided design of novel molecules has the potential to accelerate drug discovery. Several recent generative models aimed to create new molecules for specific protein targets. However, a rate limiting step in drug development is molecule optimization, which can take several years due to the challenge of optimizing multiple molecular properties at once. We developed a method to solve this optimization problem in silico: expanding a small, fragment-like starting molecule bound to a protein pocket into a larger molecule that matches...
Despite an explosion in the number of experimentally determined, atomically detailed structures of biomolecules, many critical tasks in structural biology remain data-limited. Whether performance in such tasks can be improved by using large repositories of tangentially related data remains an open question. To address this question, we focused on a central problem in biology: predicting how proteins interact with one another, that is, which surfaces of one protein bind to those of another protein. We built a training dataset,...
Computational methods that operate on three-dimensional molecular structure have the potential to solve important questions in biology and chemistry. In particular, deep neural networks have gained significant attention, but their widespread adoption in the biomolecular domain has been limited by a lack of either systematic performance benchmarks or a unified toolkit for interacting with molecular data. To address this, we present ATOM3D, a collection of both novel and existing benchmark datasets spanning several key...
Halide is a domain-specific language for fast image processing that separates pipelines into the algorithm, which defines what values are computed, and the schedule, which defines how they are computed. Changes to the schedule are guaranteed to not change the results. While Halide supports parallelizing and vectorizing naturally data-parallel operations, it does not support the same scheduling for reductions. Instead, the programmer must create data parallelism by manually factoring reductions into multiple stages. This manipulation of the algorithm can introduce...
SARS-CoV-2 infection is mediated by interactions between the receptor binding domain (RBD) of viral spike proteins and host cell angiotensin converting enzyme 2 (ACE2) receptors. Mutations in the spike protein are the primary cause for neutralizing antibody escape leading to breakthrough infections. We characterize the fitness landscape underpinning future variants of concern by combining supervised machine learning and Markov Chain Monte Carlo. Leveraging deep mutational scanning (DMS) data characterizing the affinity of RBD...
Machine learning research concerning protein structure has seen a surge in popularity over the last years with promising advances for basic science and drug discovery. Working with macromolecular structure in a machine learning context requires an adequate numerical representation, and researchers have extensively studied representations such as graphs, discretized 3D grids, and distance maps. As part of CASP14, we explored a new and conceptually simple representation in a blind experiment: atoms as points in 3D, each associated...
Proteins are miniature machines whose function depends on their three-dimensional (3D) structure. Determining this structure computationally remains an unsolved grand challenge. A major bottleneck involves selecting the most accurate structural model among a large pool of candidates, a task addressed in model quality assessment. Here, we present a novel deep learning approach to assess the quality of a protein model. Our network builds on a point-based representation of the atomic structure and rotation-equivariant convolutions at...
Most widely used ligand docking methods assume a rigid protein structure. This leads to problems when the structure of the target protein deforms upon ligand binding. In particular, the ligand's true binding pose is often scored very unfavorably due to apparent clashes between ligand and protein atoms, which lead to extremely high values of the calculated van der Waals energy term. Traditionally, this problem has been addressed by explicitly searching for receptor conformations to account for the flexibility of the receptor in ligand binding. Here we present a deep learning model...
Deep learning promises to dramatically improve scoring functions for molecular docking, leading to substantial advances in binding pose prediction and virtual screening. To train scoring functions, and to perform molecular docking, one must generate a set of candidate ligand binding poses. Unfortunately, the sampling protocols currently used to generate candidate poses frequently fail to produce any poses close to the correct, experimentally determined pose, unless information about the correct pose is provided. This limits the accuracy of learned scoring functions and molecular docking. Here, we describe two...