- Protein Structure and Dynamics
- Machine Learning in Bioinformatics
- Computational Drug Discovery Methods
- RNA and protein synthesis mechanisms
- Genomics and Phylogenetic Studies
- Enzyme Structure and Function
- Bioinformatics and Genomic Networks
- Machine Learning in Materials Science
- SARS-CoV-2 and COVID-19 Research
- Vaccines and immunoinformatics approaches
- Genetics, Bioinformatics, and Biomedical Research
- CAR-T cell therapy research
- Viral gastroenteritis research and epidemiology
- CRISPR and Genetic Engineering
- Genomics and Rare Diseases
- Animal Virus Infections Studies
- Advanced Proteomics Techniques and Applications
- DNA and Nucleic Acid Chemistry
- Cell Image Analysis Techniques
- Advanced MEMS and NEMS Technologies
- Nanotechnology research and applications
- Nanopore and Nanochannel Transport Studies
- Single-cell and spatial transcriptomics
- RNA Research and Splicing
- Microbial Metabolic Engineering and Bioproduction
Columbia University
2021-2025
Columbia University Irving Medical Center
2021-2024
Harvard University Press
2023
Center for Systems Biology
2015-2021
Harvard University
2014-2021
Stanford University
2011-2012
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (i) tackle new tasks, like protein-ligand complex structure prediction, (ii) investigate the process by which the model learns, which remains poorly understood, and (iii) assess the model’s generalization capacity to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient,...
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein–ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We...
Cells are essential to understanding health and disease, yet traditional models fall short of modeling and simulating their function and behavior. Advances in AI and omics offer groundbreaking opportunities to create an AI virtual cell (AIVC), a multi-scale, multi-modal large-neural-network-based model that can represent and simulate the behavior of molecules, cells, and tissues across diverse states. This Perspective provides a vision on their design and how collaborative efforts to build AIVCs will transform biological research by...
Rapid progress in deep learning has spurred its application to bioinformatics problems including protein structure prediction and design. In classic machine learning problems like computer vision, progress has been driven by standardized data sets that facilitate fair assessment of new methods and lower the barrier to entry for non-domain experts. While data sets of protein sequence and structure exist, they lack certain components critical for machine learning, including high-quality multiple sequence alignments and insulated training/validation splits that account for deep but only weakly detectable homology...
AlphaFold2 and related systems use deep learning to predict protein structure from co-evolutionary relationships encoded in multiple sequence alignments (MSAs). Despite dramatic, recent increases in accuracy, three challenges remain: (i) prediction of orphan and rapidly evolving proteins for which an MSA cannot be generated, (ii) rapid exploration of designed structures, and (iii) understanding the rules governing spontaneous polypeptide folding in solution. Here we report development of an end-to-end...
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible. Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion...
This protocol describes the computational steps necessary to reproduce the results described in the paper "Unified rational protein engineering with sequence-only deep representation learning" by Alley et al.
The functions of most proteins result from their 3D structures, but determining their structures experimentally remains a challenge, despite steady advances in crystallography, NMR and single-particle cryoEM. Computationally predicting the structure of a protein from its primary sequence has long been a grand challenge in bioinformatics, intimately connected with understanding protein chemistry and dynamics. Recent advances in deep learning, combined with the availability of genomic data for inferring co-evolutionary patterns, provide a new approach...
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabelled amino acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily, and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach reaches near state-of-the-art or...
Proteins power a vast array of functional processes in living cells. The capability to create new proteins with designed structures and functions would thus enable the engineering of cellular behavior and the development of protein-based therapeutics and materials. Structure-based protein design aims to find structures that are designable (can be realized by a protein sequence), novel (have dissimilar geometry from natural proteins), and diverse (span a wide range of geometries). While advances in protein structure prediction have made it possible...
The emergence of SARS-CoV-2 underscores the need to better understand the evolutionary processes that drive the emergence and adaptation of zoonotic viruses in humans. In the betacoronavirus genus, which also includes SARS-CoV and MERS-CoV, recombination frequently encompasses the receptor binding domain (RBD) of the Spike protein, which is responsible for viral binding to host cell receptors. In this work, we reconstruct the evolutionary events that have accompanied the emergence of SARS-CoV-2, with a special emphasis on the RBD and its receptor, human ACE2. By means of phylogenetic analyses, we found...
The emergence of SARS-CoV-2 underscores the need to better understand the evolutionary processes that drive the emergence and adaptation of zoonotic viruses in humans. In the betacoronavirus genus, which also includes SARS-CoV and MERS-CoV, recombination frequently encompasses the Receptor Binding Domain (RBD) of the Spike protein, which, in turn, is responsible for viral binding to host cell receptors. Here, we find evidence of a recombination event in the RBD involving ancestral lineages to both SARS-CoV and SARS-CoV-2. Although we cannot specify the recombinant nor...
Kinases have been the focus of drug discovery programs for three decades leading to over 70 therapeutic kinase inhibitors and biophysical affinity measurements for over 130,000 kinase-compound pairs. Nonetheless, the precise target spectrum for many kinases remains only partly understood. In this study, we describe a computational approach to unlocking qualitative and quantitative kinome-wide binding measurements for structure-based machine learning. Our study has three components: (i) a Kinase Inhibitor Complex (KinCo) data set comprising...