- Protein Structure and Dynamics
- RNA and protein synthesis mechanisms
- Machine Learning in Bioinformatics
- Enzyme Structure and Function
- Genomics and Phylogenetic Studies
- Microbial Metabolic Engineering and Bioproduction
- Bioinformatics and Genomic Networks
- Bacterial Genetics and Biotechnology
- Computational Drug Discovery Methods
- Glycosylation and Glycoproteins Research
- Photosynthetic Processes and Mechanisms
- Protein purification and stability
- Microbial Community Ecology and Physiology
- Mass Spectrometry Techniques and Applications
- Advanced Proteomics Techniques and Applications
- Evolution and Genetic Dynamics
- Genetic diversity and population structure
- Peptidase Inhibition and Analysis
- Machine Learning in Materials Science
- Supramolecular Self-Assembly in Materials
- Monoclonal and Polyclonal Antibodies Research
- Bacteriophages and microbial interactions
- Modular Robots and Swarm Intelligence
- Endoplasmic Reticulum Stress and Disease
- Engineering and Environmental Studies
Harvard University Press
2019-2025
Massachusetts Institute of Technology
2024-2025
Harvard University
2018-2024
University of Washington
2013-2023
Center for Systems Biology
2018-2023
Seoul National University
2021
The University of Tokyo
2021
Michigan State University
2021
Max Planck Institute for Biophysical Chemistry
2021
Seattle University
2014-2019
Abstract: ColabFold offers accelerated prediction of protein structures and complexes by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold’s 40–60-fold faster optimized model utilization enables prediction of close to 1,000 structures per day on a server with one graphics processing unit. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at https://github.com/sokrypton/ColabFold and its novel environmental databases are...
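To put the throughput figure in per-job terms, the arithmetic below converts structures per day into an average wall-clock budget per structure. It is a back-of-the-envelope sketch. It assumes predictions run back to back on the single GPU with no idle time. Real per-job times vary with sequence length and alignment depth.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_per_structure(structures_per_day: float) -> float:
    """Average wall-clock time available per prediction at a given daily throughput.

    Assumes jobs run back to back on one device with no idle time.
    """
    if structures_per_day <= 0:
        raise ValueError("structures_per_day must be positive")
    return SECONDS_PER_DAY / structures_per_day


def structures_per_hour(structures_per_day: float) -> float:
    """The same throughput expressed per hour."""
    return structures_per_day / 24


if __name__ == "__main__":
    # "Close to 1,000 structures per day" on one GPU, as stated in the abstract.
    print(f"{seconds_per_structure(1000):.1f} s per structure")  # 86.4 s per structure
    print(f"{structures_per_hour(1000):.1f} structures per hour")  # 41.7 structures per hour
```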
Deep learning takes on protein folding. In 1972, Anfinsen won a Nobel prize for demonstrating a connection between a protein’s amino acid sequence and its three-dimensional structure. Since 1994, scientists have competed in the biennial Critical Assessment of Structure Prediction (CASP) protein-folding challenge. Deep learning methods took center stage at CASP14, with DeepMind’s AlphaFold2 achieving remarkable accuracy. Baek et al. explored network architectures based on the DeepMind framework. They used a three-track...
The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a residual network for predicting orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating models guided by these restraints. In benchmark tests on the 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction...
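The abstract says predicted distances and orientations are turned into restraints for Rosetta energy minimization. A common way to make that conversion is to take the negative log-ratio between the predicted bin probabilities and a reference distribution. The sketch below shows only that step, for one residue pair's distance histogram. The bin edges, the uniform reference, and the `eps` smoothing constant are assumptions for illustration. A real pipeline would also need spline fitting and orientation terms, which are omitted here.

```python
import math
from typing import List, Sequence


def distogram_to_energies(
    probs: Sequence[float], ref_probs: Sequence[float], eps: float = 1e-4
) -> List[float]:
    """Per-bin restraint energies E_k = -ln((p_k + eps) / (q_k + eps)).

    Negative energy means the prediction favors that distance bin more than the
    reference does. `eps` is an assumed smoothing constant that avoids log(0).
    """
    if len(probs) != len(ref_probs):
        raise ValueError("probs and ref_probs must have the same number of bins")
    return [-math.log((p + eps) / (q + eps)) for p, q in zip(probs, ref_probs)]


if __name__ == "__main__":
    # Hypothetical 4-bin histogram for one residue pair: 2-4, 4-6, 6-8, 8-10 angstroms.
    bin_centers = [3.0, 5.0, 7.0, 9.0]
    predicted = [0.05, 0.70, 0.20, 0.05]
    uniform_ref = [0.25] * 4  # assumed flat reference distribution
    energies = distogram_to_energies(predicted, uniform_ref)
    best = min(range(len(energies)), key=energies.__getitem__)
    print(f"most favored distance ~{bin_centers[best]} A")
```

The log-ratio form makes a bin attractive only when the prediction is more confident than the reference. The choice of reference therefore shapes the restraint.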
Abstract: There has been considerable recent progress in designing new proteins using deep-learning methods (refs. 1–9). Despite this progress, a general framework for protein design that enables solution of a wide range of challenges, including de novo binder design and higher-order symmetric architectures, has yet to be described. Diffusion models (refs. 10,11) have had success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence–structure...
Recently developed methods have shown considerable promise in predicting residue-residue contacts in protein 3D structures using evolutionary covariance information. However, these methods require large numbers of evolutionarily related sequences to robustly assess the extent of residue covariation, and the larger the protein family, the more likely that contact information is unnecessary because a reasonable model can be built based on the structure of a homolog. Here we describe a method that integrates sequence coevolution and structural...
Do the amino acid sequence identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts and assemble models of biological complexes. We find that residue pairs identified by a pseudo-likelihood-based method to covary across protein-protein interfaces in the 50S ribosomal unit and 28 additional bacterial complexes with known structure are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use...
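The depth condition in this abstract is easy to check once two alignments have been paired row by row. The condition is that the number of aligned sequences should exceed the average length of the two proteins. The sketch assumes the pairing has already been done, that is, it has been decided which sequence from family A goes with which from family B. It also assumes every row in an alignment has the same column count. The helper names are hypothetical. Raw row counts are used, as the abstract phrases it. Some pipelines instead use a redundancy-weighted effective count.

```python
from typing import List, Sequence


def concatenate_paired(msa_a: Sequence[str], msa_b: Sequence[str]) -> List[str]:
    """Join row i of alignment A with row i of alignment B (rows assumed pre-paired)."""
    if len(msa_a) != len(msa_b):
        raise ValueError("paired alignments must have the same number of rows")
    return [a + b for a, b in zip(msa_a, msa_b)]


def has_enough_sequences(msa_a: Sequence[str], msa_b: Sequence[str]) -> bool:
    """True if the number of paired sequences exceeds the mean length of the two proteins."""
    if len(msa_a) != len(msa_b):
        raise ValueError("paired alignments must have the same number of rows")
    if not msa_a:
        return False
    mean_length = (len(msa_a[0]) + len(msa_b[0])) / 2
    return len(msa_a) > mean_length


if __name__ == "__main__":
    a = ["MKV", "MRV", "LKV", "MKI"]  # toy alignment for protein A, length 3
    b = ["GSA", "GTA", "ASA", "GSS"]  # toy alignment for protein B, length 3
    print(concatenate_paired(a, b)[0])  # MKVGSA
    print(has_enough_sequences(a, b))  # True, because 4 > 3
```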
ColabFold offers accelerated protein structure and complex predictions by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold’s 40–60× faster optimized model use allows predicting close to a thousand structures per day on a server with one GPU. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at github.com/sokrypton/ColabFold. Its novel environmental databases are available at colabfold.mmseqs.com. Contact...
Filling in the protein fold picture. Fewer than a third of the 14,849 known protein families have at least one member with an experimentally determined structure. This leaves more than 5000 families with no structural information. Protein modeling using residue-residue contacts inferred from evolutionary data has been successful in modeling unknown structures, but it requires large numbers of aligned sequences. Ovchinnikov et al. augmented such sequence alignments with metagenome data (see the Perspective by Söding). They determined the number of sequences required to...
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure prediction have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) to study characteristic structural elements; the impact of missense variants;...
Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take advantage of advances in proteome-wide amino acid coevolution analysis and deep-learning–based structure modeling to systematically identify and build accurate models of core protein complexes within...
The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, "constrained hallucination," optimizes sequences such that their predicted structures contain the desired site. The second approach, "inpainting," starts from the site and fills in additional sequence to create a viable...
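Constrained hallucination optimizes a sequence against a loss computed from its predicted structure. The abstract does not say how that optimization is run. One common strategy for this kind of search is Markov chain Monte Carlo with simulated annealing over single-residue mutations, sketched below. The sketch does not reproduce the structure-prediction network or a real site-based loss. Instead, `toy_motif_loss` is a hypothetical stand-in that only counts mismatches at three fixed positions. The temperature schedule and step count are also assumed values.

```python
import math
import random
from typing import Callable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def anneal_sequence(
    loss_fn: Callable[[str], float],
    length: int,
    steps: int = 5000,
    t_start: float = 1.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[str, float]:
    """Simulated-annealing MCMC over sequences using single-site mutations.

    Each step mutates one random position and accepts the change with the
    Metropolis criterion at a temperature that decays geometrically from
    t_start to t_end. Returns the lowest-loss sequence seen and its loss.
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = loss_fn("".join(seq))
    best_seq, best_loss = "".join(seq), current
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        proposed = loss_fn("".join(seq))
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = proposed  # accept the mutation
            if current < best_loss:
                best_seq, best_loss = "".join(seq), current
        else:
            seq[pos] = old  # reject: undo the mutation
    return best_seq, best_loss


def toy_motif_loss(seq: str) -> float:
    """Hypothetical stand-in loss: mismatches at three fixed 'site' positions."""
    site = {2: "H", 5: "E", 8: "H"}
    return float(sum(seq[i] != aa for i, aa in site.items()))


if __name__ == "__main__":
    best, loss = anneal_sequence(toy_motif_loss, length=10)
    print(best, loss)
```

Early, high-temperature steps accept some worse sequences, which lets the search escape local minima. Late steps accept almost only improvements.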
Residue-residue coevolution has been observed across a number of protein-protein interfaces, but the extent of residue coevolution between protein families on the whole-proteome scale has not been systematically studied. We investigate 5.4 million pairs of proteins in...
The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind predictions of unprecedented accuracy made for two large families in the recent CASP11 test of methods by incorporating residue–residue co-evolution information into the Rosetta program. We then use this method to generate models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not...
Abstract: Unsupervised contact prediction is central to uncovering physical, structural, and functional constraints for protein structure determination and design. For decades, the predominant approach has been to infer evolutionary constraints from a set of related sequences. In the past year, language models have emerged as a potential alternative, but performance has fallen short of state-of-the-art approaches in bioinformatics. In this paper we demonstrate that Transformer attention maps learn contacts from the unsupervised...
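The abstract is cut off before it describes how contacts are read out of the attention maps. A widely used way to turn a pairwise score matrix into contact predictions has three steps. The matrix can hold attention maps, as here, or coevolutionary couplings. First, symmetrize the matrix. Next, apply average product correction (APC) to remove background signal. Finally, rank the residue pairs. The sketch below shows those steps for one square map stored as nested lists. It omits combining many heads and layers, for example with a learned logistic regression. The `min_sep=6` sequence-separation cutoff is an assumption.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def symmetrize(m: Matrix) -> Matrix:
    """Add the matrix to its transpose so score(i, j) == score(j, i)."""
    n = len(m)
    return [[m[i][j] + m[j][i] for j in range(n)] for i in range(n)]


def apc(m: Matrix) -> Matrix:
    """Average product correction: subtract row_sum[i] * col_sum[j] / total."""
    n = len(m)
    row_sums = [sum(row) for row in m]
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    if total == 0:
        raise ValueError("APC is undefined for an all-zero matrix")
    return [[m[i][j] - row_sums[i] * col_sums[j] / total for j in range(n)] for i in range(n)]


def top_contacts(scores: Matrix, k: int, min_sep: int = 6) -> List[Tuple[int, int]]:
    """Return the k highest-scoring pairs (i, j) with j - i >= min_sep."""
    n = len(scores)
    ranked = sorted(
        ((scores[i][j], i, j) for i in range(n) for j in range(i + min_sep, n)),
        reverse=True,
    )
    return [(i, j) for _, i, j in ranked[:k]]


if __name__ == "__main__":
    # Toy 8x8 "attention map": uniform background with one strong pair (0, 7).
    attn = [[1.0] * 8 for _ in range(8)]
    attn[0][7] = attn[7][0] = 5.0
    print(top_contacts(apc(symmetrize(attn)), k=1))  # [(0, 7)]
```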
AlphaFold2 (ref. 1) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates (ref. 2), and disease-causing point mutations cause population changes within these substates (refs. 3,4). We demonstrate that clustering a multiple-sequence alignment by sequence similarity enables AlphaFold2 to sample alternative states of known metamorphic proteins with high confidence. Using this method, named...
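The abstract names the key step, clustering the multiple-sequence alignment by sequence similarity, but not the clustering algorithm. The sketch below substitutes a simple greedy identity-threshold clustering. This is a swapped-in technique, not necessarily the one used in the paper. The sketch then builds one sub-alignment per cluster with the query on top, which is the form each cluster would take as input to a structure predictor. The 0.7 threshold and the function names are hypothetical.

```python
from typing import List, Sequence


def identity(a: str, b: str) -> float:
    """Fraction of aligned columns with identical residues, skipping gap-gap columns."""
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_cluster(msa: Sequence[str], threshold: float = 0.7) -> List[List[str]]:
    """Put each sequence in the first cluster whose centroid it matches at >= threshold."""
    centroids: List[str] = []
    clusters: List[List[str]] = []
    for seq in msa:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[idx].append(seq)
                break
        else:  # no existing cluster is close enough: start a new one
            centroids.append(seq)
            clusters.append([seq])
    return clusters


def sub_msas(query: str, clusters: List[List[str]]) -> List[List[str]]:
    """Prepend the query to each cluster so each sub-alignment can be predicted separately."""
    return [[query] + [s for s in c if s != query] for c in clusters]


if __name__ == "__main__":
    msa = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "CCCCCCCCCA"]
    for block in sub_msas("AAAAAAAAAA", greedy_cluster(msa)):
        print(block)
```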
Significance: Coevolution-derived contact predictions are enabling accurate protein structure modeling. However, coevolving residues are not always in contact, and this is a potential source of error in such modeling efforts. To investigate the sources of these errors and, more generally, the origins of coevolution in protein structures, we provide a global overview of the contributions to “exceptions” to the general rule that coevolving residues are close in three-dimensional structures.
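The general rule discussed here is that coevolving residue pairs are usually in contact. It is typically checked by comparing top-ranked coevolving pairs against distances in a known structure. The sketch below computes that precision and lists the "exceptions", meaning coevolving pairs that are not in contact. The 8 Å cutoff is a common convention assumed here, not taken from the abstract. The input formats and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def split_by_contact(
    ranked_pairs: Sequence[Pair],
    distances: Sequence[Sequence[float]],
    k: int,
    cutoff: float = 8.0,
) -> Tuple[float, List[Pair]]:
    """Score the top-k coevolving pairs against a reference structure.

    Returns (precision, exceptions): the fraction of the top-k pairs closer than
    `cutoff` angstroms, and the pairs that coevolve but are not in contact.
    """
    top = list(ranked_pairs[:k])
    if not top:
        return 0.0, []
    exceptions = [(i, j) for i, j in top if distances[i][j] >= cutoff]
    return (len(top) - len(exceptions)) / len(top), exceptions


if __name__ == "__main__":
    # Hypothetical symmetric distance matrix (angstroms) for a 4-residue toy structure.
    d = [
        [0.0, 3.8, 7.0, 5.0],
        [3.8, 0.0, 6.0, 12.0],
        [7.0, 6.0, 0.0, 4.0],
        [5.0, 12.0, 4.0, 0.0],
    ]
    precision, exceptions = split_by_contact([(0, 3), (1, 3), (0, 2)], d, k=3)
    print(precision, exceptions)
```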
Misfolded luminal endoplasmic reticulum (ER) proteins undergo ER-associated degradation (ERAD-L): They are retrotranslocated into the cytosol, polyubiquitinated, and degraded by the proteasome. ERAD-L is mediated by the Hrd1 complex (composed of Hrd1, Hrd3, Der1, Usa1, and Yos9), but the mechanism of retrotranslocation remains mysterious. Here, we report a structure of the active Hrd1 complex, as determined by cryo-electron microscopy analysis of two subcomplexes. Hrd3 and Yos9 jointly create a binding site that recognizes glycosylated...
Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale (ref. 1). However, the energetics driving folding are invisible in these structures and remain largely unknown (ref. 2). The hidden thermodynamics of folding can drive disease (refs. 3,4), shape protein evolution (refs. 5–7) and guide protein engineering (refs. 8–10), and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic stability of up to 900,000 domains in a one-week experiment. From...
The problem of predicting a protein's 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable performance on this task by combining deep learning techniques with coevolutionary data from multiple sequence alignments of related protein sequences. The use of coevolutionary information is critical to these models' accuracy, and without it their predictive performance drops considerably. In living cells, however, the structure is fully determined by biophysical...
Significance: Almost all proteins fold to their lowest free energy state, which is determined by their amino acid sequence. Computational protein design has primarily focused on finding sequences that have very low energy in the target designed structure. However, what is most relevant during folding is not the absolute energy of the folded state but the energy difference between the folded state and the lowest-lying alternative states. We describe a deep learning approach that captures aspects of the folding landscape, in particular the presence of structures in alternative energy minima, and show that it can...
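This abstract contrasts the absolute energy of the designed state with its margin over competing states. A short calculation makes the distinction concrete. The energies below are hypothetical numbers, not results from the paper. They were chosen only to show that a sequence with the lower absolute energy can still have the smaller gap.

```python
from typing import Sequence


def energy_gap(e_target: float, e_alternatives: Sequence[float]) -> float:
    """Lowest alternative-state energy minus target-state energy.

    A positive value means the target is the lowest state by that margin.
    A negative value means some alternative state lies below the target.
    """
    if not e_alternatives:
        raise ValueError("need at least one alternative state")
    return min(e_alternatives) - e_target


if __name__ == "__main__":
    # Hypothetical energies (arbitrary units) for two designed sequences.
    gap_a = energy_gap(-100.0, [-98.0, -90.0])  # very low target energy, small gap
    gap_b = energy_gap(-80.0, [-60.0, -55.0])  # higher target energy, large gap
    print(gap_a, gap_b)  # 2.0 20.0
```

Sequence A has the lower target-state energy, but the gap criterion prefers sequence B.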