- Protein Structure and Dynamics
- Machine Learning in Bioinformatics
- Enzyme Structure and Function
- RNA and protein synthesis mechanisms
- Bioinformatics and Genomic Networks
- Parasite Biology and Host Interactions
- Genomics and Phylogenetic Studies
- Microbial Metabolic Engineering and Bioproduction
- Clostridium difficile and Clostridium perfringens research
- Epigenetics and DNA Methylation
- Gut microbiota and health
- Protist diversity and phylogeny
- Microbial Community Ecology and Physiology
- Click Chemistry and Applications
- Chemical Synthesis and Analysis
- Bacteriophages and microbial interactions
- Genetics, Bioinformatics, and Biomedical Research
- Polar Research and Ecology
- Computational Drug Discovery Methods
- Fungal and yeast genetics research
- Enzyme Production and Characterization
- Usability and User Interface Design
University of Washington
2021-2025
PDL BioPharma (United States)
2023
Oregon State University
2018-2020
Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take advantage of advances in proteome-wide amino acid coevolution analysis and deep-learning–based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes within...
Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion...
AlphaFold2 and RoseTTAFold predict protein structures with very high accuracy despite substantial architecture differences. We sought to develop an improved method combining features of both. The resulting method, RoseTTAFold2, extends the original three-track architecture of RoseTTAFold over the full network, incorporating the concepts of Frame-aligned point error, recycling during training, and the use of a distillation set from AlphaFold2. We also took from AlphaFold2 the idea of structurally coherent attention in updating pair features, but using a more...
Although AlphaFold2 (AF2) and RoseTTAFold (RF) have transformed structural biology by enabling high-accuracy protein structure modeling, they are unable to model covalent modifications or interactions with small molecules and other non-protein molecules that can play key roles in biological function. Here, we describe RoseTTAFold All-Atom (RFAA), a deep network capable of modeling full biological assemblies containing proteins, nucleic acids, small molecules, metals, and covalent modifications, given the sequences of the polymers and the atomic bonded geometry...
Patescibacteria, also known as the candidate phyla radiation (CPR), are a diverse group of bacteria that constitute a disproportionately large fraction of microbial dark matter. Its few cultivated members, belonging mostly to Saccharibacteria, grow as epibionts on host Actinobacteria. Due to a lack of suitable tools, the genetic basis of this lifestyle and other unique features of the Patescibacteria remain unexplored. Here, we show that Saccharibacteria exhibit natural competence, and we exploit this property for their genetic manipulation....
The design of enzymes with complex active sites that mediate multistep reactions remains an outstanding challenge. With serine hydrolases as a model system, we combined the generative capabilities of RFdiffusion with an ensemble generation method for assessing active site preorganization to design enzymes starting from minimal active site descriptions. Experimental characterization revealed catalytic efficiencies (kcat/Km) up to 2.2 × 10⁵ M⁻¹ s⁻¹ and crystal structures that closely match the design models (Cα RMSDs < 1 Å). Selection structural...
Our understanding of mammalian evolution has become microbiome-aware. While emerging research links mammalian biodiversity and the gut microbiome, we lack insight into which microbes potentially impact mammalian evolution. Microbes common to diverse mammalian species may be strong candidates, as their absence in the gut may affect how the microbiome functionally contributes to mammalian physiology to adversely affect fitness. Identifying such conserved gut microbes is thus important to ultimately assessing the microbiome’s potential role in mammalian evolution. To advance their discovery, we developed an...
Enzymes that proceed through multistep reaction mechanisms often utilize complex, polar active sites positioned with sub-angstrom precision to mediate distinct chemical steps, which makes their de novo construction extremely challenging. We sought to overcome this challenge using the classic catalytic triad and oxyanion hole of serine hydrolases as a model system. We used RFdiffusion to generate proteins housing catalytic sites of increasing complexity and varying geometry, and a newly developed ensemble generation...
The trRosetta structure prediction method employs deep learning to generate predicted residue‐residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template‐free and template utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both...
The majority of proteins must form higher-order assemblies to perform their biological functions, yet few machine learning models can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by finetuning several classes of protein foundation models to predict homo-oligomer symmetry. Our best model, named Seq2Symm, which utilizes ESM2, outperforms existing template-based and deep learning methods, achieving an average AUC-PR of 0.47, 0.44 and 0.49 across homo-oligomer symmetries...
The majority of proteins must form higher-order assemblies to perform their biological functions. Despite the importance of protein quaternary structure, there are few machine learning models that can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by training several classes of protein foundation models, including ESM-MSA, ESM2, and RoseTTAFold2, to predict homo-oligomer symmetry. Our best model, named Seq2Symm, which utilizes ESM2, outperforms...
Protein-protein interactions (PPI) are essential for biological function. Recent advances in coevolutionary analysis and Deep Learning (DL) based protein structure prediction have enabled comprehensive PPI identification in bacterial and yeast proteomes, but these approaches have had limited success to date for the more complex human proteome. Here, we overcome this challenge by 1) enhancing the coevolutionary signals with 7-fold deeper multiple sequence alignments harvested from 30 petabytes of unassembled genomic data, 2)...
For CASP14, we developed deep learning-based methods for predicting homo-oligomeric and hetero-oligomeric contacts and used them for oligomer modeling. To build structure models, we developed an oligomer structure generation method that utilizes predicted interchain contacts to guide iterative restrained minimization from random backbone structures. We supplemented this gradient-based fold-and-dock method with template-based and ab initio docking approaches using deep learning-based subunit predictions on 29 assembly targets. These methods produced oligomer models with summed Z-scores 5.5...
Protein-protein interactions play critical roles in biology, but despite decades of effort, the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions that have not yet been identified. Here, we take advantage of recent advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes, as represented within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold...
Identification of bacterial protein-protein interactions and predicting the structures of the complexes could aid in understanding pathogenicity mechanisms and developing treatments for infectious diseases. Here, we developed a deep learning-based pipeline that leverages residue-residue coevolution and protein structure prediction to systematically identify and structurally characterize protein-protein interactions at a proteome-wide scale. Using this pipeline, we searched through 78 million pairs of proteins across 19 human bacterial pathogens and identified...
The study of bacteria has yielded fundamental insights into cellular biology and physiology, biotechnological advances and many therapeutics. Yet due to a lack of suitable tools, the significant portion of bacterial diversity held within the candidate phyla radiation (CPR) remains inaccessible to such pursuits. Here we show that CPR bacteria belonging to the phylum Saccharibacteria exhibit natural competence. We exploit this property to develop methods for their genetic manipulation, including the insertion of heterologous sequences...
While recent research reveals that the gut microbiome drives vertebrate health, little is known about whether the mechanisms these microbes employ to interact with vertebrate physiology are consistent across host species. To help close this knowledge gap, we compared the gut metagenomes of 10 vertebrate species, including biomedical animal models, to define inter-species variation in the biochemical pathways encoded by the gut microbiota. Doing so revealed gut-enriched pathways that are conserved across vertebrates, as well as pathways that vary concordantly with evolutionary...