- Protein Structure and Dynamics
- RNA and protein synthesis mechanisms
- Genomics and Phylogenetic Studies
- Genomics and Chromatin Dynamics
- Machine Learning in Bioinformatics
- Gene expression and cancer classification
- Enzyme Structure and Function
- Aquatic Ecosystems and Phytoplankton Dynamics
- Bioinformatics and Genomic Networks
- Genetics, Bioinformatics, and Biomedical Research
- Molecular Biology Techniques and Applications
- Protein Degradation and Inhibitors
- Diatoms and Algae Research
- Marine and coastal ecosystems
University of Washington
2016-2021
We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments, followed by 2D convolutions to provide their global contexts, and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing both...
The trRosetta structure prediction method employs deep learning to generate predicted residue‐residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template‐free and template-utilizing versions of trRosetta, guided by the DeepAccNet accuracy predictor. Both...
We present the DeepProfile framework, which learns a variational autoencoder (VAE) network from thousands of publicly available gene expression samples and uses this network to encode a low-dimensional representation (LDR) to predict complex disease phenotypes. To our knowledge, DeepProfile is the first attempt to use deep learning to extract a feature representation from a vast quantity of unlabeled (i.e., lacking phenotype information) expression samples that are not incorporated into the prediction problem. We use DeepProfile to predict acute myeloid leukemia patients’ in vitro...
Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS‐CoV‐2 genome. Forty‐seven research groups submitted over 3000 three‐dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide...
Sexual reproduction roots the eukaryotic tree of life, although its loss occurs across diverse taxa. Asexual reproduction and clonal lineages persist in these taxa despite theoretical arguments suggesting that individual clones should be evolutionarily short-lived due to limited phenotypic diversity. Here, we present quantitative evidence that an obligate asexual lineage emerged from a sexual population of the marine diatom Thalassiosira pseudonana and rapidly expanded throughout the world’s oceans. Whole genome...
ChIP-seq is a technique to determine binding locations of transcription factors, which remains a central challenge in molecular biology. Current practice is to use a 'control' dataset to remove background signals from an immunoprecipitation (IP) 'target' dataset. We introduce the AIControl framework, which eliminates the need to obtain a control dataset and instead identifies binding peaks by estimating the distributions of background signals from many publicly available control ChIP-seq datasets. We thereby avoid the cost of running control experiments while simultaneously increasing the accuracy of binding location...
Determining the binding locations of regulatory factors, such as transcription factors and histone modifications, is essential to both basic biology research and many clinical applications. Obtaining such genome-wide location maps directly is often invasive and resource-intensive, so it is common to impute binding locations from DNA sequence or measures of chromatin accessibility. We introduce DeepATAC, a deep-learning approach for imputing binding locations that uses both DNA sequence and chromatin accessibility as measured by ATAC-seq. DeepATAC significantly outperforms current...
Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) is a widely used method to determine the binding positions of various proteins on the genome in a population of cells. A typical ChIP-seq protocol involves two experiments: one designed to capture target signals ('target' experiment) and the other to capture background noise signals ('control' experiment). A peak calling algorithm then examines the difference between the target experiment data and the control experiment data to determine where the protein of interest binds along the genome. Our approach, named...