- RNA and protein synthesis mechanisms
- Gene expression and cancer classification
- Plant-Microbe Interactions and Immunity
- Machine Learning in Bioinformatics
- Genomics and Phylogenetic Studies
- Gut microbiota and health
- Bioinformatics and Genomic Networks
- Protein Structure and Dynamics
- Vaccines and immunoinformatics approaches
- Plant Pathogenic Bacteria Studies
- Plant Molecular Biology Research
- Statistical Distribution Estimation and Applications
- Viral Infectious Diseases and Gene Expression in Insects
- Statistical Methods and Bayesian Inference
- Photosynthetic Processes and Mechanisms
- Neural Networks and Applications
- Plant Gene Expression Analysis
- Cancer Research and Treatments
- Light effects on plants
- CRISPR and Genetic Engineering
- Legume Nitrogen Fixing Symbiosis
- Advanced Statistical Methods and Models
- 3D Shape Modeling and Analysis
- Time Series Analysis and Forecasting
- Molecular Biology Techniques and Applications
Indian Institute of Technology Kharagpur
2022-2025
Harvard University
2016-2022
Inspire
2019-2020
Indian Institute of Technology Bhubaneswar
2019
University of North Carolina at Chapel Hill
2012-2017
Howard Hughes Medical Institute
2017
Sainsbury Laboratory
2016-2017
University of Cambridge
2016-2017
Indian Institute of Technology Kanpur
1991
Plants are responsive to temperature, and some species can distinguish differences of 1°C. In Arabidopsis, warmer temperature accelerates flowering and increases elongation growth (thermomorphogenesis). However, the mechanisms of temperature perception are largely unknown. We describe a major thermosensory role for the phytochromes (red light receptors) during the night. Phytochrome null plants display a constitutive warm-temperature response, and consistent with this, we show in this background that the transcriptome becomes...
Plants have significantly more transcription factor (TF) families than animals and fungi, and plant TF families tend to contain more genes; these expansions are linked to adaptation to environmental stressors. Many family members bind to similar or identical sequence motifs, such as G-boxes (CACGTG), so it is difficult to predict regulatory relationships. We determined that the flanking sequences near G-boxes help determine in vitro specificity, but that this is insufficient to predict the transcription pattern of genes near G-boxes. Therefore, we constructed a gene...
AlphaFold2 and related systems use deep learning to predict protein structure from co-evolutionary relationships encoded in multiple sequence alignments (MSAs). Despite dramatic recent increases in accuracy, three challenges remain: (i) prediction of orphan and rapidly evolving proteins for which an MSA cannot be generated, (ii) rapid exploration of designed structures, and (iii) understanding the rules governing spontaneous polypeptide folding in solution. Here we report development of an end-to-end...
We present JAM, a generative protein design system that enables fully computational design of antibodies with therapeutic-grade properties for the first time. JAM generates de novo antibodies in both single-domain (VHH) and paired (scFv/mAb) formats that achieve double-digit nanomolar affinities, strong early-stage developability profiles, and precise epitope targeting without experimental optimization. We demonstrate JAM's capabilities across multiple therapeutic contexts, including computationally designed to...
This protocol describes the computational steps necessary to reproduce the results described in the paper "Unified rational protein engineering with sequence-only deep representation learning" by Alley et al.
Biotrophic phytopathogens are typically limited to their adapted host range. In recent decades, investigations have teased apart the general molecular basis of intraspecific variation for innate immunity in plants, involving receptor proteins that enable perception of pathogen-associated molecular patterns or avirulence elicitors from the pathogen as triggers for defense induction. However, consensus concerning the evolutionary and molecular factors that alter host range across closely related phytopathogen isolates has been more elusive...
Many microbes associate with higher eukaryotes and impact their vitality. To engineer microbiomes for host benefit, we must understand the rules of community assembly and maintenance that, in large part, demand an understanding of the direct interactions among community members. Toward this end, we have developed a Poisson-multivariate normal hierarchical model to learn from the count-based output of standard metagenomics sequencing experiments. Our model controls for confounding predictors at the Poisson layer and captures taxon–taxon...
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabelled amino acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach reaches near state-of-the-art or...
Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution. As demonstrated in two highly dissimilar proteins, avGFP...
Proteins, the molecular machines that underpin all biological life, are of significant therapeutic and industrial value. Directed evolution is a high-throughput experimental approach for improving protein function, but it has difficulty escaping local maxima in the fitness landscape. Here, we investigate how supervised learning in a closed loop with DNA synthesis and screening can be used to improve protein design. Using green fluorescent protein (GFP) as an illustrative example, we demonstrate the opportunities and challenges...
This paper introduces a novel regression model designed for angular response variables with linear predictors, utilizing the generalized Möbius transformation to define the regression curve. By mapping the real axis to the circle, the model effectively captures the relationship between the linear and angular components. A key innovation is the introduction of an area-based loss function, inspired by the geometry of a curved torus, for efficient parameter estimation. The semi-parametric nature of the model eliminates the need for specific distributional assumptions about the error,...
We present significant advances in de novo antibody design against G protein-coupled receptors (GPCRs) enabled by scaling the test-time compute used by our generative protein design system, JAM. We design hundreds of VHH (single-domain) antibodies against CXCR4 and CXCR7, with top designs showing picomolar to low-nanomolar affinities, high selectivity, and favorable early-stage developability profiles, matching or outperforming clinical-stage molecules on these dimensions. Further, high-affinity designs potently modulate receptor...
Pseudomonas syringae is a phylogenetically diverse species of Gram-negative bacterial plant pathogens responsible for crop diseases around the world. The HrpL sigma factor drives expression of the major P. syringae virulence regulon. HrpL controls genes encoding structural and functional components of the type III secretion system (T3SS) and the type III secreted effector proteins (T3E) that are collectively essential for virulence. HrpL also regulates an under-explored suite of non-type III secreted effector genes (non-T3E), including toxin production systems and operons...
Transcript levels are a critical determinant of the proteome and hence cellular function. Because the transcriptome is an outcome of interactions between genes and their products, it may be accurately represented by a subset of transcript abundances. We develop a method, Tradict (transcriptome predict), capable of learning and using the expression measurements of a small subset of 100 marker genes to predict transcriptome-wide gene abundances and the expression of a comprehensive, but interpretable, list of transcriptional programs that represent the major...
RNA-seq has become a de facto standard for measuring gene expression. Traditionally, experiments are mathematically averaged: they sequence the mRNA of individuals from different treatment groups, hoping to correlate phenotype with differences in arithmetic read count averages at shared loci of interest. Alternatively, tissue from the same treatment group may be pooled prior to sequencing in what we refer to as a biologically averaged design. As mathematical averaging sequences all individuals, it controls for both biological and technical variation;...