- Genomics and Chromatin Dynamics
- RNA and protein synthesis mechanisms
- Genomics and Phylogenetic Studies
- RNA Research and Splicing
- Chromosomal and Genetic Variations
- Gene expression and cancer classification
- Genetic Mapping and Diversity in Plants and Animals
- Pregnancy and preeclampsia studies
- Gestational Diabetes Research and Management
- Genetic Associations and Epidemiology
- Birth, Development, and Health
- DNA Repair Mechanisms
- Machine Learning in Bioinformatics
- Bioinformatics and Genomic Networks
- Bariatric Surgery and Outcomes
- Artificial Intelligence in Healthcare
- RNA modifications and cancer
- Epigenetics and DNA Methylation
- DNA and Nucleic Acid Chemistry
- Congenital heart defects research
- Cancer-related gene regulation
- Diabetes, Cardiovascular Risks, and Lipoproteins
- Single-cell and spatial transcriptomics
- Bacterial Genetics and Biotechnology
- Cancer and biochemical research
Indian Institute of Science Education and Research Pune
2022-2024
National Chemical Laboratory
2011-2022
Academy of Scientific and Innovative Research
2020-2021
Savitribai Phule Pune University
2010-2012
National Institutes of Health
2009-2011
National Center for Biotechnology Information
2010-2011
National Institute of Environmental Health Sciences
2011
Duke University
2005-2010
Brigham and Women's Hospital
2010
Harvard University
2010
The various organogenic programs deployed during embryonic development rely on the precise expression of a multitude of genes in time and space. Identifying the cis-regulatory elements responsible for this tightly orchestrated regulation of gene expression is an essential step in understanding the genetic pathways involved in development. We describe a strategy to systematically identify tissue-specific cis-regulatory elements that share combinations of sequence motifs. Using the heart as an experimental framework, we employed a combination of Gibbs sampling...
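The Gibbs sampling step mentioned above can be illustrated with the textbook site-sampling motif finder. This is a minimal sketch, not the study's pipeline. Our assumptions are the function names `profile` and `gibbs_motif`, a pseudocount of 1, a uniform background, exactly one site per sequence, and sequences written over A, C, G, T only.

```python
import random
from collections import Counter

BASES = "ACGT"

def profile(sites, pseudo=1.0):
    """Per-column base probabilities for equal-length sites, with pseudocounts."""
    prof = []
    for j in range(len(sites[0])):
        counts = Counter(s[j] for s in sites)
        total = len(sites) + 4 * pseudo
        prof.append({b: (counts[b] + pseudo) / total for b in BASES})
    return prof

def gibbs_motif(seqs, w, iters=500, seed=0):
    """Site-sampling Gibbs motif finder (one site per sequence, needs >= 2 sequences).
    Assumes a uniform background, so a window's weight is just its profile probability."""
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        # Profile built from the current sites of every sequence except the held-out one.
        others = [s[p:p + w] for k, (s, p) in enumerate(zip(seqs, pos)) if k != i]
        prof = profile(others)
        s = seqs[i]
        weights = []
        for start in range(len(s) - w + 1):
            prob = 1.0
            for j, base in enumerate(s[start:start + w]):
                prob *= prof[j][base]
            weights.append(prob)
        # Resample the held-out site in proportion to its weight.
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

For example, `gibbs_motif(seqs, w=8)` returns one start position per sequence. A single chain can settle in a local optimum, so in practice one usually runs several seeds and keeps the best-scoring alignment.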
Finding functional DNA binding sites of transcription factors (TFs) throughout the genome is a crucial step in understanding transcriptional regulation. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known TF motifs occur in the genome than are actually functional. However, information about chromatin structure may help to identify the functional sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby...
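To make the statistical challenge concrete, here is a hedged sketch of plain position weight matrix (PWM) scanning. It uses log-odds scores against a uniform background. The helper names, the pseudocount, and any threshold you pass are illustrative assumptions. The sketch uses no chromatin information.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudo=0.5, background=0.25):
    """Turn per-position base counts (list of dicts) into a log-odds scoring matrix in bits."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudo
        pwm.append({b: math.log2((col.get(b, 0) + pseudo) / total / background) for b in BASES})
    return pwm

def scan(seq, pwm, threshold):
    """Forward-strand scan: return (start, score) for every window scoring >= threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        score = sum(pwm[j][base] for j, base in enumerate(seq[i:i + w]))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

Every window in the genome gets scored. As a result, a short, degenerate matrix clears a moderate threshold many times by chance alone. The abstract suggests that chromatin information, such as nucleosome depletion, can help separate functional sites from these chance matches. This sketch does not model that step.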
Abstract Motivation: An important problem in molecular biology is to identify the locations at which a transcription factor (TF) binds DNA, given a set of DNA sequences believed to be bound by that TF. In previous work, we showed that information in the sequence of a binding site is sufficient to predict the structural class of the TF that binds it. In particular, this suggests that we can infer which locations in any DNA sequence are more likely to be bound by certain classes of TFs than by others. Here, we argue that traditional methods for de novo motif finding can be significantly improved by adopting an informative prior...
As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regions to assess whether a particular DNA site is conserved across related organisms, and thus likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment...
The aim of the present study was to use a machine learning algorithm to identify factors associated with non-attendance at the immediate postpartum glucose test following a pregnancy with gestational diabetes mellitus (GDM).
Abstract Motivation: A key goal in molecular biology is to understand the mechanisms by which a cell regulates the transcription of its genes. One important aspect of this transcriptional regulation is the binding of transcription factors (TFs) to their specific cis-regulatory counterparts on the DNA. TFs recognize and bind DNA according to the structure of their DNA-binding domains (e.g. zinc finger, leucine zipper, homeodomain). The structure of these domains can be used as a basis for grouping TFs into structural classes. Although the DNA sequence bound varies widely across TFs, generally, within...
The structural simplicity and ability to capture serial correlations make Markov models a popular modeling choice in several genomic analyses, such as the identification of motifs, genes and regulatory elements. A critical, yet relatively unexplored, issue is the determination of the order of the Markov model. Most biological applications use a predetermined order for all data sets indiscriminately. Here, we show that performance varies vastly with the order. To identify the 'optimal' order, we investigated two model selection criteria:...
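The sketch below shows what "choosing the order" involves. It fits k-th order Markov chains to a DNA string by maximum likelihood and selects the order with the lowest Bayesian information criterion (BIC). BIC is our illustrative choice only. The abstract is cut off before naming the two criteria the study investigated, so this should not be read as either of them. All orders are scored on the same positions so that their likelihoods are comparable.

```python
import math
from collections import Counter

def markov_log_likelihood(seq, k, start):
    """Maximum-likelihood log-likelihood (nats) of seq[start:] under a k-th order Markov chain.
    Requires start >= k so every scored base has a full length-k context."""
    ctx_counts = Counter()
    trans_counts = Counter()
    for i in range(start, len(seq)):
        ctx = seq[i - k:i]
        ctx_counts[ctx] += 1
        trans_counts[(ctx, seq[i])] += 1
    return sum(n * math.log(n / ctx_counts[ctx]) for (ctx, _), n in trans_counts.items())

def bic_order(seq, max_order=5, alphabet_size=4):
    """Pick the Markov order with the lowest BIC; every order scores the same positions."""
    start = max_order
    n = len(seq) - start
    best_k, best_bic = None, math.inf
    for k in range(max_order + 1):
        # Each of alphabet_size**k contexts has alphabet_size - 1 free transition probabilities.
        params = (alphabet_size ** k) * (alphabet_size - 1)
        bic = -2 * markov_log_likelihood(seq, k, start) + params * math.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

A k-th order model over DNA has 3·4^k free parameters, so the penalty grows fourfold with each extra order. BIC accepts a higher order only when the likelihood gain outweighs that growth.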
Genome-wide in vivo protein-DNA interactions are routinely mapped using high-throughput chromatin immunoprecipitation (ChIP). ChIP-reported regions are typically investigated for enriched sequence motifs, which are likely to model the DNA-binding specificity of the profiled protein and/or of co-occurring proteins. However, simple enrichment analyses can miss insights into the binding activity of the protein. Note that ChIP reports regions making direct contact with the protein as well as those binding through intermediaries. For example,...
High-throughput chromatin immunoprecipitation has become the method of choice for identifying genomic regions bound by a protein. Such regions are then investigated for overrepresented sequence motifs, the assumption being that they must correspond to the binding specificity of the profiled protein. However, this approach often fails: many bound regions do not contain the 'expected' motif. This is because binding DNA directly at its recognition site is not the only way a protein can cause a region to immunoprecipitate. Its binding specificity can change through association with different...
Early-onset type 2 diabetes and cardiovascular disease are common complications for women diagnosed with gestational diabetes. Prediabetes refers to a condition in which blood glucose levels are higher than normal, but not yet high enough to be diagnosed as diabetes. Currently, there is no accurate way of knowing who is likely to develop postpartum prediabetes. This study aims to predict the risk of postpartum prediabetes. Our sparse logistic regression approach selects only two variables: antenatal fasting glucose at OGTT and HbA1c soon after diagnosis...
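The abstract reports that a sparse logistic regression kept only two predictors. The sketch below is a generic illustration of how an L1 penalty zeroes out uninformative variables. It is a minimal pure-Python version using proximal gradient descent (ISTA). The solver, the penalty `lam`, the learning rate, and the iteration count are our assumptions. This is not the study's model, software, or data.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(x, t):
    # Proximal operator of t * |x|: shrinks x toward zero and clips small values to exactly 0.
    return math.copysign(max(abs(x) - t, 0.0), x)

def l1_logistic(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-penalised logistic regression via proximal gradient descent (ISTA).
    X: list of feature rows (ideally standardised), y: list of 0/1 labels.
    The intercept b is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the mean log-loss, then the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b
```

A larger `lam` drives more coefficients to exactly zero. The penalty is usually chosen by cross-validation. Features should be standardised first so that the penalty treats them evenly.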
An important question in biology is how different promoter architectures contribute to the diversity of regulation of transcription initiation. A step forward has been the production of genome-wide maps of transcription start sites (TSSs) using high-throughput sequencing. However, the subsequent characterization of promoters and their functions is still largely done on the basis of previously established promoter elements, like the TATA-box in eukaryotes or the -10 box in bacteria. Unfortunately, a majority of promoter activities cannot be explained by these few...
Abstract Summary: CLARE is a computational method designed to reveal the sequence encryption of tissue-specific regulatory elements. Starting with a set of elements known to be active in a particular tissue/process, it learns the sequence code of the input set and builds a predictive model from features specific to those elements. The resulting model can then be applied to user-supplied genomic regions to identify novel candidate elements. CLARE's model also provides a detailed analysis of the transcription factors that most likely bind the elements, making it an invaluable tool for...
Introduction: Stigma contributes a significant part of the burden of schizophrenia (SCZ); therefore, reducing false positives in diagnosis would be liberating for individuals with SCZ and desirable for clinicians. The stigmatization associated with SCZ underscores the need for high-precision diagnosis. In this study, we present an ensemble learning-based approach using peripheral blood gene expression profiles. Methodology: Machine learning (ML) models, support vector machines (SVM), prediction analysis...
The establishment of centromeric chromatin and its propagation by the centromere-specific histone CENPA are mediated by epigenetic mechanisms in most eukaryotes. DNA replication origins, origin binding proteins, and the replication timing of centromeres are important determinants of centromere function. The epigenetically regulated regional centromeres of the budding yeast Candida albicans have unique sequences that replicate earliest in every chromosome and are clustered throughout the cell cycle. In this study, genome-wide occupancy of the replication initiation protein Orc4...
Transcription factors (TFs) and their binding sites have evolved to interact cooperatively or competitively with each other. Here we examine in detail, across multiple cell lines, such cooperation or competition among TFs in both sequential and spatial proximity (using chromatin conformation capture assays), considering in vivo binding data as well as TF motifs on DNA. We ascertain significantly co-occurring ("attractive") or avoiding ("repulsive") TF pairs using robust randomized models that retain the essential...
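Calling a TF pair "attractive" or "repulsive" relative to a randomized model can be sketched as a permutation test on bound/unbound flags across a shared set of regions. The null model here is a deliberately simple assumption: it shuffles one TF's flags while preserving how many regions that TF binds. The randomized models in the abstract retain additional properties, which the truncated text does not specify. The function name `cooccurrence_test` is hypothetical.

```python
import random

def cooccurrence_test(a, b, n_perm=1000, seed=0):
    """a, b: equal-length lists of 0/1 flags (TF bound in region i or not).
    Returns (observed, p_attract, p_repel) under a null that shuffles b across regions,
    preserving only the number of regions b binds."""
    rng = random.Random(seed)
    observed = sum(x and y for x, y in zip(a, b))
    b_perm = list(b)
    ge = le = 0
    for _ in range(n_perm):
        rng.shuffle(b_perm)
        k = sum(x and y for x, y in zip(a, b_perm))
        ge += k >= observed  # permutations at least as co-occurring as observed
        le += k <= observed  # permutations at most as co-occurring as observed
    # Add-one correction so an empirical p-value is never exactly zero.
    return observed, (ge + 1) / (n_perm + 1), (le + 1) / (n_perm + 1)
```

A small `p_attract` suggests that the two TFs share regions more often than the null allows. A small `p_repel` suggests that they avoid each other.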
Abstract Summary: Promoters have diverse regulatory architectures and thus activate genes differently. For example, some have a TATA-box, while many others do not. Even the ones with it can differ in its position relative to the transcription start site (TSS). No Promoter Left Behind (NPLB) is an efficient, organism-independent method for characterizing such architectures directly from experimentally identified genome-wide TSSs, without relying on known promoter elements. As a test case, we show its application in identifying...
Abstract Spatiotemporal regulation of DNA replication maintains kinetochore stability. The epigenetically regulated centromeres (CENs) of the budding yeast Candida albicans have unique sequences, replicate early, and are clustered throughout the cell cycle. In this study, genome-wide occupancy of the replication initiation protein Orc4 reveals its abundance at all CENs in C. albicans. Orc4 associates with four different motifs, one of which coincides with tRNA genes. Hi-C combined with replication timing analyses identify enriched interactions among or...
We present THiCweed, a new approach to analyzing transcription factor binding data from high-throughput chromatin immunoprecipitation-sequencing (ChIP-seq) experiments. THiCweed clusters bound regions based on sequence similarity using divisive hierarchical clustering within sliding windows, while exploring both DNA strands. THiCweed is specially geared toward data containing mixtures of motifs, which challenge traditional motif-finders. Our implementation is significantly faster than standard...
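One ingredient of the approach described above is comparing sequences within sliding windows on both strands. It can be illustrated with a toy similarity function. This is a hedged sketch with hypothetical names (`revcomp`, `best_window_match`) and a plain identity-count score. It is not THiCweed's scoring or clustering code, and it omits the divisive hierarchical clustering step.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def best_window_match(a, b, w):
    """Highest number of identical bases between any length-w window of `a` and any
    length-w window of `b`, checking both `b` and its reverse complement.
    Assumes both sequences are at least w long.
    Returns (matches, start_in_a, start_in_b, strand)."""
    best = (-1, 0, 0, "+")
    for strand, target in (("+", b), ("-", revcomp(b))):
        for i in range(len(a) - w + 1):
            wa = a[i:i + w]
            for j in range(len(target) - w + 1):
                m = sum(x == y for x, y in zip(wa, target[j:j + w]))
                if m > best[0]:
                    best = (m, i, j, strand)
    return best
```

When the best match is on the reverse strand, the returned `start_in_b` indexes into the reverse complement of `b`, not `b` itself.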