- Genomics and Rare Diseases
- Genomic variations and chromosomal abnormalities
- Genomics and Phylogenetic Studies
- Genetics and Neurodevelopmental Disorders
- Mental Health and Psychiatry
- RNA modifications and cancer
- Chromosomal and Genetic Variations
- Cancer Genomics and Diagnostics
- Genetic Associations and Epidemiology
- Pharmacogenetics and Drug Metabolism
- Advanced biosensing and bioanalysis techniques
- Hereditary Neurological Disorders
- Autism Spectrum Disorder Research
- Gene expression and cancer classification
- Connective tissue disorders research
- Biomedical Text Mining and Ontologies
- Iron Metabolism and Disorders
- Cellular transport and secretion
- Natural Language Processing Techniques
- Cryptography and Data Security
- Machine Learning in Healthcare
- Advanced Statistical Methods and Models
- Privacy-Preserving Technologies in Data
- BRCA gene mutations in cancer
- Probabilistic and Robust Engineering Design
Stony Brook University
2013-2016
Cold Spring Harbor Laboratory
2013-2016
Applied Biomathematics (United States)
2014-2015
To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be. We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with...
INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale...
We describe an X-linked genetic syndrome associated with mutations in TAF1 and manifesting with global developmental delay, intellectual disability (ID), characteristic facial dysmorphology, generalized hypotonia, and variable neurologic features, all in male individuals. Simultaneous studies using diverse strategies led to the identification of nine families with overlapping clinical presentations and affected by de novo or maternally inherited single-nucleotide changes. Two additional families harboring large...
Background: Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family-based sequencing data analysis. Methods: Hadoop is a framework for reliable,...
Confidence structures (c-boxes) are imprecise generalizations of confidence distributions. They encode frequentist confidence intervals at every confidence level for parameters of interest and, thereby, characterize the inferential uncertainty about distribution parameters estimated from sparse or imprecise sample data. They have a purely frequentist interpretation that makes them useful in engineering because they offer a guarantee of statistical performance through repeated use. Unlike traditional confidence intervals, which cannot usually be propagated through mathematical...
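As a rough illustration of what "confidence intervals at every confidence level" can look like, the sketch below computes classical Clopper-Pearson intervals for a binomial rate at several levels. The binomial setting, the function names (`binom_cdf`, `bisect_root`, `rate_interval`), and the sample counts are assumptions chosen for illustration; they are not taken from the abstract and this is not the authors' software.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, tol=1e-12):
    """Root of a function that rises from negative to positive on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rate_interval(k, n, confidence):
    """Two-sided Clopper-Pearson interval for a binomial rate.

    k successes were observed in n trials (hypothetical data).
    """
    alpha = 1.0 - confidence
    # Lower bound: the rate p at which P(X >= k | p) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the rate p at which P(X <= k | p) equals alpha / 2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# Sweeping the confidence level yields a nested family of intervals.
for level in (0.50, 0.90, 0.95):
    lo, hi = rate_interval(2, 10, level)
    print(f"{level:.0%} interval for the rate: [{lo:.3f}, {hi:.3f}]")
```

Sweeping the level continuously, rather than at three values, gives the kind of nested interval family that a confidence structure summarizes in one object.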
Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide...
Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality...
We present a new open-source algorithm, Scalpel, for sensitive and specific discovery of INDELs in exome-capture data. By combining the power of mapping and assembly, Scalpel carefully searches the de Bruijn graph for sequence paths that span each exon. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for INDEL discovery. We extensively compared Scalpel with a battery of >10000 simulated and >1000 experimentally validated INDELs against two recent algorithms:...
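The idea of searching a de Bruijn graph for paths that span a region can be sketched in a few lines. This is a toy illustration only, not Scalpel's implementation: the function names, the reads, the reference window, and the fixed `k` are all hypothetical, and the repeat analysis and self-tuning k-mer strategy described in the abstract are left out.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spanning_paths(graph, source, sink, max_len):
    """Spell out every simple path from source to sink (depth-first search)."""
    found = []
    stack = [(source, source, {source})]
    while stack:
        node, seq, seen = stack.pop()
        if node == sink:
            found.append(seq)
            continue
        for nxt in graph.get(node, ()):
            # Skip visited nodes (no cycles) and cap the assembled length.
            if nxt not in seen and len(seq) < max_len:
                stack.append((nxt, seq + nxt[-1], seen | {nxt}))
    return found


# Hypothetical reference window and reads from a sample missing "GCA".
reference = "ACGTTGCAAGGCTA"
reads = ["ACGTTAGG", "TTAGGCTA"]
k = 4

graph = build_de_bruijn(reads, k)
paths = spanning_paths(graph, reference[:k - 1], reference[-(k - 1):], max_len=20)
for path in paths:
    # A length difference from the reference marks a candidate indel.
    print(path, "length change:", len(path) - len(reference))
```

A spanning path shorter than the reference window suggests a deletion and a longer one an insertion; a real caller would still need to align the path to the reference to place the event.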
There are ~12 billion nucleotides in every cell of the human body, and there are ~25-100 trillion cells in each human body. Given somatic mosaicism, epigenetic changes and environmental differences, no two human beings are the same, particularly as there are only ~7 billion people on the planet. One of the next great challenges for studying human genetics will be to acknowledge and embrace complexity. Every human is unique, and the study of human disease phenotypes (and phenotypes in general) will be greatly enriched by moving from a deterministic to a more stochastic/probabilistic model. The dichotomous...
As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but indels of more than a few bases are still challenging to discover from short-read sequencing data. Scalpel (http://scalpel.sourceforge.net) is open-source software for reliable indel detection based on the micro-assembly technique. To date, it has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This...
This report includes the discovery and analysis of a pedigree with Prader–Willi Syndrome (PWS), hereditary hemochromatosis (HH), and dysautonomia-like symptoms. Nine members of the family participated in whole genome sequencing (WGS), which enabled a wide scope of variant calling from single-nucleotide polymorphisms to copy number variations. First, a 5.5 Mb de novo deletion is identified in the chromosome region 15q11.2 to 15q13.1 in the boy with PWS. Second, a female individual with HH is homozygous for the p.C282Y variant in HFE, a mutation...
Autism spectrum disorders (ASD) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASD, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis...