- Cancer Genomics and Diagnostics
- Genomics and Phylogenetic Studies
- Genomics and Rare Diseases
- Genetic factors in colorectal cancer
- RNA and protein synthesis mechanisms
- Glioma Diagnosis and Treatment
- Genetic Associations and Epidemiology
- Protein Degradation and Inhibitors
- Cancer-related gene regulation
- Acute Myeloid Leukemia Research
- Chromatin Remodeling and Cancer
- Genomic variations and chromosomal abnormalities
- Cancer, Hypoxia, and Metabolism
- RNA modifications and cancer
- Algorithms and Data Compression
- Genomics and Chromatin Dynamics
- Epigenetics and DNA Methylation
- Protist diversity and phylogeny
- Radiomics and Machine Learning in Medical Imaging
- Mass Spectrometry Techniques and Applications
- Gene expression and cancer classification
- Genetics and Neurodevelopmental Disorders
- Insect Resistance and Genetics
- Cancer Research and Treatments
- Cancer-related molecular mechanisms research
New York Genome Center
2014-2021
Columbia University Irving Medical Center
2017
Memorial Sloan Kettering Cancer Center
2017
IBM (United States)
2017
Howard Hughes Medical Institute
2017
IBM Research - Thomas J. Watson Research Center
2017
Freie Universität Berlin
2009-2014
Robert Koch Institute
2014
Max Planck Institute for Molecular Genetics
2009-2012
Carnegie Mellon University
2012
Abstract The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype...
Oncogenic Suspect Exposed. It can be difficult logistically to study the genomics of rare variants of common cancers. Nevertheless, Honeyman et al. (p. 1010) studied fibrolamellar hepatocellular carcinoma (FL-HCC), a rare and poorly understood liver tumor that affects adolescents and young adults and for which there is no effective treatment. FL-HCCs from 15 patients all expressed a chimeric RNA transcript and protein containing sequences from a molecular chaperone fused in frame with the catalytic domain of protein kinase A. The...
Colorectal cancer is the second leading cause of cancer death in the United States, with over 50,000 deaths estimated in 2014. Molecular profiling for somatic mutations that predict absence of response to anti-EGFR therapy has become standard practice in the treatment of metastatic colorectal cancer; however, the quantity and type of tissue available for testing is frequently limited. Further, the degree to which the primary tumor is a faithful representation of metastatic disease has been questioned. As next-generation sequencing technology becomes more widely...
Summary paragraph The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, sharing...
Second-generation sequencing technologies deliver DNA sequence data at unprecedented high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can work either lossless or with a user-defined loss rate at higher...
Reliable detection of somatic variations is of critical importance in cancer research. Here we present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs. We demonstrate, through extensive experimental comparison on synthetic and real whole-genome sequencing datasets, that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2,...
Gout is the most common cause of inflammatory arthritis worldwide, particularly in the Pacific regions. We aimed to establish the prevalence of gout and hyperuricaemia in French Polynesia, their associations with dietary habits, comorbidities, and the HLA-B*58:01 allele, and current management of the disease.
Second generation sequencing technologies yield DNA sequence data at ultra high-throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not been easy to compare different read mapping approaches in a unified way and to determine which program is the best for what task. We present a new benchmark method, called Rabema (Read Alignment BEnchMArk), for read mappers. It...
Abstract Motivation: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map. Results: Here we present a method for ‘split’ read mapping, where prefix and suffix match of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions...
To identify previously reported disease mutations that are compatible with extraordinary longevity, we screened the coding regions of the genomes of 44 Ashkenazi Jewish centenarians. Individual genome sequences were generated with 30× coverage on the Illumina HiSeq 2000 and single-nucleotide variants were called with the genome analysis toolkit (GATK). We identified 130 variants that were annotated as "pathogenic" or "likely pathogenic" based on the ClinVar database and that are infrequent in the general population. These variants were previously reported to cause a wide range of degenerative, neoplastic,...
Abstract Motivation: The landscape of structural variation (SV) including complex duplication and translocation patterns is far from resolved. SV detection tools usually exhibit low agreement, are often geared toward certain types or size ranges of variation and struggle to correctly classify the type and exact size of SVs. Results: We present Gustaf (Generic mUlti-SpliT Alignment Finder), a sound generic multi-split detection tool that detects and classifies deletions, inversions, dispersed duplications and translocations of ≥30 bp. Our...
Human papillomavirus (HPV) causes 5% of all cancers and frequently integrates into host chromosomes. The HPV oncoproteins E6 and E7 are necessary but insufficient for cancer formation, indicating that additional secondary genetic events are required. Here, we investigate potential oncogenic impacts of virus integration. Analysis of 105 HPV-positive oropharyngeal cancers by whole-genome sequencing detects virus integration in 77%, revealing five statistically significant sites of recurrent integration near genes that regulate epithelial...
Abstract While the genomes of normal tissues undergo dynamic changes over time, little is understood about the temporal-spatial dynamics of genomes in premalignant tissues that progress to cancer compared to those that remain cancer-free. Here we use whole genome sequencing to contrast genomic alterations in 427 longitudinal samples from 40 patients with stable Barrett’s esophagus compared to 40 Barrett’s patients who progressed to esophageal adenocarcinoma (ESAD). We show the same somatic mutational processes are active in Barrett’s tissue regardless of outcome, with high levels...
Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and wide-spread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far. We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using...
To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed...
Abstract Motivation: Deep sequencing has become the method of choice for determining the small RNA content of a cell. Mapping the sequenced reads onto their reference genome serves as the basis for all further analyses, namely for identification and quantification. A method frequently used is Mega BLAST followed by several filtering steps, even though it is slow and inefficient for this task. Also, none of the currently available short read aligners has established itself for the particular task of small RNA mapping. Results: We present MicroRazerS, a tool...
Abstract Motivation: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing. Results: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these...
We developed and validated a clinical whole-genome and transcriptome sequencing (WGTS) assay that provides a comprehensive genomic profile of a patient's tumor. The ability to fully capture the mappable genome with sufficient sequencing coverage to precisely call DNA somatic single nucleotide variants, insertions/deletions, copy number variants, structural variants, and RNA gene fusions was analyzed. New York State's Department of Health next-generation sequencing guidelines were expanded for establishing performance validation applicable to sequencing....
The elemental composition of peptides results in the formation of distinct, equidistantly spaced clusters across the mass range. The property of peptide mass clustering is used to calibrate peptide mass lists, to identify and remove non-peptide peaks, and for data reduction. We developed an analytical model of the peptide mass cluster centres. Inputs to the model included the amino acid frequencies in the sequence database, the average length of the proteins, the cleavage specificity of the proteolytic enzyme, and the cleavage probability. We examined the accuracy of our model by comparing it with the model based on an in silico sequence database...
Summary Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g. deletion, translocation) or complex (e.g. chromothripsis, chromoplexy) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,833 tumor whole genome sequences (WGS), we introduce three phenomena: pyrgo, rigma, and tyfonas. Pyrgo are “towers” of low-JCN duplications associated with...
Abstract Background Historically, geneticists have relied on genotyping arrays and imputation to study human genetic variation. However, an underrepresentation of diverse populations has resulted in arrays that poorly capture global genetic variation, and a lack of reference panels. This has contributed to deepening global health disparities. Whole genome sequencing (WGS) better captures genetic variation but remains prohibitively expensive. Thus, we explored WGS at “mid-pass” 1-7x coverage. Results Here, we developed and benchmarked methods...