André Kahles
- Genomics and Phylogenetic Studies
- Cancer Genomics and Diagnostics
- RNA modifications and cancer
- Algorithms and Data Compression
- RNA and protein synthesis mechanisms
- Gene expression and cancer classification
- RNA Research and Splicing
- Molecular Biology Techniques and Applications
- Cancer-related molecular mechanisms research
- Chromosomal and Genetic Variations
- Single-cell and spatial transcriptomics
- Genomics and Chromatin Dynamics
- Bioinformatics and Genomic Networks
- Pancreatic and Hepatic Oncology Research
- Plant Molecular Biology Research
- Evolution and Genetic Dynamics
- Genomics and Rare Diseases
- Epigenetics and DNA Methylation
- Genetic factors in colorectal cancer
- Microbial Community Ecology and Physiology
- Genetic Mapping and Diversity in Plants and Animals
- DNA and Biological Computing
- Genetics, Bioinformatics, and Biomedical Research
- Plant nutrient uptake and metabolism
- Cancer-related gene regulation
SIB Swiss Institute of Bioinformatics
2017-2024
University Hospital of Zurich
2017-2024
ETH Zurich
2017-2024
Memorial Sloan Kettering Cancer Center
2014-2023
Universidad del Desarrollo
2021
University of Zurich
2021
Kyiv Academic University
2021
Broad Institute
2020
Cornell University
2017
École Polytechnique Fédérale de Lausanne
2017
Our comprehensive analysis of alternative splicing across 32 The Cancer Genome Atlas cancer types from 8,705 patients detects alternative splicing events and tumor variants by reanalyzing RNA and whole-exome sequencing data. Tumors have up to 30% more alternative splicing events than normal samples. Association analysis of somatic variants with alternative splicing events confirmed known trans associations with variants in SF3B1 and U2AF1 and identified additional trans-acting variants (e.g., TADA1, PPP2R1A). Many tumors have thousands of alternative splicing events not detectable in normal samples; on average, we identified ≈930 exon-exon junctions ("neojunctions") in tumors not typically...
We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates, but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar expression models. Consequently,...
Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that...
The discovery of drivers of cancer has traditionally focused on protein-coding genes.
Authors compare RNA-seq aligners on mouse and human data sets using benchmarks such as alignment yield, splice junction accuracy and suitability for transcript reconstruction. The work highlights the strength of each program and discusses outstanding needs in RNA-seq analysis. High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping...
Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated...
Transcript alterations often result from somatic changes in cancer genomes [1]. Various forms of RNA alterations have been described in cancer, including overexpression [2], altered splicing [3] and gene fusions [4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue...
We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent...
Natural microbial communities are phylogenetically and metabolically diverse. In addition to underexplored organismal groups [1], this diversity encompasses a rich discovery potential for ecologically and biotechnologically relevant enzymes and biochemical compounds [2,3]. However, studying this diversity to identify genomic pathways for the synthesis of such compounds [4] and assigning them to their respective hosts remains challenging. The biosynthetic potential of microorganisms in the open ocean remains largely uncharted owing to limitations in the analysis...
Although disinfection is key to infection control, the colonization patterns and resistomes of hospital-environment microbes remain underexplored. We report the first extensive genomic characterization of microbiomes, pathogens and antibiotic resistance cassettes in a tertiary-care hospital, from repeated sampling (up to 1.5 years apart) of 179 sites associated with 45 beds. Deep shotgun metagenomics unveiled distinct ecological niches of microbes and antibiotic resistance genes characterized by biofilm-forming and human-microbiome-influenced...
Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires...
Multi-omics datasets represent distinct aspects of the central dogma of molecular biology. Such high-dimensional molecular profiles pose challenges to data interpretation and hypothesis generation. ActivePathways is an integrative method that discovers significantly enriched pathways across multiple datasets using statistical data fusion, rationalizes contributing evidence and highlights associated genes. As part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from...
The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNAs, such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NMD remains scarce and even absent for any plant species. To address this, we conducted transcriptome-wide splicing studies using Arabidopsis thaliana mutants in the NMD factor homologs UP FRAMESHIFT1 (UPF1) and UPF3 as well as wild-type samples treated with the translation inhibitor...
Motivation: Understanding the occurrence and regulation of alternative splicing (AS) is a key task towards explaining the regulatory processes that shape the complex transcriptomes of higher eukaryotes. With the advent of high-throughput sequencing of RNA (RNA-Seq), the diversity of AS transcripts could be measured at an unprecedented depth. Although the catalog of known AS events has grown ever since, novel transcripts are commonly observed when working with less well annotated organisms, in the context of disease, or within large...
The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the PCAWG...
Alternative splicing (AS) generates transcript variants by variable exon/intron definition and massively expands transcriptome diversity. Changes in AS patterns have been found to be linked to manifold biological processes, yet fundamental aspects, such as the regulation of AS and its functional implications, largely remain to be addressed. In this work, widespread AS regulation by Arabidopsis thaliana Polypyrimidine tract binding protein homologs (PTBs) was revealed. In total, 452 AS events derived from 307 distinct genes were...
Plants use light as a source of energy and information to detect diurnal rhythms and seasonal changes. Sensing changing light conditions is critical to adjust plant metabolism and to initiate developmental transitions. Here, we analyzed transcriptome-wide alterations in gene expression and alternative splicing (AS) of etiolated seedlings undergoing photomorphogenesis upon exposure to blue, red, or white light. Our analysis revealed massive transcriptome reprogramming as reflected by differential expression of ∼20% of all genes and changes in several...
The impact of somatic structural variants (SVs) on gene expression in cancer is largely unknown. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data and RNA sequencing from a common set of 1220 cancer cases, we report hundreds of genes for which the presence within 100 kb of an SV breakpoint associates with altered expression. For the majority of these genes, expression increases rather than decreases with corresponding SV events. Up-regulated cancer-associated genes impacted...
Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able...
Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene...
Motivation: High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. Results: We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript...
The discovery of driver mutations is one of the key motivations for cancer genome sequencing. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1373 genomic features derived from public sources, DriverPower’s...
The amount of biological sequencing data available in public repositories is growing exponentially, forming an invaluable biomedical research resource. Yet, making it full-text searchable and easily accessible to researchers in life and data science is an unsolved problem. In this work, we take advantage of recently developed, very efficient data structures and algorithms for representing sequence sets. We make Petabases of DNA sequences across all clades of life, including viruses, bacteria, fungi, plants, animals, and humans,...