David Juan
- Genomics and Phylogenetic Studies
- Epigenetics and DNA Methylation
- Bioinformatics and Genomic Networks
- Genomics and Chromatin Dynamics
- Genomic variations and chromosomal abnormalities
- Cancer Genomics and Diagnostics
- Genomics and Rare Diseases
- Chromosomal and Genetic Variations
- Protein Structure and Dynamics
- RNA and protein synthesis mechanisms
- Machine Learning in Bioinformatics
- RNA Research and Splicing
- Microbial Metabolic Engineering and Bioproduction
- Genetic diversity and population structure
- CRISPR and Genetic Engineering
- Single-cell and spatial transcriptomics
- RNA modifications and cancer
- Gene expression and cancer classification
- Genetic Associations and Epidemiology
- Biomedical Text Mining and Ontologies
- T-cell and B-cell Immunology
- Cancer-related molecular mechanisms research
- Pleistocene-Era Hominins and Archaeology
- Bat Biology and Ecology Studies
- Renal and related cancers
Universitat Pompeu Fabra
2016-2025
Institut de Biologia Evolutiva
2016-2025
Consejo Superior de Investigaciones Científicas
2005-2024
Barcelona Biomedical Research Park
2016-2024
Centro Nacional de Biotecnología
2003-2024
Texas Tech University
2023
Max Planck Institute of Molecular Cell Biology and Genetics
2023
Max Planck Institute for the Physics of Complex Systems
2023
Center for Systems Biology Dresden
2023
Lubbock Christian University
2023
Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low-frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex...
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. >96%...
INTRODUCTION Improved understanding of how the developing human nervous system differs from that of closely related nonhuman primates is fundamental for teasing out human-specific aspects of behavior, cognition, and disorders. RATIONALE The shared and unique functional properties of the human nervous system are rooted in the complex transcriptional programs governing the development of distinct cell types, neural circuits, and regions. However, the precise molecular mechanisms underlying shared and unique features of the developing human nervous system have been only minimally characterized. RESULTS We...
The rich diversity of morphology and behavior displayed across primate species provides an informative context in which to study the impact of genomic diversity on fundamental biological processes. Analysis of that diversity provides insight into long-standing questions in evolutionary and conservation biology and is urgent given the severe threats these species are facing. Here, we present high-coverage whole-genome data from 233 primate species representing 86% of genera and all 16 families. This dataset was used, together with fossil calibration, to create a nuclear DNA...
Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele...
Annotating coding genes and inferring orthologs are two classical challenges in genomics and evolutionary biology that have traditionally been approached separately, limiting scalability. We present TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation and orthology inference. TOGA implements a different paradigm to infer orthologous loci, improves ortholog detection and annotation of conserved genes compared with state-of-the-art methods, and handles even highly fragmented assemblies. TOGA scales...
The precise pattern and timing of speciation events that gave rise to all living placental mammals remain controversial. We provide a comprehensive phylogenetic analysis of genetic variation across an alignment of 241 placental mammal genome assemblies, addressing prior concerns regarding limited genomic sampling across species. We compared neutral genome-wide phylogenomic signals using concatenation and coalescent-based approaches, interrogated phylogenetic variation across chromosomes, and analyzed extensive catalogs of structural variants. Interordinal...
Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation,...
We examined the transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes....
Human accelerated regions (HARs) are conserved genomic loci that evolved at an accelerated rate in the human lineage and may underlie human-specific traits. We generated HARs and chimpanzee accelerated regions with an automated pipeline and an alignment of 241 mammalian genomes. Combining deep learning with chromatin capture experiments in human and chimpanzee neural progenitor cells, we discovered a significant enrichment of HARs in topologically associating domains containing human-specific genomic variants that change three-dimensional (3D) genome organization. Differential gene expression between...
Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes...
Species persistence can be influenced by the amount, type, and distribution of diversity across the genome, suggesting a potential relationship between historical demography and resilience. In this study, we surveyed genetic variation across single genomes of 240 mammals that compose the Zoonomia alignment to evaluate how historical effective population size (
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression, such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning...
Abstract Background MicroRNA regulate mRNA levels in a tissue specific way, either by inducing degradation of the transcript or by inhibiting translation or transcription. Putative mRNA targets of microRNA identified from seed sequence matches are available in many databases. However, such matches have a high false positive rate and cannot identify tissue specificity of regulation. Results We describe a simple method to identify direct mRNA targets of microRNA dysregulated in cancers from expression level measurements in patient matched tumor/normal samples. The word...
Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins is the result of many different factors besides the actual interaction or functional relationship between them. Such factors influence the performance of interaction predictions. One aspect that can influence this similarity is the fact that a given protein interacts with many others, and hence it must adapt to all of them. Accordingly, the coadaptation signal within its tree...
The divergence accumulated during the evolution of protein families translates into their internal organization as subfamilies, and it is directly reflected in the characteristic patterns of differentially conserved residues. These specifically conserved positions in protein subfamilies are known as “specificity determining positions” (SDPs). Previous studies have limited their analysis to the study of the relationship between these positions and ligand-binding specificity, demonstrating significant yet limited predictive capacity. We systematically extended...
A healthy immune system requires immune cells that adapt rapidly to environmental challenges. This phenotypic plasticity can be mediated by transcriptional and epigenetic variability. We apply a novel analytical approach to measure and compare transcriptional and epigenetic variability genome-wide across CD14+CD16- monocytes, CD66b+CD16+ neutrophils, and CD4+CD45RA+ naïve T cells from the same 125 individuals. We discover substantially increased variability in neutrophils compared to monocytes and T cells. In neutrophils, genes with hypervariable expression are found to be implicated...
Abstract Mammalian Y chromosomes are often neglected from genomic analysis. Due to their inherent assembly difficulties, high repeat content, and large ampliconic regions, only a handful of species have their Y chromosome properly characterized. To date, just a single human reference quality Y chromosome, of European ancestry, is available due to a lack of accessible methodology. To facilitate the assembly of such complicated genomic territory, we developed a novel strategy to sequence native, unamplified flow sorted DNA on a MinION nanopore...
Introns can be extraordinarily large and they account for the majority of the DNA sequence in human genes. However, little is known about their population patterns of structural variation and their functional implication. By combining the most extensive maps of CNVs in human populations, we have found that intronic losses are the most frequent copy number variants (CNVs) in protein-coding genes in human, with 12,986 intronic deletions, affecting 4,147 genes (including 1,154 essential genes and 1,638 disease-related genes). This intronic length variation results in dozens of genes showing...
Decrypting the rearrangements that drive mammalian chromosome evolution is critical to understanding the molecular bases of speciation, adaptation, and disease susceptibility. Using 8 scaffolded and 26 chromosome-scale genome assemblies representing 23/26 mammal orders, we computationally reconstructed ancestral karyotypes and syntenic relationships at 16 nodes along the mammalian phylogeny. Three different reference genomes (human, sloth, and cattle) representing phylogenetically distinct mammalian superorders were used to assess reference bias in the reconstructed ancestral karyotypes and to expand...
We analyzed 131 human brains (44 neurotypical, 19 with Tourette syndrome, 9 with schizophrenia, and 59 with autism) for somatic mutations after whole genome sequencing to a depth of more than 200×. Typically, brains had 20 to 60 detectable single-nucleotide mutations, but ~6% of brains harbored hundreds of somatic mutations. Hypermutability was associated with age and damaging mutations in genes implicated in cancers and, in some brains, reflected in vivo clonal expansions. Somatic duplications, likely arising during development, were found in ~5% of normal...