Flavia Villani
- Genomics and Phylogenetic Studies
- Genetic Mapping and Diversity in Plants and Animals
- Chromosomal and Genetic Variations
- Epigenetics and DNA Methylation
- CRISPR and Genetic Engineering
- Bioinformatics and Genomic Networks
- Genomics and Chromatin Dynamics
- Genetic diversity and population structure
- RNA and protein synthesis mechanisms
- Genomic variations and chromosomal abnormalities
- Genomics and Rare Diseases
- Adipose Tissue and Metabolism
- Molecular Biology Techniques and Applications
- Genetic Associations and Epidemiology
- Genetics, Aging, and Longevity in Model Organisms
- Biomedical Text Mining and Ontologies
- SARS-CoV-2 detection and testing
- Gene expression and cancer classification
- Ubiquitin and proteasome pathways
- Evolution and Genetic Dynamics
- Biomedical and Engineering Education
- Scientific Computing and Data Management
- Genetic Neurodegenerative Diseases
- Fungal and yeast genetics research
- Artificial Intelligence in Healthcare and Education
University of Tennessee Health Science Center
2021-2025
Institute of Genetics and Biophysics
2020-2021
National Research Council
2021
University of Naples Federico II
2020
Abstract Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic...
Abstract Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder (PGGB), a pipeline for constructing pangenome graphs without bias or exclusion. PGGB uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.
Abstract The Human Pangenome Reference Consortium (HPRC) presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene...
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors by approximately 9-fold, and increases contiguity by 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined...
The HXB/BXH family of recombinant inbred rat strains is a unique genetic resource that has been extensively phenotyped over 25 years, resulting in a vast dataset of quantitative molecular and physiological phenotypes. We built a pangenome graph from 10x Genomics Linked-Read data for 31 rats to study variation and association mapping. The pangenome includes 0.2 Gb of sequence not present in the reference mRatBN7.2, confirming the capture of substantial additional variation. We validated variants in challenging regions, including complex...
Pangenomics is a growing field within computational genomics. Many pangenomic analyses use bidirected sequence graphs as their core data model. However, implementing and correctly using this data model can be difficult, and the scale of pangenomic datasets can be challenging to work at. These challenges have impeded progress in this field. Here, we present a stack of two C++ libraries, libbdsg and libhandlegraph, which use a simple, field-proven interface, designed to expose elementary features of these graphs while preventing common graph...
Abstract The BXD recombinant inbred (RI) mouse strains are the largest and most deeply phenotyped panel of vertebrate organisms. RIs allow phenotyping of isogenic individuals across virtually any environment or treatment. We performed whole genome sequencing and generated a compendium of SNPs, indels, short tandem repeats, and structural variants in these strains, and used them to analyze phenomic data accumulated over the past 50 years. We show that the BXDs segregate >6 million variants with high minor allele frequency, which are derived from...
The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere assembly. Our approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as...
Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1–6 bp. We leveraged whole-genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and performed quantitative trait locus (QTL) analyses for these phenotypes....
The HXB/BXH family of recombinant inbred rat strains is a unique genetic resource that has been extensively phenotyped over 25 years, resulting in a vast dataset of quantitative molecular and physiological phenotypes. We built a pangenome graph from 10x Genomics Linked-Read data for 31 rats to study variation and association mapping. The pangenome length was on average 2.4 times greater than the corresponding reference mRatBN7.2, confirming the capture of substantial additional variation. We validated variants in challenging...
Abstract The ability of SARS-CoV-2 to rapidly mutate represents a remarkable complication. Quantitative evaluations of the effects that these mutations have on the virus structure/function are of great relevance, and the availability of a large number of SARS-CoV-2 sequences since the early phases of the pandemic represents a unique opportunity to follow the adaptation of the virus to humans. Here, we evaluated the SARS-CoV-2 amino acid mutations and their progression by analyzing publicly available viral genomes at three stages of the pandemic (March 15th 2020, October 7th 2020, and February 7th 2021). Mutations were...
Abstract Motivation: Pangenomics is a growing field within computational genomics. Many pangenomic analyses use bidirected sequence graphs as their core data model. However, implementing and correctly using this data model can be difficult, and the scale of pangenomic data sets can be challenging to work at. These challenges have impeded progress in this field. Results: Here we present a stack of two C++ libraries, libbdsg and libhandlegraph, which use a simple, field-proven interface, designed to expose elementary features of these graphs while preventing...
Abstract Short tandem repeats (STRs) are a class of rapidly mutating genetic elements characterized by repeated units of 1 or more nucleotides. We leveraged whole genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family derived from C57BL/6J and DBA/2J mice to study the effects of genetic background on genome-wide patterns of new mutations at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and identified a locus on chromosome 13 associated with the propensity...
Abstract Genetic variations in protein expression are implicated in a broad spectrum of common diseases and complex traits. However, the fundamental genetic architecture and variation of protein expression have received comparatively less attention than either mRNA or classical phenotypes. In this study, we systematically quantified proteins in the brains of a large family of rats using tandem mass tag (TMT)-based quantitative mass-spectrometry (MS) technology. We identified a comprehensive proteome of 8,119 proteins from Spontaneously Hypertensive...
We created GNQA, a generative pre-trained transformer (GPT) knowledge base driven by performant retrieval augmented generation (RAG) with a focus on aging, dementia, Alzheimer’s and diabetes. We uploaded a corpus of three thousand peer reviewed publications on these topics into the RAG. To address concerns about inaccurate responses and GPT ‘hallucinations’, we implemented a context provenance tracking mechanism that enables researchers to validate responses against the original material and to get references to the original papers. To assess...
Abstract DNA methylation is influenced by genetic and non-genetic factors. Here, we chart quantitative trait loci (QTLs) that modulate levels of methylation at highly conserved CpGs using liver methylome data from mouse strains belonging to the BXD Family. A regulatory hotspot on chromosome 5 had the highest density of trans-acting QTLs (trans-meQTLs) associated with multiple distant CpGs. We refer to this locus as meQTL.5a. The trans-modulated CpGs showed age-dependent changes, and were enriched in developmental genes,...
DNA methylation is influenced by genetic and non-genetic factors. Here, we chart quantitative trait loci (QTLs) that modulate levels of methylation at highly conserved CpGs using liver methylome data from mouse strains belonging to the BXD family. A regulatory hotspot on chromosome 5 had the highest density of trans-acting QTLs (trans-meQTLs) associated with multiple distant CpGs. We refer to this locus as meQTL.5a. Trans-modulated CpGs showed age-dependent changes and were enriched in developmental genes, including...
Abstract Linked-read whole genome sequencing methods, such as the 10x Chromium, attach a unique molecular barcode to each high molecular weight DNA molecule. The samples are then sequenced using short-read technology. During analysis, sequence reads sharing the same barcode are aligned to adjacent genomic locations. The pattern of barcode sharing between regions allows the discovery of large structural variants (SVs) in the range of 1 Kb to a few Mb. Most SV calling methods for these data, such as LongRanger, analyze one sample at a time and often produce...