- Genomics and Phylogenetic Studies
- Microbial Community Ecology and Physiology
- Protist diversity and phylogeny
- Environmental DNA in Biodiversity Studies
- Genetic diversity and population structure
- Evolution and Paleontology Studies
- Species Distribution and Climate Change
- Gene expression and cancer classification
- Scientific Computing and Data Management
- Evolution and Genetic Dynamics
- Plant and animal studies
- Ecology and Vegetation Dynamics Studies
- Gut microbiota and health
- SARS-CoV-2 and COVID-19 Research
- Fractal and DNA sequence analysis
- Parasitic Infections and Diagnostics
- Video Analysis and Summarization
- Music and Audio Processing
- Natural Language Processing Techniques
- Ancient and Medieval Archaeology Studies
- Speech and dialogue systems
- Marine and environmental studies
- Plant Disease Resistance and Genetics
- Speech Recognition and Synthesis
- Linguistics and language evolution
Carnegie Department of Plant Biology
2020-2024
Carnegie Institution for Science
2020-2024
University of Copenhagen
2024
Heidelberg Institute for Theoretical Studies
2015-2021
Next generation sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. Phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the Evolutionary Placement Algorithm (EPA) included in RAxML, or PPLACER, are being increasingly used for this purpose....
We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target trees, placements, sequences, taxonomies, and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested, and field-proven.
Anthropogenic habitat loss and climate change are reducing species' geographic ranges, increasing extinction risk and losses of genetic diversity. Although preserving diversity is key to maintaining adaptability, we lack predictive tools and global estimates across ecosystems. We introduce a mathematical framework that bridges biodiversity theory and population genetics to understand the loss of naturally occurring DNA mutations with decreasing habitat. By analyzing genomic variation of 10,095 georeferenced...
Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer these phylogenies due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with a degree of confidence...
Some protists with microsporidian‐like cell biological characters, including Mitosporidium, Paramicrosporidium and Nucleophaga, have SSU rRNA gene sequences that are much less divergent than those of canonical Microsporidia. We analysed the phylogenetic placement and environmental diversity of lineages that group near the base of the fungal radiation and show that they group in a clade with metchnikovellids and microsporidians, to the exclusion of Rozella, in line with what is currently known of their morphology and biology. These results scope...
High-throughput DNA metabarcoding of amplicon sizes below 500 bp has revolutionized the analysis of environmental microbial diversity. However, these short regions contain limited phylogenetic signal, which makes it impractical to use them in full phylogenetic inferences. This lesser resolution of short amplicons may be overcome by new long-read sequencing technologies. To test this idea, we amplified soil DNA and used PacBio Circular Consensus Sequencing (CCS) to obtain an ~4500-bp region spanning most of the eukaryotic small subunit...
Background: The exponential decrease in molecular sequencing cost generates unprecedented amounts of data. Hence, scalable methods to analyze these data are required. Phylogenetic (or Evolutionary) Placement methods identify the evolutionary provenance of anonymous sequences with respect to a given reference phylogeny. This increasingly popular method is deployed for scrutinizing metagenomic samples from environments such as water, soil, or the human gut. Novel methods: Here, we present novel and, more importantly,...
Motivation: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global thresholds. Here, we present swarm v3 to address issues of contemporary datasets that are growing towards tera-byte sizes. Results: When compared with previous versions, swarm v3 has modernized C++ source code, reduced memory footprint by up to 50%, optimized CPU-usage and multithreading (more than 7 times faster with default...
Summary: Pool sequencing is an efficient method for capturing genome-wide allele frequencies from multiple individuals, with broad applications such as studying adaptation in Evolve-and-Resequence experiments, monitoring of genetic diversity in wild populations, and genotype-to-phenotype mapping. Here, we present grenedalf, a command line tool written in C++ that implements common population statistics such as θ, Tajima’s D, and FST for Pool sequencing. It is orders of magnitude faster than current tools, focused on...
Priority effects, where arrival order and initial relative abundance modulate local species interactions, can exert taxonomic, functional, and evolutionary influences on ecological communities by driving them to alternative states. It remains unclear if these wide-ranging consequences of priority effects can be explained systematically by a common underlying factor. Here, we identify such a factor in an empirical system. In a series of field and laboratory studies, we focus on how pH affects nectar-colonizing microbes...
Incongruence, or topological conflict, is prevalent in genome-scale data sets. Internode certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internal branch among a set of phylogenetic trees and complement regular branch support measures (e.g., bootstrap, posterior probability) that instead assess the statistical confidence of inference. Since most phylogenomic studies contain partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions...
Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising all virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer these phylogenies due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with a degree of confidence either via the bat and pangolin outgroups or...
In most metagenomic sequencing studies, the initial analysis step consists in assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the position of sequences with respect to a given reference phylogeny. These placement methods do however face certain limitations: the manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; the number of taxa in the reference phylogeny should be small...
Pennycress (Thlaspi arvense) is a promising intermediate oilseed crop, producing oil suitable for conversion to biofuels, including aviation fuels. While domestication efforts are ongoing, a deeper understanding of the genetic architecture of traits is crucial for informing future breeding efforts. Here, we conducted the largest genomic and phenotypic survey of pennycress to date, analyzing 739 accessions collected across four continents. Leveraging whole-genome sequencing and field-collected phenotypes,...
We developed grenepipe, an all-in-one Snakemake workflow to streamline the data processing from raw high-throughput sequencing data of individuals or populations to genotype variant calls. Our pipeline offers a range of popular software tools within a single configuration file, automatically installs dependencies, is highly optimized for scalability in cluster environments, and runs with a single command. grenepipe is published under the GPLv3 and is freely available at github.com/moiexpositoalonsolab/grenepipe.
The change in allele frequencies within a population over time represents a fundamental process of evolution. By monitoring allele frequencies, we can analyze the effects of natural selection and genetic drift on populations. To efficiently track time-resolved change, large experimental or wild populations can be sequenced as pools of individuals sampled using high-throughput genome sequencing (called the Evolve & Resequence approach, E&R). Here, we present a set of experiments with hundreds of genotypes of the model...
Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. To achieve this, phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the Evolutionary Placement Algorithm (EPA) included in RAxML, or pplacer, are...
Summary: We present GENESIS, a library for working with phylogenetic data, and GAPPA, an accompanying command line tool for conducting typical analyses on such data. The tools target trees, placements, sequences, taxonomies, and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested, and field-proven. Availability and Implementation: Both GENESIS and GAPPA are written in modern C++11, and are freely available under GPLv3 at...
Incongruence, or topological conflict, is prevalent in genome-scale data sets but relatively few measures have been developed to quantify it. Internode Certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internode (or internal branch) among a set of phylogenetic trees and complement regular branch support statistics in assessing the confidence of inferred relationships. Since most phylogenomic studies contain partitions (e.g., genes) with missing taxa and IC...
Dinophytes are widely distributed in marine‐ and fresh‐waters, but have yet to be conclusively documented in terrestrial environments. Here, we evaluated the presence of these protists from an environmental DNA metabarcoding dataset of Neotropical rainforest soils. Using a phylogenetic placement approach with a reference alignment and tree, we showed that numerous sequencing reads were phylogenetically placed as dinophytes that did not correlate with taxonomic assignment, preference, nutritional mode, or...
The exponential decrease in molecular sequencing cost generates unprecedented amounts of data. Hence, scalable methods to analyze these data are required. Phylogenetic (or Evolutionary) Placement methods identify the evolutionary provenance of anonymous sequences with respect to a given reference phylogeny. This increasingly popular method is deployed for scrutinizing metagenomic samples from environments such as water, soil, or the human gut. Here, we present novel and, more importantly, highly...