NFDI4DS | UHH-SEMS - Publication Details

A Survey of K-mer Methods and Applications in Bioinformatics

OPENALEX - Publications

Camille Moeckel, Manvita Mareboina, Maxwell A. Konnaris, Candace S. Y. Chan, Ioannis Mouratidis and 4 more

The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large datasets, offering several advantages in speed and memory efficiency while carrying the potential for intrinsic biological functionality...

10.1016/j.csbj.2024.05.025 article EN cc-by-nc-nd Computational and Structural Biotechnology Journal 2024-05-21
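
A minimal sketch of the basic operation the survey builds on: counting every overlapping k-mer with a sliding window and Python's `collections.Counter`. The function name `count_kmers` is hypothetical, and the sketch deliberately omits the reverse-complement merging and compact encodings that large-scale k-mer tools often use.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count all overlapping k-mers in `sequence` with a sliding window of width k."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n yields n - k + 1 windows (none when n < k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


print(count_kmers("ACGTACGT", 3).most_common(2))  # [('ACG', 2), ('CGT', 2)]
```

Each position contributes exactly one k-mer, so a sequence of length n yields n - k + 1 windows.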

Ribosomal DNA arrays are the most H-DNA rich element in the human genome

OPENALEX - Publications

Nikol Chantzi, Candace S. Y. Chan, Michail Patsakis, Akshatha Nayak, Austin Montgomery and 2 more

Repetitive DNA sequences can form noncanonical structures such as H-DNA. The new telomere-to-telomere genome assembly for the human genome has eliminated gaps, enabling examination of highly repetitive regions, including centromeric and pericentromeric repeats and ribosomal DNA arrays. We find that H-DNA appears once every 25 000 base pairs in the human genome. Its distribution is inhomogeneous, with H-DNA motif hotspots detectable in acrocentric chromosomes. Ribosomal DNA arrays are the most H-DNA-rich genomic element, with a 40.94-fold enrichment...

10.1093/nargab/lqaf012 article EN cc-by NAR Genomics and Bioinformatics 2025-01-07
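
H-DNA is generally associated with homopurine/homopyrimidine mirror repeats: an arm, a short spacer, then the same arm in reverse order (not reverse-complemented). The sketch below scans for perfect mirror repeats by expanding outward from every candidate spacer. The function name `find_mirror_repeats` and the thresholds `min_arm=10` and `max_spacer=8` are illustrative assumptions; the study's own detection criteria are not stated in this abstract.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def find_mirror_repeats(seq, min_arm=10, max_spacer=8):
    """Perfect mirror repeats (arm, spacer, reversed arm) lying entirely within
    a purine-only or pyrimidine-only tract.

    Returns (start, end, arm_length, spacer_length) tuples, 0-based half-open.
    Thresholds are illustrative, not taken from the study. Overlapping hits
    are not merged.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for c in range(1, n):                    # c = first index of the spacer
        for s in range(max_spacer + 1):      # s = spacer length
            if c + s >= n:
                break
            for allowed in (PURINES, PYRIMIDINES):
                if not set(seq[c:c + s]) <= allowed:
                    continue
                arm = 0
                # Expand outward: the left arm read right-to-left must equal the
                # right arm read left-to-right, staying inside one base class.
                while (c - 1 - arm >= 0 and c + s + arm < n
                       and seq[c - 1 - arm] == seq[c + s + arm]
                       and seq[c - 1 - arm] in allowed):
                    arm += 1
                if arm >= min_arm:
                    hits.append((c - arm, c + s + arm, arm, s))
    return hits


demo = "CCCC" + "GAAGGAGAGG" + "AAA" + "GGAGAGGAAG" + "CCCC"
print(find_mirror_repeats(demo))
```

The same purine tract is reported with more than one arm/spacer split (spacer 3 with 10 bp arms, and spacer 1 with 11 bp arms); a real pipeline would merge overlapping hits.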

Quasi-prime peptides: identification of the shortest peptide sequences unique to a species

OPENALEX - Publications

Ioannis Mouratidis, Candace S. Y. Chan, Nikol Chantzi, Georgios Christos Tsiatsianis, Martin Hemberg and 2 more

Determining the organisms present in a biosample has many important applications in agriculture, wildlife conservation, and healthcare. Here, we develop a universal fingerprint based on the identification of short peptides that are unique to a specific organism. We define quasi-prime peptides as sequences found in only one species. We analyzed proteomes from 21 875 species, from viruses to humans, and annotated the smallest peptide kmers specific to each species and absent from all other proteomes. We also perform simulations across reference proteomes and observe lower than expected...

10.1093/nargab/lqad039 article EN cc-by-nc NAR Genomics and Bioinformatics 2023-03-29
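
To make the quasi-prime definition concrete, the sketch below finds, for each species in a toy proteome collection, the smallest k at which some k-mer occurs in that species and in no other. The data layout (`{species: [protein, ...]}`), the helper names, and the `k_max` cap are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def kmer_set(seq, k):
    """All distinct overlapping k-mers of one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(proteomes, k):
    """Map each species to the k-mers present in its proteome and in no other.

    `proteomes` is {species_name: [protein_sequence, ...]} (hypothetical layout).
    """
    per_species = {
        species: set().union(*(kmer_set(p, k) for p in proteins))
        for species, proteins in proteomes.items()
    }
    # Number of species in which each k-mer occurs at least once.
    species_count = Counter(km for kmers in per_species.values() for km in kmers)
    return {
        species: {km for km in kmers if species_count[km] == 1}
        for species, kmers in per_species.items()
    }


def shortest_quasi_primes(proteomes, k_max=8):
    """For each species, the smallest k with at least one quasi-prime, and those k-mers."""
    found = {}
    for k in range(1, k_max + 1):
        for species, kmers in quasi_primes(proteomes, k).items():
            if kmers and species not in found:
                found[species] = (k, kmers)
        if len(found) == len(proteomes):
            break
    return found


toy = {"sp1": ["MKV"], "sp2": ["MKA"], "sp3": ["MWV"]}
print(shortest_quasi_primes(toy))
# {'sp2': (1, {'A'}), 'sp3': (1, {'W'}), 'sp1': (2, {'KV'})}
```

Exhaustive set comparison like this is only practical for small inputs; at the scale of tens of thousands of proteomes, streaming or disk-backed k-mer counting would be needed.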

MAFin: Motif Detection in Multiple Alignment Files

OPENALEX - Publications

Michail Patsakis, Kimonas Provatas, Fotis A. Baltoumas, Nikol Chantzi, Ioannis Mouratidis and 2 more

Whole genome and proteome alignments, represented by the Multiple Alignment File (MAF) format, have become a standard approach in comparative genomics and proteomics. These alignments often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. We present MAFin, a novel tool that enables efficient motif conservation analysis in MAF files to address this gap, streamlining genomic...

10.1093/bioinformatics/btaf125 article EN cc-by Bioinformatics 2025-03-19
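
As a rough illustration of the problem MAFin addresses, the sketch below parses the `s` lines of a MAF file (fields: `src start size strand srcSize text`, with a zero-based `start` on the row's own strand) and runs a regular expression over each row after removing gaps. This is not MAFin's interface or algorithm; the function names are hypothetical, and other MAF line types (`i`, `e`, `q`) are ignored.

```python
import re


def parse_maf_blocks(lines):
    """Yield alignment blocks as lists of (src, start, strand, text) from MAF 's' lines."""
    block = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":                # 'a' starts a new alignment block
            if block:
                yield block
            block = []
        elif fields[0] == "s":              # s src start size strand srcSize text
            block.append((fields[1], int(fields[2]), fields[4], fields[6]))
    if block:
        yield block


def find_motif_in_maf(lines, pattern):
    """Search each aligned row for a regex motif after removing gap characters.

    Positions are the row's MAF start plus the offset in the ungapped text,
    i.e. zero-based coordinates on the row's own strand. Matches are
    non-overlapping (re.finditer).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for block_index, block in enumerate(parse_maf_blocks(lines)):
        for src, start, strand, text in block:
            ungapped = text.replace("-", "")
            for m in regex.finditer(ungapped):
                hits.append((block_index, src, strand, start + m.start(), m.group()))
    return hits


example = """##maf version=1
a score=0
s hg38.chr1 100 9 + 1000 ACG-TTGACA
s mm10.chr2 200 9 + 2000 ACGATT-ACA
"""
print(find_motif_in_maf(example.splitlines(), "TTGACA"))
# [(0, 'hg38.chr1', '+', 103, 'TTGACA')]
```

Only the hg38 row contains `TTGACA`; the reported position 103 is that row's `start` (100) plus the offset of the match in the ungapped text (3).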

Identification of the shortest species-specific oligonucleotide sequences

OPENALEX - Publications

Ioannis Mouratidis, Maxwell A. Konnaris, Nikol Chantzi, Candace S. Y. Chan, Michail Patsakis and 8 more

Despite the exponential increase in sequencing information driven by massively parallel DNA sequencing technologies, universal and succinct genomic fingerprints for each organism are still missing. Identifying the shortest species-specific nucleotide sequences offers insights into species evolution and holds potential for practical applications in agriculture, wildlife conservation, and healthcare. We propose a new method of sequence analysis termed nucleic “quasi-primes,” sequences occurring in only one of 45,076 organismal reference genomes,...

10.1101/gr.280070.124 article EN Genome Research 2025-01-02

Unraveling diversity by isolating peptide sequences specific to distinct taxonomic groups

OPENALEX - Publications

Eleftherios Bochalis, Michail Patsakis, Nikol Chantzi, Ioannis Mouratidis, Dionysios V. Chartoumpekis and 1 more

The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development and has widespread applications in pathogen diagnostics, human healthcare, and the ecology of biomes. Here, we investigated the existence of peptide k-mer sequences that are exclusively present in a specific taxonomy and absent from every other taxonomy at each taxonomic level, termed quasi-primes. By analyzing proteomes across 24,073 species, we identified quasi-prime peptides to...

10.1101/2025.02.05.636664 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2025-02-08

ZSeeker: An optimized algorithm for Z-DNA detection in genomic sequences

OPENALEX - Publications

Guliang Wang, Ioannis Mouratidis, Kimonas Provatas, Nikol Chantzi, Michail Patsakis and 2 more

Z-DNA is an alternative left-handed helical form of DNA with a zigzag-shaped backbone that differs from the right-handed canonical B-DNA helix. Z-DNA has been implicated in various biological processes, including transcription, replication, and repair, and can induce genetic instability. Repetitive sequences of alternating purines and pyrimidines have the potential to adopt Z-DNA structures. ZSeeker is a novel computational tool developed for the accurate detection of Z-DNA-forming sequences in genomes, addressing limitations of prior methods...

10.1101/2025.02.07.637205 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2025-02-10
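
Because alternating purine/pyrimidine tracts are the sequences with potential to adopt Z-DNA, a first-pass scan can simply report maximal runs in which base classes strictly alternate. The sketch below does only that; ZSeeker's actual scoring model is not described in this abstract, and the function names and the `min_len=10` threshold are assumptions.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def alternates(a, b):
    """True if bases a and b belong to different classes (purine vs pyrimidine)."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def alternating_runs(seq, min_len=10):
    """Maximal runs in which purines and pyrimidines strictly alternate.

    Returns (start, end, run) tuples with 0-based, half-open coordinates.
    """
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # Keep extending while the next base switches class.
        if i < len(seq) and alternates(seq[i - 1], seq[i]):
            continue
        if i - start >= min_len:
            runs.append((start, i, seq[start:i]))
        start = i
    return runs


print(alternating_runs("AAAA" + "GCGCGCGCGCGC" + "CCCC"))
# [(4, 16, 'GCGCGCGCGCGC')]
```

This scan treats every alternation equally; weighting dinucleotide steps differently (for example favoring CG/GC, the classic Z-forming repeat) would be a natural refinement.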

invertiaDB: a database of inverted repeats across organismal genomes

OPENALEX - Publications

Kimonas Provatas, Nikol Chantzi, Nafsika Amptazi, Michail Patsakis, Akshatha Nayak and 4 more

Inverted repeats are repetitive elements that can form hairpin and cruciform structures. They are linked to genomic instability; however, they also have various biological functions. Their distribution differs markedly across taxonomic groups in the tree of life, and they exhibit high polymorphism due to their inherent instability. Advances in sequencing technologies and declining costs have enabled the generation of an ever-growing number of complete genomes for organisms across the tree of life. However, a comprehensive database...

10.1093/nar/gkaf329 article EN cc-by-nc Nucleic Acids Research 2025-04-15
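
Inverted repeats are stretches whose downstream arm is the reverse complement of the upstream arm, separated by a spacer; hairpins and cruciforms form when the two arms pair. The sketch below finds perfect inverted repeats by center expansion. The function name and the `min_arm=6`/`max_spacer=8` thresholds are illustrative assumptions, not invertiaDB's parameters.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_inverted_repeats(seq, min_arm=6, max_spacer=8):
    """Perfect inverted repeats: left arm, spacer, reverse complement of the left arm.

    Returns (start, end, arm_length, spacer_length) tuples, 0-based half-open.
    Overlapping hits are not merged, so one repeat may be reported with
    several arm/spacer splits.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for c in range(1, n):                  # c = first index of the spacer
        for s in range(max_spacer + 1):    # s = spacer length
            if c + s >= n:
                break
            arm = 0
            # Expand outward while the flanking bases are Watson-Crick complements;
            # non-ACGT symbols (e.g. N) never pair because .get returns None.
            while (c - 1 - arm >= 0 and c + s + arm < n
                   and COMPLEMENT.get(seq[c - 1 - arm]) == seq[c + s + arm]):
                arm += 1
            if arm >= min_arm:
                hits.append((c - arm, c + s + arm, arm, s))
    return hits


demo = "CC" + "AGGTCAC" + "TTTT" + "GTGACCT" + "CC"
print(find_inverted_repeats(demo))
```

Here the 7 bp arms with a 4 bp spacer are also reported as 6 bp arms with a 6 bp spacer; merging such overlaps is left out to keep the sketch short.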

The repertoire of short tandem repeats across the tree of life

OPENALEX - Publications

Nikol Chantzi and Ilias Georgakopoulos-Soares

Short tandem repeats (STRs) are widespread, dynamic repetitive elements with a number of biological functions and relevance to human diseases. However, their prevalence across taxa remains poorly characterized. Here we examined the impact of STRs in the genomes of 117,253 organisms spanning the tree of life. We find that there are large differences in STR frequencies between organismal genomes and that these are largely driven by the taxonomic group an organism belongs to. Using simulated genomes, we find that on average there is no enrichment of STRs in bacterial...

10.1101/2024.08.08.607201 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-08-09
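
Perfect STRs can be sketched with a single regular expression: a captured unit of 1 to `max_unit` bases followed by at least `min_copies - 1` back-referenced copies. The thresholds and the function name below are illustrative assumptions rather than the criteria used in the study.

```python
import re


def find_strs(seq, max_unit=6, min_copies=3):
    """Perfect short tandem repeats: a 1..max_unit bp unit repeated >= min_copies times.

    The lazy {1,max_unit}? makes the engine try the shortest unit first at each
    position; re.finditer resumes after each match, so hits do not overlap.
    Returns (start, end, unit, copies) tuples, 0-based half-open.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // len(unit)))
    return hits


print(find_strs("TTCAGCAGCAGTT"))  # [(2, 11, 'CAG', 3)]
```

Because the shortest unit is tried first, `ATATATAT` is reported as four copies of `AT` rather than two copies of `ATAT`. Imperfect repeats with mismatches or indels need alignment-based methods.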

Peptide absent sequences emerging in human cancers

OPENALEX - Publications

Georgios Christos Tsiatsianis, Candace S. Y. Chan, Ioannis Mouratidis, Nikol Chantzi, Anna Maria Tsiatsiani and 4 more

10.1016/j.ejca.2023.113421 article EN European Journal of Cancer 2023-11-07

The determinants of the rarity of nucleic and peptide short sequences in nature

OPENALEX - Publications

Nikol Chantzi, Manvita Mareboina, Maxwell A. Konnaris, Austin Montgomery, Michail Patsakis and 2 more

The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. We examined 45 785 reference genomes and 21 871 proteomes, spanning archaea, bacteria, eukaryotes, and viruses, to calculate the rarity of short sequences in them. To capture this, we developed an index that measures the rarity of each sequence in nature. We find that the frequency of certain dipeptides in rare oligopeptides is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive...

10.1093/nargab/lqae029 article EN cc-by NAR Genomics and Bioinformatics 2024-04-04
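
Statements such as "lower than expected" depend on a background model. A common one assumes neighbouring symbols are independent, which gives the odds ratio rho(XY) = f(XY) / (f(X) f(Y)). The sketch below computes it for every pair of observed symbols; it is a generic baseline, not the rarity index defined in the paper (whose formula is not given in this abstract), and the function name is hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """rho(XY) = f(XY) / (f(X) * f(Y)) for every pair of observed symbols.

    f(XY) is the frequency of XY among the len(seq) - 1 overlapping pairs and
    f(X) the frequency of X among all symbols. rho < 1 means XY is rarer than
    expected if neighbouring symbols were independent.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two symbols")
    mono = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n, n_pairs = len(seq), len(seq) - 1
    return {
        x + y: (pairs[x + y] / n_pairs) / ((mono[x] / n) * (mono[y] / n))
        for x in mono
        for y in mono
    }


ratios = dinucleotide_odds_ratios("ATAT")
print(ratios["AT"], ratios["AA"])
```

A ratio well below 1 marks an under-represented pair, and 0 marks a pair never observed. The same function accepts peptide sequences, in which case the symbols are amino acids.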

kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species

OPENALEX - Publications

Ioannis Mouratidis, Fotis A. Baltoumas, Nikol Chantzi, Michail Patsakis, Candace S. Y. Chan and 9 more

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, to our knowledge, no established repository exists that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomes and proteomes in a systematic way. kmerDB currently contains 202,340,859,107 base pairs...

10.1016/j.csbj.2024.04.050 article EN cc-by-nc-nd Computational and Structural Biotechnology Journal 2024-04-21

Characterization of hairpin loops and cruciforms across 118,065 genomes spanning the tree of life

OPENALEX - Publications

Nikol Chantzi, Camille Moeckel, Candace S. Y. Chan, Akshatha Nayak, Guliang Wang and 4 more

Inverted repeats (IRs) can form alternative DNA secondary structures called hairpins and cruciforms, which have a multitude of functional roles and have been associated with genomic instability. However, their prevalence across diverse organismal genomes remains only partially understood. Here, we examine IRs across 118,065 complete genomes. Our comprehensive analysis across taxonomic subdivisions reveals significant differences in the distribution, frequency, and biophysical properties of perfect IRs among these genomes. We identify...

10.1101/2024.09.29.615628 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2024-09-29

Quadrupia: Derivation of G-quadruplexes for organismal genomes across the tree of life

OPENALEX - Publications

Nikol Chantzi, Akshatha Nayak, Fotis A. Baltoumas, Eleni Aplakidou, Shiau Wei Liew and 14 more

G-quadruplex DNA structures exert a profound influence on essential biological processes, including transcription, replication, telomere maintenance, and genomic stability. These structures have demonstrably shaped organismal evolution. However, a comprehensive, organism-wide map encompassing the diversity of life has remained elusive. Here, we introduce Quadrupia, the most extensive and well-characterized G-quadruplex database to date, facilitating G-quadruplex exploration across the evolutionary spectrum. Quadrupia identified sequences...

10.1101/2024.07.09.602008 preprint EN 2024-07-11
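
A widely used heuristic for putative G-quadruplex-forming sequences is four runs of at least three guanines separated by loops of 1 to 7 nucleotides (G3+N1-7G3+N1-7G3+N1-7G3+). The sketch below applies that pattern to both strands; Quadrupia's exact detection criteria are not stated in this abstract, so the pattern, loop lengths, and function name here are assumptions.

```python
import re

# G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the given strand; a C-rich match on the
# given strand corresponds to a G-rich motif on the opposite strand.
G4_PLUS = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")
G4_MINUS = re.compile(r"C{3,}(?:[ACGTN]{1,7}C{3,}){3}")


def find_g4_motifs(seq):
    """Return (start, end, strand, match) tuples, 0-based half-open, non-overlapping per strand."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+", m.group()) for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-", m.group()) for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


print(find_g4_motifs("AAGGGTGGGTGGGTGGGAA"))
# [(2, 17, '+', 'GGGTGGGTGGGTGGG')]
```

Matches are non-overlapping per strand, and the fixed loop bounds miss motifs with longer loops or bulged G-runs.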

Microsatellites Explorer: A Database of Short Tandem Repeats Across Genomes

OPENALEX - Publications

Kimonas Provatas, Nikol Chantzi, Michail Patsakis, Akshatha Nayak, Ioannis Mouratidis and 1 more

Short tandem repeats (STRs) are widespread repetitive elements with a number of biological functions, and they are among the most rapidly mutating regions in the genome. Their distribution varies significantly between taxonomic groups across the tree of life, and they are highly polymorphic within the human population. Advances in sequencing technologies coupled with decreasing costs have enabled the generation of an ever-growing number of complete genomes. Additionally, the arrival of accurate long reads has facilitated Telomere-to-Telomere (T2T) assemblies...

10.1016/j.csbj.2024.10.041 article EN cc-by Computational and Structural Biotechnology Journal 2024-10-28

Frequentmers - a novel way to look at metagenomic next generation sequencing data and an application in detecting liver cirrhosis

OPENALEX - Publications

Ioannis Mouratidis, Nikol Chantzi, Umair Khan, Maxwell A. Konnaris, Candace S. Y. Chan and 3 more

Early detection of human disease is associated with improved clinical outcomes. However, many diseases are detected at an advanced, symptomatic stage, when patients are past efficacious treatment periods, which can result in less favorable outcomes. Therefore, methods that accurately detect a disease at a presymptomatic stage are urgently needed. Here, we introduce “frequentmers”: short sequences that are specific to and recurrently observed in either patient or healthy control samples, but not both. We showcase the utility...

10.1186/s12864-023-09861-w article EN cc-by BMC Genomics 2023-12-12
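
One way to operationalize "recurrently observed in either patient or healthy control samples, but not both" is to keep k-mers present in at least `min_samples` samples of one group and absent from every sample of the other. The threshold, the strict-absence rule, and the function names are assumptions for illustration; the paper's exact selection criteria are not stated in this abstract.

```python
from collections import Counter


def sample_kmers(reads, k):
    """Distinct k-mers observed in one sample's reads."""
    kmers = set()
    for read in reads:
        read = read.upper()
        kmers.update(read[i:i + k] for i in range(len(read) - k + 1))
    return kmers


def frequentmers(case_samples, control_samples, k, min_samples=2):
    """k-mers seen in >= min_samples samples of one group and in no sample of the other.

    Each sample is a list of reads. Returns (case_specific, control_specific).
    """
    # Each sample contributes a set, so the counts are numbers of samples.
    case_counts = Counter(km for s in case_samples for km in sample_kmers(s, k))
    control_counts = Counter(km for s in control_samples for km in sample_kmers(s, k))
    case_specific = {km for km, c in case_counts.items()
                     if c >= min_samples and km not in control_counts}
    control_specific = {km for km, c in control_counts.items()
                        if c >= min_samples and km not in case_counts}
    return case_specific, control_specific


cases = [["ACGTT"], ["TACGT"]]
controls = [["GGGCC"], ["GGGAA"]]
print(frequentmers(cases, controls, 3))
```

In this toy input, `ACG` and `CGT` occur in both case samples and no control, while `GGG` occurs in both controls and no case, so the call returns those two sets.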

The determinants of the rarity of nucleic and peptide short sequences in nature

OPENALEX - Publications

Nikol Chantzi, Ioannis Mouratidis, Manvita Mareboina, Maxwell A. Konnaris, Austin Montgomery and 1 more

The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. Here we examined 45,785 reference genomes and 21,871 proteomes, spanning archaea, bacteria, viruses, and eukaryotes, to calculate the rarity of short sequences in them. To capture this, we developed a metric for the rarity of each sequence in nature, the Anti-Kardashian index. We find that the frequency of certain dipeptides in rare oligopeptides is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate...

10.1101/2023.09.24.559219 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-09-25

kmerDB: A Database Encompassing the Set of Genomic and Proteomic Sequence Information for Each Species

OPENALEX - Publications

Ioannis Mouratidis, Fotis A. Baltoumas, Nikol Chantzi, Candace S. Y. Chan, Austin Montgomery and 7 more

The rapid decline in sequencing cost has enabled the generation of reference genomes and proteomes for a growing number of organisms. However, at present, there is no established repository that provides information about organism-specific genomic and proteomic sequences of certain lengths, also known as kmers, that are either present or absent in each genome or proteome. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomes and proteomes in a systematic way. kmerDB currently...

10.1101/2023.11.13.566926 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2023-11-16
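
k-mers that are absent from a genome or proteome (often called nullomers) can be enumerated for small k by comparing every possible string against the observed set. The sketch below is an illustrative assumption of how such presence/absence information could be computed, not kmerDB's pipeline; the function name and the default DNA alphabet are hypothetical.

```python
from itertools import product


def absent_kmers(sequences, k, alphabet="ACGT"):
    """All length-k strings over `alphabet` that never occur in `sequences`.

    Only the given strand is scanned; for DNA one may also add the reverse
    complement of each sequence before calling this.
    """
    present = set()
    for seq in sequences:
        seq = seq.upper()
        present.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(
        kmer for kmer in ("".join(p) for p in product(alphabet, repeat=k))
        if kmer not in present
    )


print(absent_kmers(["AAAC"], 1))  # ['G', 'T']
```

The number of candidates grows as |alphabet|^k (4^k for DNA, 20^k for peptides with `alphabet="ACDEFGHIKLMNPQRSTVWY"`), so exhaustive enumeration is only feasible for short k.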

Frequentmers - a novel way to look at metagenomic Next Generation Sequencing data and an application in detecting liver cirrhosis

OPENALEX - Publications

Ioannis Mouratidis, Nikol Chantzi, Umair Khan, Maxwell A. Konnaris, Candace S. Y. Chan and 2 more

Early detection of human disease is associated with improved clinical outcomes. However, many diseases are detected at an advanced, symptomatic stage, when patients are past efficacious treatment periods, which can result in less favorable outcomes. Therefore, methods that accurately detect a disease at a presymptomatic stage are urgently needed. Here, we introduce “frequentmers”: short sequences that are specific to and recurrently observed in either patient or healthy control samples, but not both. We showcase the utility...

10.1101/2023.09.19.23295771 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2023-09-19

Ribosomal DNA arrays are the most H-DNA rich element in the human genome

OPENALEX - Publications

Nikol Chantzi, Michail Patsakis, Akshatha Nayak, Austin Montgomery, Ioannis Mouratidis and 1 more

Repetitive DNA sequences can form non-canonical structures such as H-DNA, which is an intramolecular triplex structure. The new Telomere-to-Telomere (T2T) genome assembly for the human genome has eliminated gaps, enabling examination of highly repetitive regions, including centromeric and pericentromeric repeats and ribosomal DNA arrays. This gapless assembly allows the examination of H-DNA distribution in parts of the genome that were not previously annotated. We find that H-DNA appears once every 30,000 bps in the human genome. Its distribution is inhomogeneous, with motif hotspots...

10.1101/2024.07.12.602585 preprint EN 2024-07-13

invertiaDB: A Database of Inverted Repeats Across Organismal Genomes

OPENALEX - Publications

Kimonas Provatas, Nikol Chantzi, Michail Patsakis, Akshatha Nayak, Ioannis Mouratidis and 2 more

Inverted repeats are repetitive elements that can form hairpin and cruciform structures. They are linked to genomic instability; however, they also have various biological functions. Their distribution differs markedly across taxonomic groups in the tree of life, and they exhibit high polymorphism due to their inherent instability. Advances in sequencing technologies and declining costs have enabled the generation of an ever-growing number of complete genomes for organisms across the tree of life. However, a comprehensive database encompassing...

10.1101/2024.11.11.622808 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-11-13

The topography of nullomer-emerging mutations and their relevance to human disease

OPENALEX - Publications

Candace S. Y. Chan, Ioannis Mouratidis, Austin Montgomery, Georgios Christos Tsiatsianis, Nikol Chantzi and 3 more

10.1016/j.csbj.2024.12.026 article EN cc-by-nc-nd Computational and Structural Biotechnology Journal 2024-12-25