Nikol Chantzi

ORCID: 0009-0005-4947-0745
Research Areas
  • RNA and protein synthesis mechanisms
  • Genomics and Phylogenetic Studies
  • Machine Learning in Bioinformatics
  • Chromosomal and Genetic Variations
  • Genomics and Chromatin Dynamics
  • RNA modifications and cancer
  • Liver Disease and Transplantation
  • Glycosylation and Glycoproteins Research
  • Bacteriophages and microbial interactions
  • Metabolomics and Mass Spectrometry Studies
  • Advanced Proteomics Techniques and Applications
  • Liver Disease Diagnosis and Treatment
  • Molecular Biology Techniques and Applications
  • Advanced biosensing and bioanalysis techniques
  • Cellular Automata and Applications
  • Genetic Mapping and Diversity in Plants and Animals
  • DNA and Nucleic Acid Chemistry
  • Plant Molecular Biology Research
  • Music and Audio Processing
  • Vaccines and immunoinformatics approaches
  • Gene Regulatory Network Analysis
  • Genomic variations and chromosomal abnormalities
  • Handwritten Text Recognition Techniques
  • Genomics and Rare Diseases
  • Molecular Junctions and Nanostructures

Pennsylvania State University
2023-2025

Penn State Milton S. Hershey Medical Center
2023-2025

The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large datasets, offering several advantages in speed and memory efficiency and carrying the potential for intrinsic biological functionality....

10.1016/j.csbj.2024.05.025 article EN cc-by-nc-nd Computational and Structural Biotechnology Journal 2024-05-21

Repetitive DNA sequences can form noncanonical structures such as H-DNA. The new telomere-to-telomere genome assembly for the human genome has eliminated gaps, enabling the examination of highly repetitive regions including centromeric and pericentromeric repeats and ribosomal arrays. We find that H-DNA appears once every 25 000 base pairs in the genome. Its distribution is inhomogeneous, with motif hotspots being detectable in acrocentric chromosomes. Ribosomal arrays are a genomic element with a 40.94-fold enrichment....

10.1093/nargab/lqaf012 article EN cc-by NAR Genomics and Bioinformatics 2025-01-07

Determining the organisms present in a biosample has many important applications in agriculture, wildlife conservation, and healthcare. Here, we develop a universal fingerprint based on the identification of short peptides that are unique to a specific organism. We define quasi-prime peptides as sequences found in only one species, analyzed proteomes from 21 875 species, from viruses to humans, and annotated the smallest peptide kmer sequences that are unique to a species and absent from all other proteomes. We also perform simulations across reference proteomes and observe lower than expected...

10.1093/nargab/lqad039 article EN cc-by-nc NAR Genomics and Bioinformatics 2023-03-29
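
The quasi-prime definition in the abstract above (a peptide k-mer found in one species and absent from all others) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `proteomes` dictionary layout, and the toy sequences are all assumptions made for illustration, and a real analysis over thousands of proteomes would need a far more memory-efficient approach.

```python
from collections import defaultdict


def peptide_kmers(sequence, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def quasi_prime_peptides(proteomes, k):
    """Map each species to the k-mers present in its proteome and no other.

    `proteomes` is a hypothetical dict: species name -> list of protein strings.
    """
    owners = defaultdict(set)  # k-mer -> set of species that contain it
    for species, proteins in proteomes.items():
        for protein in proteins:
            for kmer in peptide_kmers(protein, k):
                owners[kmer].add(species)

    unique = {species: set() for species in proteomes}
    for kmer, species_set in owners.items():
        if len(species_set) == 1:  # seen in exactly one species
            (only,) = species_set
            unique[only].add(kmer)
    return unique


# Toy proteomes (invented sequences, for illustration only).
toy = {
    "species_A": ["MKTAYIAK", "GAVLK"],
    "species_B": ["MKTAYLAK"],
    "species_C": ["GAVLKMKT"],
}
result = quasi_prime_peptides(toy, 3)
```

In this toy input, k-mers such as `MKT` occur in several species and are therefore excluded, while `AYI` occurs only in `species_A`. Finding the smallest such k-mers, as the abstract describes, would amount to repeating the search for increasing values of `k`.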

Whole Genome and Proteome Alignments, represented by the Multiple Alignment File (MAF) format, have become a standard approach in comparative genomics and proteomics. These often require identifying conserved motifs, which is crucial for understanding functional and evolutionary relationships. However, current approaches lack a direct method for motif detection within MAF files. We present MAFin, a novel tool that enables efficient motif detection and conservation analysis within MAF files to address this gap, streamlining genomic...

10.1093/bioinformatics/btaf125 article EN cc-by Bioinformatics 2025-03-19

Despite the exponential increase in sequencing information driven by massively parallel DNA sequencing technologies, universal and succinct genomic fingerprints for each organism are still missing. Identifying the shortest species-specific nucleotide sequences offers insights into species evolution and holds potential practical applications in agriculture, wildlife conservation, and healthcare. We propose a new method for sequence analysis termed nucleic “quasi-primes,” the shortest sequences occurring in each of 45,076 organismal reference genomes,...

10.1101/gr.280070.124 article EN Genome Research 2025-01-02

The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development and have widespread applications in pathogen diagnostics, human healthcare, ecology and biomes. Here, we investigated the existence of peptide k-mer sequences that are exclusively present in a specific taxonomy and absent in every other taxonomic level, termed quasi-primes. By analyzing proteomes across 24,073 species, we identified quasi-prime peptides specific to...

10.1101/2025.02.05.636664 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2025-02-08

Z-DNA is an alternative left-handed helical form of DNA with a zigzag-shaped backbone that differs from the right-handed canonical B-DNA helix. Z-DNA has been implicated in various biological processes, including transcription, replication, and DNA repair, and can induce genetic instability. Repetitive sequences with alternating purines and pyrimidines have the potential to adopt Z-DNA structures. ZSeeker is a novel computational tool developed for the accurate detection of Z-DNA-forming sequences in genomes, addressing limitations of prior methods....

10.1101/2025.02.07.637205 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2025-02-10

Inverted repeats are repetitive elements that can form hairpin and cruciform structures. They are linked to genomic instability; however, they also have various biological functions. Their distribution differs markedly across taxonomic groups in the tree of life, and they exhibit high polymorphism due to their inherent instability. Advances in sequencing technologies and declined costs have enabled the generation of an ever-growing number of complete genomes for organisms across the tree of life. However, a comprehensive database...

10.1093/nar/gkaf329 article EN cc-by-nc Nucleic Acids Research 2025-04-15

Short tandem repeats (STRs) are widespread, dynamic repetitive elements with a number of biological functions and relevance to human diseases. However, their prevalence across taxa remains poorly characterized. Here we examined the impact of STRs in the genomes of 117,253 organisms spanning the tree of life. We find that there are large differences in the frequencies of STRs between organismal genomes and that these differences are largely driven by the taxonomic group an organism belongs to. Using simulated genomes, we find that on average there is no enrichment of STRs in bacterial...

10.1101/2024.08.08.607201 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-08-09

The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. We examined 45 785 reference genomes and 21 871 reference proteomes, spanning archaea, bacteria, eukaryotes and viruses, to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the rarity index. We find that the frequency of certain dipeptides and rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive...

10.1093/nargab/lqae029 article EN cc-by NAR Genomics and Bioinformatics 2024-04-04

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs...

10.1016/j.csbj.2024.04.050 article EN cc-by-nc-nd Computational and Structural Biotechnology Journal 2024-04-21

Inverted repeats (IRs) can form alternative DNA secondary structures called hairpins and cruciforms, which have a multitude of functional roles and have been associated with genomic instability. However, their prevalence across diverse organismal genomes remains only partially understood. Here, we examine the prevalence of IRs across 118,065 complete organismal genomes. Our comprehensive analysis across taxonomic subdivisions reveals significant differences in the distribution, frequency, and biophysical properties of perfect IRs among these genomes. We identify...

10.1101/2024.09.29.615628 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2024-09-29

G-quadruplex DNA structures exhibit a profound influence on essential biological processes, including transcription, replication, telomere maintenance, and genomic stability. These structures have demonstrably shaped organismal evolution. However, a comprehensive, organism-wide G-quadruplex map encompassing the diversity of life has remained elusive. Here, we introduce Quadrupia, the most extensive and well-characterized G-quadruplex database to date, facilitating the exploration of G-quadruplex structures across the evolutionary spectrum. Quadrupia identified sequences...

10.1101/2024.07.09.602008 preprint EN 2024-07-11

Short tandem repeats (STRs) are widespread repetitive elements with a number of biological functions and are among the most rapidly mutating regions in the genome. Their distribution varies significantly between taxonomic groups in the tree of life, and they are highly polymorphic within the human population. Advances in sequencing technologies coupled with decreasing costs have enabled the generation of an ever-growing number of complete genomes. Additionally, the arrival of accurate long reads has facilitated the generation of Telomere-to-Telomere (T2T) assemblies...

10.1016/j.csbj.2024.10.041 article EN cc-by Computational and Structural Biotechnology Journal 2024-10-28

Early detection of human disease is associated with improved clinical outcomes. However, many diseases are often detected at an advanced, symptomatic stage where patients are past efficacious treatment periods, which can result in less favorable outcomes. Therefore, methods that can accurately detect human disease at a presymptomatic stage are urgently needed. Here, we introduce “frequentmers”: short sequences that are specific and recurrently observed in either patient or healthy control samples, but not in both. We showcase the utility...

10.1186/s12864-023-09861-w article EN cc-by BMC Genomics 2023-12-12
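
The "frequentmer" idea described in the abstract above (a short sequence seen recurrently in one group of samples and never in the other) can be sketched as follows. This is an illustrative approximation, not the published method: the function names, the representation of a sample as a list of read strings, and the `min_samples` recurrence threshold are all assumptions.

```python
from collections import Counter


def kmers(sequence, k):
    """Return the set of all length-k substrings of a nucleotide sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def sample_kmers(sample, k):
    """Collect the distinct k-mers across all reads of one sample."""
    found = set()
    for read in sample:
        found |= kmers(read, k)
    return found


def frequentmers(group, other, k, min_samples):
    """K-mers present in at least `min_samples` samples of `group`
    and in no sample of `other` (threshold is an assumed parameter)."""
    counts = Counter()
    for sample in group:
        counts.update(sample_kmers(sample, k))  # count each k-mer once per sample

    excluded = set()
    for sample in other:
        excluded |= sample_kmers(sample, k)

    return {km for km, n in counts.items() if n >= min_samples and km not in excluded}


# Toy data (invented reads, for illustration only).
patients = [["ACGTAC"], ["TTACGT"], ["GGGACG"]]
controls = [["TTTTAC"], ["GGGTTT"]]

patient_specific = frequentmers(patients, controls, k=3, min_samples=2)
control_specific = frequentmers(controls, patients, k=3, min_samples=2)
```

Here `TAC` recurs in two patient samples but is excluded because it also appears in a control sample, which is the "but not in both" condition from the abstract.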

The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. Here we examined 45,785 reference genomes and 21,871 reference proteomes, spanning archaea, bacteria, viruses and eukaryotes, to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the Anti-Kardashian index. We find that the frequency of certain dipeptides and rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate...

10.1101/2023.09.24.559219 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-09-25

The rapid decline in sequencing cost has enabled the generation of reference genomes and proteomes for a growing number of organisms. However, at the present time, there is no established repository that provides information about organism-specific genomic and proteomic sequences of certain lengths, also known as kmers, that are either present or absent in each genome and proteome. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently...

10.1101/2023.11.13.566926 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2023-11-16

Early detection of human disease is associated with improved clinical outcomes. However, many diseases are often detected at an advanced, symptomatic stage where patients are past efficacious treatment periods, which can result in less favorable outcomes. Therefore, methods that can accurately detect human disease at a presymptomatic stage are urgently needed. Here, we introduce “frequentmers”: short sequences that are specific and recurrently observed in either patient or healthy control samples, but not in both. We showcase the utility...

10.1101/2023.09.19.23295771 preprint EN medRxiv (Cold Spring Harbor Laboratory) 2023-09-19

Repetitive DNA sequences can form non-canonical structures such as H-DNA, which is an intramolecular triplex structure. The new Telomere-to-Telomere (T2T) genome assembly for the human genome has eliminated gaps, enabling the examination of highly repetitive regions including centromeric and pericentromeric repeats and ribosomal arrays. This gapless assembly allows the examination of the H-DNA distribution in parts of the genome that were not previously annotated. We find that H-DNA appears once every 30,000 bps in the human genome. Its distribution is inhomogeneous with motif hotspots...

10.1101/2024.07.12.602585 preprint EN 2024-07-13

Inverted repeats are repetitive elements that can form hairpin and cruciform structures. They are linked to genomic instability; however, they also have various biological functions. Their distribution differs markedly across taxonomic groups in the tree of life, and they exhibit high polymorphism due to their inherent instability. Advances in sequencing technologies and declined costs have enabled the generation of an ever-growing number of complete genomes for organisms across the tree of life. However, a comprehensive database encompassing...

10.1101/2024.11.11.622808 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-11-13