Paul Medvedev
- Genomics and Phylogenetic Studies
- Algorithms and Data Compression
- Chromosomal and Genetic Variations
- RNA and protein synthesis mechanisms
- Genomics and Chromatin Dynamics
- Gene expression and cancer classification
- Genome Rearrangement Algorithms
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- Food Industry and Aquatic Biology
- DNA and Biological Computing
- Advanced biosensing and bioanalysis techniques
- Machine Learning in Bioinformatics
- Caching and Content Delivery
- Genomic variations and chromosomal abnormalities
- Advanced Graph Theory Research
- Data Mining Algorithms and Applications
- Advanced Scientific Research Methods
- Molecular Biology Techniques and Applications
- Genetic diversity and population structure
- Optimization and Search Problems
- CRISPR and Genetic Engineering
- Animal Nutrition and Health
- Microbial Community Ecology and Physiology
- Agricultural Productivity and Crop Improvement
- Cancer Genomics and Diagnostics
Ural State University of Economics
2025
Pennsylvania State University
2015-2024
Orenburg State University
2016-2022
Park University
2020
University of Pennsylvania
2016-2019
University of California, San Diego
2011-2013
Hospital for Sick Children
2013
SickKids Foundation
2013
University of Toronto
2007-2011
Bielefeld University
2009
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species 1–4. To address this issue, the international Genome 10K (G10K) consortium 5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate...
Motivation: Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision. Results: We develop a fast and accurate sampling method that constructs approximate abundance histograms with orders of magnitude performance improvement over...
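The idea of a k-mer abundance histogram built from a sample can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names, the toy reads, and the use of `zlib.crc32` as the sampling hash are all assumptions made for illustration.

```python
from collections import Counter
import zlib


def kmer_counts(reads, k, sample_every=1):
    """Count k-mers, keeping only those whose hash lands in a 1/sample_every slice.

    Hypothetical helper: sampling is by k-mer identity, so a kept k-mer is
    counted at every occurrence and its abundance stays exact.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if zlib.crc32(kmer.encode()) % sample_every == 0:
                counts[kmer] += 1
    return counts


def abundance_histogram(counts):
    """Map each abundance value to the number of distinct k-mers having it."""
    return Counter(counts.values())


# Toy reads (invented for this example).
reads = ["ACGTACGT", "CGTACGTA", "GTACGTAC"]
exact = kmer_counts(reads, k=4)
hist = abundance_histogram(exact)
sampled = kmer_counts(reads, k=4, sample_every=2)
```

With `sample_every=1` the histogram is exact; larger values keep roughly a `1/sample_every` fraction of distinct k-mers, so the histogram's shape is approximated at a fraction of the memory.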
As the quantity of data per sequencing experiment increases, the challenges of fragment assembly are becoming increasingly computational. The de Bruijn graph is a widely used data structure in fragment assembly algorithms, used to represent the information from a set of reads. Compaction is an important data reduction step in most de Bruijn graph based algorithms where long simple paths are compacted into single vertices. Compaction has recently become the bottleneck in assembly pipelines, and improving its running time and memory usage is an important problem. We present an algorithm and a tool bcalm 2 for the compaction...
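To make the compaction step concrete, here is a small in-memory Python sketch that merges maximal non-branching paths of a node-centric de Bruijn graph. It is a naive illustration, not the bcalm 2 algorithm: the function name is hypothetical, reverse complements are ignored, and no attention is paid to running time or memory.

```python
def compact_unitigs(kmers):
    """Merge maximal non-branching paths (unitigs) of a k-mer set.

    Simplifying assumptions: forward strand only, DNA alphabet ACGT,
    all k-mers have the same length.
    """
    kmers = set(kmers)

    def successors(u):
        return [u[1:] + c for c in "ACGT" if u[1:] + c in kmers]

    def predecessors(u):
        return [c + u[:-1] for c in "ACGT" if c + u[:-1] in kmers]

    def extends_forward(u):
        # u merges with its successor only if that edge is u's single
        # out-edge and the successor's single in-edge (self-loops excluded).
        succ = successors(u)
        return len(succ) == 1 and succ[0] != u and len(predecessors(succ[0])) == 1

    def starts_unitig(u):
        pred = predecessors(u)
        return not (len(pred) == 1 and extends_forward(pred[0]))

    unitigs, seen = [], set()

    def walk(start):
        path, u = start, start
        seen.add(u)
        while extends_forward(u):
            u = successors(u)[0]
            if u in seen:  # closed an isolated cycle
                break
            seen.add(u)
            path += u[-1]
        unitigs.append(path)

    for kmer in sorted(kmers):  # paths that have a well-defined first vertex
        if kmer not in seen and starts_unitig(kmer):
            walk(kmer)
    for kmer in sorted(kmers):  # leftovers lie on isolated cycles
        if kmer not in seen:
            walk(kmer)
    return unitigs


# Toy inputs (invented): a simple path, then the same path with a branch at ACG.
linear = compact_unitigs(["ACG", "CGT", "GTT", "TTG", "TGC", "GCA"])
branched = compact_unitigs(["ACG", "CGT", "GTT", "TTG", "TGC", "GCA", "CGA"])
```

In the branched case the vertex `ACG` has two successors, so it stays a unitig of its own and the remaining path is compacted separately.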
Oxford Nanopore (ONT) is a leading long-read technology which has been revolutionizing transcriptome analysis through its capacity to sequence the majority of transcripts from end-to-end. This has greatly increased our ability to study the diversity of transcription mechanisms such as transcription initiation, termination, and alternative splicing. However, ONT still suffers from high error rates which have thus far limited its scope to reference-based analyses. When a reference is not available or is not a viable option due to reference-bias,...
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility 1. The X chromosome is vital for reproduction and cognition 2. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology...
The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability...
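The basic depth-of-coverage signal can be sketched in a few lines of Python. This is a deliberately naive illustration, not the method of the paper: it applies no correction for the sequencing biases the abstract mentions, and the function names, window size, and toy read positions are assumptions.

```python
from statistics import median


def window_depth(read_starts, read_len, genome_len, window):
    """Return the number of read-covered bases falling in each fixed-size window."""
    n_windows = (genome_len + window - 1) // window
    depth = [0] * n_windows
    for start in read_starts:
        for pos in range(start, min(start + read_len, genome_len)):
            depth[pos // window] += 1
    return depth


def estimate_copy_number(depth, baseline_copies=2):
    """Scale each window's depth by the median depth, assumed to be diploid."""
    base = median(depth)
    if base == 0:
        raise ValueError("median depth is zero; cannot normalise")
    return [round(baseline_copies * d / base) for d in depth]


# Toy data (invented): reads of length 5 on a 40 bp genome, with the third
# 10 bp window sampled twice as often as the others.
starts = [0, 5, 10, 15, 20, 25, 20, 25, 30, 35]
depth = window_depth(starts, read_len=5, genome_len=40, window=10)
copies = estimate_copy_number(depth)
```

Normalising by the median assumes most of the genome is at the baseline copy number, which is exactly where uncorrected sampling bias would distort the calls.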
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species 1–4. To address this issue, the international Genome 10K (G10K) consortium 5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons...
Motivation: The continuing improvements to high-throughput sequencing (HTS) platforms have begun to unfold a myriad of new applications. As a result, error correction of sequencing reads remains an important problem. Though several tools do an excellent job of correcting datasets where the reads are sampled close to uniformly, the problem of correcting reads coming from drastically non-uniform datasets, such as those from single-cell sequencing, remains open. Results: In this article, we develop the method Hammer for error correction without any uniformity assumptions. Hammer is...
The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes, and thus is the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y for a handful of mammalian species. Here, we present a much faster and more affordable strategy for sequencing and assembling mammalian Y Chromosomes of sufficient quality for most comparative genomics analyses and for conservation genetics applications. The strategy combines flow sorting, short- and long-read...
Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes...
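As a rough illustration of what flank-based profiling works with, the Python sketch below locates a tandem repeat inside a read and returns the flanking sequences that a read mapper could then align. It is a hypothetical simplification, not the STR-FM pipeline: the function name, the thresholds, and the regular expression are assumptions.

```python
import re

# A unit of 1-4 bases (lazy, so the shortest unit is preferred) followed by
# at least two more copies of itself, i.e. three or more copies in total.
STR_PATTERN = re.compile(r"([ACGT]{1,4}?)\1{2,}")


def find_str_with_flanks(read, min_flank=5):
    """Return (unit, copies, left_flank, right_flank) for each STR in a read.

    Repeats without at least min_flank bases on both sides are skipped,
    because flanks that short cannot anchor the read reliably.
    """
    hits = []
    for match in STR_PATTERN.finditer(read):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        left, right = read[:match.start()], read[match.end():]
        if len(left) >= min_flank and len(right) >= min_flank:
            hits.append((unit, copies, left, right))
    return hits


# Toy read (invented): four copies of AC between two 5 bp flanks.
hits = find_str_with_flanks("TTGCGACACACACGTTCA")
```

Mapping only the flanks, rather than the repeat itself, is what lets the repeat length be read off the sequence between them instead of being forced to match the reference.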
Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust,...
Whole genome shotgun assembly is the process of taking many short sequenced segments (reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial time algorithm for the assembly of a double-stranded genome. Furthermore,...
de Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). In this article, we present TwoPaCo, a simple and scalable low memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and eight real...
The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods...
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We...
The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers proposed a navigational data structure approach in order to improve memory usage. We prove several theoretical space lower bounds to show the limitations of these types of approaches. We further design and implement a general data structure (dbgfm) and demonstrate its use on a human whole-genome dataset, achieving space usage of 1.5 GB and a 46% improvement...