Paul Medvedev

ORCID: 0000-0003-3143-594X
Research Areas
  • Genomics and Phylogenetic Studies
  • Algorithms and Data Compression
  • Chromosomal and Genetic Variations
  • RNA and protein synthesis mechanisms
  • Genomics and Chromatin Dynamics
  • Gene expression and cancer classification
  • Genome Rearrangement Algorithms
  • Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
  • Food Industry and Aquatic Biology
  • DNA and Biological Computing
  • Advanced biosensing and bioanalysis techniques
  • Machine Learning in Bioinformatics
  • Caching and Content Delivery
  • Genomic variations and chromosomal abnormalities
  • Advanced Graph Theory Research
  • Data Mining Algorithms and Applications
  • Advanced Scientific Research Methods
  • Molecular Biology Techniques and Applications
  • Genetic diversity and population structure
  • Optimization and Search Problems
  • CRISPR and Genetic Engineering
  • Animal Nutrition and Health
  • Microbial Community Ecology and Physiology
  • Agricultural Productivity and Crop Improvement
  • Cancer Genomics and Diagnostics

Ural State University of Economics
2025

Pennsylvania State University
2015-2024

Orenburg State University
2016-2022

Park University
2020

University of Pennsylvania
2016-2019

University of California, San Diego
2011-2013

Hospital for Sick Children
2013

SickKids Foundation
2013

University of Toronto
2007-2011

Bielefeld University
2009

Arang Rhie, Shane McCarthy, Olivier Fédrigo, Joana Damas, Giulio Formenti, Sergey Koren, Marcela Uliano‐Silva, William Chow, Arkarachai Fungtammasan, Ju‐Wan Kim, Chul Lee, Byung June Ko, Mark Chaisson, Gregory Gedman, Lindsey Cantin, Françoise Thibaud‐Nissen, Leanne Haggerty, Iliana Bista, Michelle Smith, Bettina Haase, Jacquelyn Mountcastle, Sylke Winkler, Sadye Paez, Jason T. Howard, Sonja C. Vernes, Tanya M. Lama, Frank Grützner, Wesley C. Warren, Christopher N. Balakrishnan, David W. Burt, Julia M. George, Matthew T. Biegler, David Iorns, Andrew Digby, Daryl Eason, Bruce C. Robertson, Taylor Edwards, Mark Wilkinson, George F. Turner, Axel Meyer, Andreas F. Kautt, Paolo Franchini, H. William Detrich, Hannes Svardal, Maximilian Wagner, Gavin J. P. Naylor, Martin Pippel, Milan Malinsky, Mark P. Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout, Marlys L. Houck, Ann C. Misuraca, Sarah B. Kingan, Richard Hall, Zev Kronenberg, Ivan Sović, Christopher Dunn, Zemin Ning, Alex Hastie, Joyce Lee, Siddarth Selvaraj, Richard E. Green, Nicholas H. Putnam, Marta Gut, Jay Ghurye, Erik Garrison, Ying Sims, Joanna Collins, Sarah Pelan, James Torrance, Alan Tracey, Jonathan Wood, Robel E. Dagnew, Dengfeng Guan, Sarah E. London, David F. Clayton, Claudio V. Mello, Samantha R. Friedrich, Peter V. Lovell, Ekaterina Osipova, Farooq O. Al-Ajli, Simona Secomandi, Heebal Kim, Constantina Theofanopoulou, Michael Hiller, Yang Zhou, Robert S. Harris, Kateryna D. Makova, Paul Medvedev, Jinna Hoffman, Patrick Masterson, Karen Clark, Fergal J. Martin, Kevin Howe, Paul Flicek, Brian P. Walenz, Woori Kwak, Hiram Clawson

Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate...

10.1038/s41586-021-03451-0 article EN cc-by Nature 2021-04-28

Abstract Motivation: Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision. Results: We develop a fast and accurate sampling method that constructs approximate abundance histograms with an orders-of-magnitude performance improvement over...

10.1093/bioinformatics/btt310 article EN Bioinformatics 2013-06-03
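The abstract above describes building k-mer abundance histograms from a sample instead of counting every k-mer. The Python sketch below assumes one common sampling scheme, keeping a k-mer only when its hash is divisible by a hypothetical `rate`, so that every occurrence of a sampled k-mer is counted and only the number of distinct k-mers per abundance is estimated. The function names, the BLAKE2 hash, and the forward-strand-only k-mers (no canonical reverse-complement merging) are illustrative assumptions, not the published tool's implementation.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only; an assumption of this sketch)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """Deterministic 64-bit hash, so all occurrences of a k-mer share one keep/drop decision."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def abundance_histogram(reads, k):
    """Exact histogram: abundance -> number of distinct k-mers seen that many times."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return dict(Counter(counts.values()))


def approx_abundance_histogram(reads, k, rate):
    """Approximate histogram from a hash-based sample of roughly 1/rate of the distinct k-mers.

    Sampled k-mers are counted exactly; each histogram bar is scaled back up by `rate`.
    """
    counts = Counter(
        km for read in reads for km in kmers(read, k) if kmer_hash(km) % rate == 0
    )
    return {abundance: n * rate for abundance, n in Counter(counts.values()).items()}


reads = ["ACGTACGT", "CGTACGTA"]
print(abundance_histogram(reads, 3))  # {3: 2, 4: 1, 2: 1}
```

With `rate=1` every k-mer is kept and the approximate histogram equals the exact one; larger rates trade accuracy in the bar heights for less counting work.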

10.1038/s41586-023-06457-y article EN Nature 2023-08-23

As the quantity of data per sequencing experiment increases, the challenges of fragment assembly are becoming increasingly computational. The de Bruijn graph is a widely used data structure in fragment assembly algorithms, used to represent the information from a set of reads. Compaction is an important data reduction step in most de Bruijn graph based algorithms, where long simple paths are compacted into single vertices. Compaction has recently become the bottleneck in assembly pipelines, and improving its running time and memory usage is an important problem. We present an algorithm and a tool, bcalm 2, for the compaction...

10.1093/bioinformatics/btw279 article EN cc-by-nc Bioinformatics 2016-06-11
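To make "compacting long simple paths into single vertices" concrete, here is a small in-memory Python sketch that computes the unitigs (maximal non-branching paths) of a node-centric de Bruijn graph given as a set of k-mers. It is not the bcalm 2 algorithm, which is designed for low memory; it also assumes a single-stranded graph, whereas real compaction tools also merge reverse complements. The function name `compact` is hypothetical.

```python
def compact(kmers):
    """Return the unitigs (maximal non-branching paths) of the node-centric de Bruijn
    graph on a set of k-mers. Single-stranded: reverse complements are NOT merged."""
    kmers = set(kmers)

    def succ(x):
        return [x[1:] + c for c in "ACGT" if x[1:] + c in kmers]

    def pred(x):
        return [c + x[:-1] for c in "ACGT" if c + x[:-1] in kmers]

    def mergeable(u, v):
        # Edge u -> v lies inside a unitig iff it is u's only out-edge and v's only in-edge.
        return len(succ(u)) == 1 and len(pred(v)) == 1

    def spell(path):
        return path[0] + "".join(x[-1] for x in path[1:])

    unitigs, seen = [], set()
    # Pass 1: walk forward from every k-mer that starts a unitig.
    for start in sorted(kmers):
        p = pred(start)
        if len(p) == 1 and mergeable(p[0], start):
            continue  # interior k-mer (or part of an isolated cycle)
        path, cur = [start], start
        seen.add(start)
        while len(succ(cur)) == 1 and mergeable(cur, succ(cur)[0]) and succ(cur)[0] not in seen:
            cur = succ(cur)[0]
            path.append(cur)
            seen.add(cur)
        unitigs.append(spell(path))
    # Pass 2: leftover k-mers lie on isolated cycles; break each at its smallest k-mer.
    for start in sorted(kmers - seen):
        if start in seen:
            continue
        path, cur = [start], start
        seen.add(start)
        while succ(cur)[0] not in seen:
            cur = succ(cur)[0]
            path.append(cur)
            seen.add(cur)
        unitigs.append(spell(path))
    return unitigs


print(compact({"AAC", "ACG", "ACT"}))  # ['AAC', 'ACG', 'ACT']
```

In the usage line, AAC has two successors, so the path cannot be merged across it and each k-mer ends up as its own unitig.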

Abstract Oxford Nanopore (ONT) is a leading long-read technology which has been revolutionizing transcriptome analysis through its capacity to sequence the majority of transcripts from end-to-end. This has greatly increased our ability to study the diversity of transcription mechanisms such as initiation, termination, and alternative splicing. However, ONT still suffers from high error rates, which have thus far limited its scope to reference-based analyses. When a reference is not available or is not a viable option due to reference bias,...

10.1038/s41467-020-20340-8 article EN cc-by Nature Communications 2021-01-04

Abstract Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology...

10.1038/s41586-024-07473-2 article EN cc-by Nature 2024-05-29

The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability...

10.1101/gr.106344.110 article EN cc-by-nc Genome Research 2010-08-30
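The depth-of-coverage idea mentioned in the abstract above can be illustrated with a deliberately naive baseline: count read starts per fixed-size window and scale each window's depth against the genome-wide median, assuming the median window is diploid. This is not the method of the paper; the window size, the `ploidy=2` default, and the function names are assumptions for illustration, and no bias correction is applied.

```python
from statistics import median


def window_depths(read_starts, genome_length, window):
    """Count aligned read start positions in consecutive fixed-size windows."""
    depths = [0] * ((genome_length + window - 1) // window)
    for pos in read_starts:
        depths[pos // window] += 1
    return depths


def naive_copy_numbers(depths, ploidy=2):
    """Estimate an integer copy number per window, assuming depth is proportional to
    copy number and the median window has `ploidy` copies. No bias correction."""
    typical = median(depths)
    if typical == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [round(ploidy * d / typical) for d in depths]


print(naive_copy_numbers([10, 10, 10, 20, 5, 10]))  # [2, 2, 2, 4, 1, 2]
```

Any region that is systematically over- or undersampled shows up in this baseline as a spurious copy-number change, which is exactly the limitation the abstract points to.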

10.1016/j.tcs.2012.03.004 article EN publisher-specific-oa Theoretical Computer Science 2012-03-16

Arang Rhie, Shane McCarthy, Olivier Fédrigo, Joana Damas, Giulio Formenti, Sergey Koren, Marcela Uliano‐Silva, William Chow, Arkarachai Fungtammasan, Gregory Gedman, Lindsey Cantin, Françoise Thibaud‐Nissen, Leanne Haggerty, Chul Lee, Byung June Ko, Ju‐Wan Kim, Iliana Bista, Michelle Smith, Bettina Haase, Jacquelyn Mountcastle, Sylke Winkler, Sadye Paez, Jason T. Howard, Sonja C. Vernes, Tanya M. Lama, Frank Grützner, Wesley C. Warren, Christopher N. Balakrishnan, David W. Burt, Julia M. George, Mathew Biegler, David Iorns, Andrew Digby, Daryl Eason, Taylor Edwards, Mark Wilkinson, George F. Turner, Axel Meyer, Andreas F. Kautt, Paolo Franchini, H. William Detrich, Hannes Svardal, Maximilian Wagner, Gavin J. P. Naylor, Martin Pippel, Milan Malinsky, Mark P. Mooney, Maria Simbirsky, Brett T. Hannigan, Trevor Pesout, Marlys L. Houck, Ann C. Misuraca, Sarah B. Kingan, Richard Hall, Zev Kronenberg, Jonas Korlach, Ivan Sović, Christopher Dunn, Zemin Ning, Alex Hastie, Joyce Lee, Siddarth Selvaraj, Richard E. Green, Nicholas H. Putnam, Jay Ghurye, Erik Garrison, Ying Sims, Joanna Collins, Sarah Pelan, James Torrance, Alan Tracey, Jonathan Wood, Dengfeng Guan, Sarah E. London, David F. Clayton, Claudio V. Mello, Samantha R. Friedrich, Peter V. Lovell, Ekaterina Osipova, Farooq O. Al-Ajli, Simona Secomandi, Heebal Kim, Constantina Theofanopoulou, Yang Zhou, Robert S. Harris, Kateryna D. Makova, Paul Medvedev, Jinna Hoffman, Patrick Masterson, Karen Clark, Fergal J. Martin, Kevin Howe, Paul Flicek, Brian P. Walenz, Woori Kwak, Hiram Clawson, Mark Diekhans, Luis R Nassar, Benedict Paten, R.H. Kraus

Abstract High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons...

10.1101/2020.05.22.110833 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2020-05-23

Abstract Motivation: The continuing improvements to high-throughput sequencing (HTS) platforms have begun to unfold a myriad of new applications. As a result, error correction of reads remains an important problem. Though several tools do an excellent job of correcting datasets where the reads are sampled close to uniformly, the problem of correcting reads coming from drastically non-uniform datasets, such as those from single-cell sequencing, remains open. Results: In this article, we develop the method Hammer for error correction without any uniformity assumptions. Hammer is...

10.1093/bioinformatics/btr208 article EN cc-by-nc Bioinformatics 2011-06-14
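One way to correct k-mers without assuming uniform coverage is to cluster them in a Hamming graph and decide within each cluster rather than against a global coverage cutoff. The Python sketch below is a simplified illustration of that idea under stated assumptions: single-linkage connected components at Hamming distance at most `max_dist` (default 1), with the most abundant member chosen as the cluster's representative. Hammer's actual clustering and consensus steps are more involved; `hamming_clusters` is a hypothetical name.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def hamming_clusters(kmer_counts, max_dist=1):
    """Map each k-mer to a representative: the most abundant k-mer in its connected
    component of the Hamming graph (edges join k-mers within `max_dist` mismatches)."""
    parent = {km: km for km in kmer_counts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All-pairs comparison is quadratic: acceptable only for a toy example.
    for a, b in combinations(kmer_counts, 2):
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)

    components = {}
    for km in kmer_counts:
        components.setdefault(find(km), []).append(km)

    correction = {}
    for members in components.values():
        center = max(members, key=lambda km: (kmer_counts[km], km))
        for km in members:
            correction[km] = center
    return correction


print(hamming_clusters({"ACGT": 3, "ACGA": 1, "TTTT": 30}))
```

Here the low-abundance k-mer ACGT (count 3) still survives as a representative because it only competes with its own neighbor ACGA, not with the highly covered TTTT. Single linkage can chain distinct true k-mers together, which is one reason a real tool needs further subclustering.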

The mammalian Y Chromosome sequence, critical for studying male fertility and dispersal, is enriched in repeats and palindromes and is, thus, the most difficult component of the genome to assemble. Previously, expensive and labor-intensive BAC-based techniques were used to sequence the Y Chromosomes of a handful of species. Here, we present a much faster and more affordable strategy for sequencing and assembling Y Chromosomes of sufficient quality for comparative genomics analyses and conservation genetics applications. The strategy combines flow sorting, short- and long-read...

10.1101/gr.199448.115 article EN cc-by-nc Genome Research 2016-03-02

Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous samples (e.g., tumors, viruses, and genomes...

10.1101/gr.185892.114 article EN cc-by-nc Genome Research 2015-03-30
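In a flank-based approach, the sequence surrounding a repeat, rather than the repeat itself, is what gets aligned. The Python sketch below shows only the preliminary step of splitting a read into (left flank, repeat unit, copy number, right flank) by finding the longest perfect tandem repeat. STR-FM's own detection criteria are not reproduced here; `longest_str`, `max_unit=6`, and `min_copies=3` are assumed, hypothetical choices.

```python
def longest_str(read, max_unit=6, min_copies=3):
    """Find the longest perfect tandem repeat in `read` and split the read into
    (left_flank, unit, copies, right_flank). Returns None if no repeat of at least
    `min_copies` copies of a 1..max_unit bp unit is found."""
    best = None  # (repeat length, start, unit)
    for unit_len in range(1, max_unit + 1):
        for start in range(len(read) - unit_len + 1):
            unit = read[start:start + unit_len]
            end = start + unit_len
            # Extend while the next block is an exact copy of the unit.
            while read[end:end + unit_len] == unit:
                end += unit_len
            copies = (end - start) // unit_len
            # Strict '>' keeps the shortest unit (e.g. "A" over "AA") on ties.
            if copies >= min_copies and (best is None or end - start > best[0]):
                best = (end - start, start, unit)
    if best is None:
        return None
    length, start, unit = best
    return read[:start], unit, length // len(unit), read[start + length:]


print(longest_str("GATTACACACACAGGC"))  # ('GATT', 'AC', 4, 'AGGC')
```

The two flanks returned here are the pieces a flank-based method would align to locate the repeat, leaving the error-prone repeat itself out of the alignment.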

Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust,...

10.1089/cmb.2019.0299 article EN Journal of Computational Biology 2020-03-17
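Clustering long reads by gene family of origin can be sketched as greedy clustering on shared minimizers. The Python example below is a generic illustration, not isONclust itself: it processes reads in input order, compares each read only with each cluster's first read, and uses an assumed threshold `min_shared=0.5` with assumed parameters `k=5` and `w=4`. The actual tool's read ordering, scoring, and thresholds are not reproduced.

```python
def minimizers(seq, k, w):
    """(w, k)-minimizers: the lexicographically smallest k-mer in each window
    of w consecutive k-mers."""
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kms:
        return set()
    w = min(w, len(kms))
    return {min(kms[i:i + w]) for i in range(len(kms) - w + 1)}


def greedy_cluster(reads, k=5, w=4, min_shared=0.5):
    """Assign each read to the first cluster whose representative shares at least
    `min_shared` of the read's minimizers; otherwise open a new cluster."""
    clusters = []  # list of (representative minimizer set, member read indices)
    for idx, read in enumerate(reads):
        mins = minimizers(read, k, w)
        for rep_mins, members in clusters:
            if mins and len(mins & rep_mins) / len(mins) >= min_shared:
                members.append(idx)
                break
        else:
            clusters.append((mins, [idx]))
    return [members for _, members in clusters]


reads = ["ACGTTGCATGCA", "ACGTTGCATGCA", "TTTTTTTTTTTT"]
print(greedy_cluster(reads))  # [[0, 1], [2]]
```

Comparing only against one representative per cluster keeps the number of comparisons proportional to the number of clusters rather than the number of reads, which is the usual motivation for a greedy scheme.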

Whole genome shotgun assembly is the process of taking many short sequenced segments (reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest sequence that contains a given set of k-long molecules. This is the first exact polynomial-time algorithm for the assembly of a double-stranded genome. Furthermore,...

10.1089/cmb.2009.0047 article EN Journal of Computational Biology 2009-08-01
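The simplest special case of the shortest-sequence problem described above arises when the single-stranded, edge-centric de Bruijn graph is Eulerian: each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix, and walking every edge exactly once spells a sequence containing each k-mer once. The Python sketch below uses Hierholzer's algorithm for that case only. It does not model double strandedness (the paper's bidirected graphs) and does not solve the general Chinese Postman Problem, where some edges must be traversed more than once. The name `eulerian_spell` is hypothetical.

```python
from collections import defaultdict


def eulerian_spell(kmers):
    """Spell a sequence containing every k-mer in the list exactly once by finding an
    Eulerian path in the single-stranded, edge-centric de Bruijn graph.
    Raises ValueError if no Eulerian path exists."""
    out = defaultdict(list)
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for km in kmers:
        u, v = km[:-1], km[1:]  # edge: (k-1)-prefix -> (k-1)-suffix
        out[u].append(v)
        outdeg[u] += 1
        indeg[v] += 1
    nodes = set(indeg) | set(outdeg)
    starts = [n for n in nodes if outdeg[n] - indeg[n] == 1]
    ends = [n for n in nodes if indeg[n] - outdeg[n] == 1]
    if any(abs(outdeg[n] - indeg[n]) > 1 for n in nodes) or len(starts) > 1 or len(ends) > 1:
        raise ValueError("no Eulerian path: degrees are unbalanced")
    start = starts[0] if starts else min(nodes)
    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out[node]:
            stack.append(out[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(kmers) + 1:
        raise ValueError("no Eulerian path: graph is not connected")
    return path[0] + "".join(n[-1] for n in path[1:])


print(eulerian_spell(["GCA", "ACG", "TTG", "CGT", "TGC", "GTT"]))  # ACGTTGCA
```

When degrees are unbalanced, a Chinese Postman formulation instead finds the cheapest way to duplicate edges until an Eulerian walk exists, which is the step this sketch leaves out.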

de Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). In this article, we present TwoPaCo, a simple and scalable low-memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and eight real...

10.1093/bioinformatics/btw609 article EN Bioinformatics 2016-09-21
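Building a compacted de Bruijn graph directly from complete genomes can be framed as finding junction positions: k-mer occurrences where the graph branches (more than one distinct left or right neighbor across all genomes) plus each genome's first and last k-mer. The stretches between consecutive junctions are then the compacted edges. The Python sketch below does this with exact hash sets; it is single-stranded, and its memory use is exactly what a low-memory method such as TwoPaCo is designed to avoid. The sentinel characters `^`/`$` and the function names are assumptions of this sketch.

```python
from collections import defaultdict


def junction_positions(genomes, k):
    """For each genome, the start positions of k-mers that are junctions: k-mers with
    more than one distinct left or right extension across all genomes, plus the first
    and last k-mer of every genome. ('^'/'$' mark genome ends.)"""
    left, right = defaultdict(set), defaultdict(set)
    for g in genomes:
        for i in range(len(g) - k + 1):
            km = g[i:i + k]
            left[km].add(g[i - 1] if i > 0 else "^")
            right[km].add(g[i + k] if i + k < len(g) else "$")
    result = []
    for g in genomes:
        last = len(g) - k
        result.append([
            i for i in range(last + 1)
            if i in (0, last) or len(left[g[i:i + k]]) > 1 or len(right[g[i:i + k]]) > 1
        ])
    return result


def compacted_edges(genomes, k):
    """Substrings spanning consecutive junctions; each is an edge of the compacted graph."""
    edges = set()
    for g, pos in zip(genomes, junction_positions(genomes, k)):
        for a, b in zip(pos, pos[1:]):
            edges.add(g[a:b + k])
    return edges


print(junction_positions(["AACGTT", "CCCGTG"], 3))  # [[0, 2, 3], [0, 2, 3]]
```

In this example the shared 3-mer CGT is preceded by A in one genome and C in the other, so it becomes a junction in both genomes.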

The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods...

10.1089/cmb.2011.0151 article EN Journal of Computational Biology 2011-10-14
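One way to see how mate-pair information can resolve repeats inside the graph itself, rather than in post-processing, is a paired de Bruijn graph built from pairs of k-mers at a fixed distance. The Python sketch below assumes an error-free genome and an exact distance `d` between the two k-mers of each pair (a "bi-mer"); real mate pairs have variable insert sizes, and this toy construction is only an illustration, not the published construction. With `d = 0` the graph reduces to the ordinary de Bruijn graph.

```python
from collections import Counter


def paired_de_bruijn(genome, k, d):
    """Edge set of a paired de Bruijn graph built from an error-free genome.

    A bi-mer is a pair of k-mers whose start positions are exactly d apart; it becomes
    an edge from its pair of (k-1)-prefixes to its pair of (k-1)-suffixes. With d = 0
    both halves coincide and the graph is just the ordinary de Bruijn graph.
    """
    edges = set()
    for i in range(len(genome) - d - k + 1):
        a, b = genome[i:i + k], genome[i + d:i + d + k]
        edges.add(((a[:-1], b[:-1]), (a[1:], b[1:])))
    return edges


def branching_nodes(edges):
    """Nodes with more than one distinct outgoing edge."""
    outdeg = Counter(u for u, _ in edges)
    return {u for u, n in outdeg.items() if n > 1}


genome = "AACGTTACGCCAGA"  # ACG occurs at positions 1 and 6, followed by T and then by C
print(branching_nodes(paired_de_bruijn(genome, 3, 0)))  # {('CG', 'CG')}
print(branching_nodes(paired_de_bruijn(genome, 3, 3)))  # set()
```

In the ordinary graph the repeated CG node branches; pairing each k-mer with its partner three bases downstream splits it into the distinct nodes (CG, TA) and (CG, CA), so the path through the repeat is no longer ambiguous.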

10.1016/j.tcs.2011.05.021 article EN publisher-specific-oa Theoretical Computer Science 2011-05-23

Abstract Many disciplines, from human genetics and oncology to plant breeding, microbiology, and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We...

10.1101/043430 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2016-03-12

The de Bruijn graph plays an important role in bioinformatics, especially in the context of de novo assembly. However, the representation of the de Bruijn graph in memory is a computational bottleneck for many assemblers. Recent papers proposed a navigational data structure approach in order to improve memory usage. We prove several theoretical space lower bounds to show the limitations of these types of approaches. We further design and implement a general data structure (dbgfm) and demonstrate its use on a human whole-genome dataset, achieving space usage of 1.5 GB and a 46% improvement...

10.1089/cmb.2014.0160 article EN Journal of Computational Biology 2015-01-28
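A navigational representation, as described in the abstract above, answers only membership queries; neighbors are found by asking whether each of the four one-base extensions of a k-mer is present. In the Python sketch below, only the unitig sequences are stored, and plain substring search stands in for a compressed full-text index (an FM-index, as the name dbgfm suggests). The class name, the `#` separator, and the single-stranded treatment are assumptions of this sketch.

```python
class NavigationalDBG:
    """Membership-only ("navigational") de Bruijn graph sketch.

    Stores only unitig sequences and navigates by membership queries on the four
    possible one-base extensions of a k-mer. Python substring search stands in for
    a compressed full-text index. Single-stranded: reverse complements are ignored.
    """

    def __init__(self, unitigs, k):
        self.k = k
        self.text = "#".join(unitigs)  # separator: no k-mer can span two unitigs

    def contains(self, kmer):
        return len(kmer) == self.k and set(kmer) <= set("ACGT") and kmer in self.text

    def successors(self, kmer):
        return [kmer[1:] + c for c in "ACGT" if self.contains(kmer[1:] + c)]

    def predecessors(self, kmer):
        return [c + kmer[:-1] for c in "ACGT" if self.contains(c + kmer[:-1])]


g = NavigationalDBG(["AACG", "CGTT", "CGAA"], k=3)
print(g.successors("ACG"))    # ['CGA', 'CGT']
print(g.predecessors("CGT"))  # ['ACG']
```

The design choice is that edges are never stored explicitly: each neighbor query costs up to four membership lookups, which is what allows the graph to be kept in roughly the space of its sequence index.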