Richard Durbin

ORCID: 0000-0002-9130-1006
Research Areas
  • Genomics and Phylogenetic Studies
  • Chromosomal and Genetic Variations
  • Genetic diversity and population structure
  • RNA and protein synthesis mechanisms
  • Genetic Associations and Epidemiology
  • Genetic Mapping and Diversity in Plants and Animals
  • Genomics and Chromatin Dynamics
  • Forensic and Genetic Research
  • Machine Learning in Bioinformatics
  • Genomics and Rare Diseases
  • Gene expression and cancer classification
  • Genetics, Aging, and Longevity in Model Organisms
  • Genomic variations and chromosomal abnormalities
  • CRISPR and Genetic Engineering
  • Algorithms and Data Compression
  • Epigenetics and DNA Methylation
  • Genetics, Bioinformatics, and Biomedical Research
  • Bioinformatics and Genomic Networks
  • Genetic and phenotypic traits in livestock
  • Evolution and Genetic Dynamics
  • RNA modifications and cancer
  • Fungal and yeast genetics research
  • Forensic Anthropology and Bioarchaeology Studies
  • Environmental DNA in Biodiversity Studies
  • Single-cell and spatial transcriptomics

University of Cambridge
1993-2025

Wellcome Sanger Institute
2015-2024

University of Edinburgh
2020

Aarhus University
2020

University of Michigan–Ann Arbor
2020

Michigan United
2020

University of Oxford
2017

Centre National de la Recherche Scientifique
1993-2017

Université de Montpellier
2017

University of St Andrews
2017

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments...

10.1093/bioinformatics/btp352 article EN cc-by-nc Bioinformatics 2009-06-08
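
As a rough illustration of the format described in the abstract above, a SAM alignment line is tab-separated, with eleven mandatory fields followed by optional tags. The sketch below splits one such line into named fields. The helper name `parse_sam_line` and the example record are made up for illustration; this is not SAMtools code.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields, plus any optional tags."""
    qname: str   # read (query) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, Phred+33)
    tags: Tuple[str, ...]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (hypothetical helper, not a SAMtools API)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


# Invented example record: a 5 bp read aligned to chr1 at position 100.
example = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
record = parse_sam_line(example)
is_reverse_strand = bool(record.flag & 0x10)  # flag bit 0x10: reverse strand
```

Header lines (those starting with `@`) are not handled here; a real reader would skip or parse them separately.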

Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds...

10.1093/bioinformatics/btp324 article EN cc-by-nc Bioinformatics 2009-05-18
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum, Michael C. Zody, Jennifer N. Baldwin, Keri Devon, Ken Dewar, Michael P. Doyle, William W. Fitzhugh, Roel Funke, Diane Gage, Katrina L. Harris, Andrew Heaford, John G. Howland, Lisa Kann, Jessica A. Lehoczky, R Paul Levine, Paul McEwan, Kevin McKernan, James C. Meldrim, Jill P. Mesirov, Cher Miranda, William Morris, Jerome W. Naylor, Christina Raymond, Mark Rosetti, Ralph Santos, Andrew Sheridan, Carrie Sougnez, Nicole Stange-Thomann, Nikola M. Stojanović, Aravind Subramanian, Dudley Wyman, Jane Rogers, John Sulston, R. Ainscough, Stephan Beck, David Bentley, John H. Burton, Christopher Clee, Nigel Carter, Alan Coulson, Rebecca Deadman, Panos Deloukas, Andrew Dunham, Ian Dunham, Richard Durbin, Lisa French, Darren Grafham, Simon G. Gregory, Tim Hubbard, Sean Humphray, Adrienne Hunt, Matthew C. Jones, Christine Lloyd, Amanda A. McMurray, Lucy Matthews, Simon Mercer, Sarah Milne, James C. Mullikin, Andrew J. Mungall, R. W. Plumb, Mark T. Ross, R. Shownkeen, Sarah Sims, R Waterston, Richard K. Wilson, LaDeana W. Hillier, John D. McPherson, Marco A. Marra, Elaine R. Mardis, Lucinda A. Fulton, Asif Chinwalla, Kymberlie Pepin, Warren Gish, Stephanie L. Chissoe, Michael C. Wendl, Kim D. Delehaunty, Tracie L. Miner, Andrew Delehaunty, Jason Kramer, Lisa L. Cook, Robert S. Fulton, D. Johnson, Patrick Minx, Sandra W. Clifton, Trevor Hawkins, Elbert Branscomb, Paul Predki, Paul Richardson, Sarah Wenning, Tom Slezak, Norman A. Doggett, Jan‐Fang Cheng, Anne S. Olsen, Susan Lucas, Christopher J. Elkin, Edward C. Uberbacher, M.E. Frazier

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

10.1038/35057062 article EN public-domain Nature 2001-02-01

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6...

10.1038/nature15393 article EN cc-by-nc-sa Nature 2015-09-29

Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the World Wide Web in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgb.ki.se/Pfam/, in France at http://pfam.jouy.inra.fr/ and in the US at http://pfam.wustl.edu/. The latest version (6.6) of Pfam contains 3071 families, which match 69% of proteins in SWISS-PROT 39 and TrEMBL 14. Structural data, where available, have been utilised to ensure that Pfam families correspond with structural domains, and to improve...

10.1093/nar/30.1.276 article EN cc-by-nc Nucleic Acids Research 2002-01-01

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing...

10.1093/bioinformatics/btr330 article EN cc-by-nc Bioinformatics 2011-06-07
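
To make the format described in the abstract above more concrete, the sketch below parses the eight fixed, tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and splits the INFO column into key/value annotations. The function name `parse_vcf_line` is hypothetical and is not part of VCFtools; per-sample genotype columns are ignored.

```python
def parse_vcf_line(line: str) -> dict:
    """Parse the eight fixed columns of one VCF data line (illustrative only)."""
    # Unpacking raises ValueError if the line has fewer than eight columns.
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]

    info_dict = {}
    for item in info.split(";"):
        if item == ".":          # "." marks a missing value
            continue
        key, sep, value = item.partition("=")
        # Keys without "=" are flags; record them as True.
        info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                       # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                 # one or more alternate alleles
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }


# Example data line in the style of the VCF specification.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB"
variant = parse_vcf_line(example)
```

INFO values are kept as strings here; interpreting them by type would require the `##INFO` header definitions, which this sketch does not read.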

Motivation: Many programs for aligning short sequencing reads to a reference genome have been developed in the last 2 years. Most of them are very efficient for short reads but inefficient or not applicable for reads >200 bp because the algorithms are heavily and specifically tuned for short queries with low sequencing error rate. However, some sequencing platforms already produce longer reads and others are expected to become available soon. For longer reads, hashing-based software such as BLAT and SSAHA2 remain the only choices. Nonetheless, these methods are substantially...

10.1093/bioinformatics/btp698 article EN cc-by-nc Bioinformatics 2010-01-15

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions...

10.1038/nature11632 article EN cc-by-nc-sa Nature 2012-10-31

New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build...

10.1101/gr.078212.108 article EN cc-by-nc Genome Research 2008-08-19
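
Mapping quality, as introduced in the abstract above, is conventionally reported on a Phred scale: Q = -10 log10(P), where P is the probability that the read is mapped to the wrong position. The sketch below only converts between the two scales. It does not show how MAQ estimates P, and the function names and the cap of 255 are assumptions made for illustration.

```python
import math


def mapq_from_error_prob(p_wrong: float, cap: int = 255) -> int:
    """Phred-scale the probability that a read is wrongly mapped.

    `cap` is an assumed upper bound for illustration; aligners choose their own.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        return cap  # log10(0) is undefined, so report the cap
    return min(cap, round(-10.0 * math.log10(p_wrong)))


def error_prob_from_mapq(mapq: int) -> float:
    """Invert the Phred scale: Q = 30 corresponds to a 1 in 1000 error chance."""
    return 10.0 ** (-mapq / 10.0)


q_confident = mapq_from_error_prob(0.001)  # read very likely placed correctly
q_ambiguous = mapq_from_error_prob(0.5)    # two equally good placements
```

On this scale every additional 10 points means a tenfold lower chance that the placement is wrong, which is why downstream tools often filter reads with a simple threshold on mapping quality.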

We present two algorithms in this paper: GeneWise, which predicts gene structure using similar protein sequences, and Genomewise, which provides a gene structure final parse across cDNA- and EST-defined spliced structure. Both algorithms are heavily used by the Ensembl annotation system. The GeneWise algorithm was developed from a principled combination of hidden Markov models (HMMs). Both algorithms are highly accurate and can provide both accurate and complete gene structures when used with the correct evidence.

10.1101/gr.1865504 article EN cc-by-nc Genome Research 2004-05-01
Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, Sergey Aganezov, Savannah J. Hoyt, Mark Diekhans, Glennis A. Logsdon, Michael Alonge, Stylianos E. Antonarakis, Matthew Borchers, Gerard G. Bouffard, Shelise Brooks, Gina V. Caldas, Nae-Chyun Chen, Haoyu Cheng, Chen-Shan Chin, William Chow, Leonardo Gomes de Lima, Philip C. Dishuck, Richard Durbin, Tatiana Dvorkina, Ian T. Fiddes, Giulio Formenti, Robert S. Fulton, Arkarachai Fungtammasan, Erik Garrison, Patrick G. S. Grady, Tina A. Graves-Lindsay, Ira M. Hall, Nancy F. Hansen, Gabrielle A. Hartley, Marina Haukness, Kerstin Howe, Michael W. Hunkapiller, Chirag Jain, Miten Jain, Erich D. Jarvis, Peter Kerpedjiev, Melanie Kirsche, Mikhail Kolmogorov, Jonas Korlach, Milinn Kremitzki, Heng Li, Valerie V. Maduro, Tobias Marschall, Ann M. Mc Cartney, Jennifer McDaniel, Danny E. Miller, James C. Mullikin, Eugene W. Myers, Nathan D. Olson, Benedict Paten, Paul Peluso, Pavel A. Pevzner, David Porubský, Tamara Potapova, Е. И. Рогаев, Jeffrey Rosenfeld, Steven L. Salzberg, Valérie Schneider, Fritz J. Sedlazeck, Kishwar Shafin, Colin J. Shew, Alaina Shumate, Ying Sims, Arian F. A. Smit, Daniela C. Soto, Ivan Sović, Jessica M. Storer, Aaron Streets, Beth A. Sullivan, Françoise Thibaud‐Nissen, James Torrance, Justin Wagner, Brian P. Walenz, Aaron M. Wenger, Jonathan Wood, Chunlin Xiao, Stephanie M. Yan, Alice Young, Samantha Zarate, Urvashi Surti, Rajiv C. McCoy, Megan Y. Dennis, Ivan A. Alexandrov, Jennifer L. Gerton, Rachel J. O’Neill, Winston Timp, Justin M. Zook, Michael C. Schatz, Evan E. Eichler, Karen H. Miga, Adam M. Phillippy

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be...

10.1126/science.abj6987 article EN Science 2022-03-31

Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing...

10.1093/bioinformatics/btaa025 article EN cc-by Bioinformatics 2020-01-19

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence...

10.1093/nar/30.1.38 article EN Nucleic Acids Research 2002-01-01

We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size...

10.1038/nature10413 article EN cc-by-nc-sa Nature 2011-09-01

We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using...

10.1101/gr.073585.107 article EN cc-by-nc Genome Research 2008-11-24

Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database,...

10.1002/(sici)1097-0134(199707)28:3<405::aid-prot10>3.0.co;2-l article EN Proteins Structure Function and Bioinformatics 1997-07-01

Summary: We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) which is compatible with similar tools and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. Availability and implementation: YaHS is implemented in C and licensed under the MIT License. The...

10.1093/bioinformatics/btac808 article EN cc-by Bioinformatics 2022-12-16

Knowledge of the complete genomic DNA sequence of an organism allows a systematic approach to defining its genetic components. The genomic sequence provides access to the complete structures of all genes, including those without known function, their control elements, and, by inference, the proteins they encode, as well as all other biologically important sequences. Furthermore, the sequence is a rich and permanent source of information for the design of further biological studies of the organism and for the study of evolution through cross-species sequence comparison. The power of this approach has been amply...

10.1038/990031 article EN public-domain Nature 1999-12-02