Ian Korf

ORCID: 0000-0001-5259-6182
Research Areas
  • Genomics and Phylogenetic Studies
  • Molecular Biology Techniques and Applications
  • Cancer-related molecular mechanisms research
  • Gut microbiota and health
  • RNA and protein synthesis mechanisms
  • RNA modifications and cancer
  • Genomics and Chromatin Dynamics
  • Chromosomal and Genetic Variations
  • Epigenetics and DNA Methylation
  • Microbial Community Ecology and Physiology
  • Animal Genetics and Reproduction
  • RNA Research and Splicing
  • Infant Nutrition and Health
  • Single-cell and spatial transcriptomics
  • Genetic Mapping and Diversity in Plants and Animals
  • Helicobacter pylori-related gastroenterology studies
  • Plant Molecular Biology Research
  • Genetics, Bioinformatics, and Biomedical Research
  • Machine Learning in Bioinformatics
  • Autism Spectrum Disorder Research
  • RNA Interference and Gene Delivery
  • Bioinformatics and Genomic Networks
  • Genetic Syndromes and Imprinting
  • Evolution and Genetic Dynamics
  • Genetics, Aging, and Longevity in Model Organisms

University of California, Davis
2015-2025

Institut de Biologie Moléculaire et Cellulaire
2009-2013

University of California System
2012

Wellcome Sanger Institute
2003-2008

University of Dundee
2007

J. Craig Venter Institute
2007

Imperial College London
2007

New England Biolabs (United States)
2007

Washington University in St. Louis
1999-2003

National Human Genome Research Institute
2000

Computational gene prediction continues to be an important problem, especially for genomes with little experimental data. I introduce the SNAP gene finder, which has been designed to be easily adaptable to a variety of genomes. In novel genomes without an appropriate gene finder, I demonstrate that employing a foreign gene finder can produce highly inaccurate results, and that the most compatible parameters may not come from the nearest phylogenetic neighbor. I find that foreign gene finders are more usefully employed to bootstrap parameter estimation resulting...

10.1186/1471-2105-5-59 article EN cc-by BMC Bioinformatics 2004-05-14

Motivation: The numbers of finished and ongoing genome projects are increasing at a rapid rate, and providing the catalog of genes for these new genomes is a key challenge. Obtaining a set of well-characterized genes is a basic requirement in the initial steps of any genome annotation process. An accurate set of genes is needed in order to learn about species-specific properties, to train gene-finding programs, and to validate automatic predictions. Unfortunately, many genomes lack the comprehensive experimental data needed to derive a reliable set of genes. Results: In this...

10.1093/bioinformatics/btm071 article EN cc-by-nc Bioinformatics 2007-03-01

We have developed a portable and easily configurable genome annotation pipeline called MAKER. Its purpose is to allow investigators to independently annotate eukaryotic genomes and create databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, and automatically synthesizes these data into annotations having evidence-based quality indices. MAKER is also trainable: outputs of preliminary runs are used to retrain its gene-prediction algorithm, producing higher-quality...

10.1101/gr.6743907 article EN Genome Research 2007-11-19

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has proven to be...

10.1101/gr.361602 article EN Genome Research 2002-10-01

The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions...

10.1186/2047-217x-2-10 article EN GigaScience 2013-07-22

The zebrafish is an important vertebrate model for the mutational analysis of genes affecting developmental processes. Understanding the relationship between zebrafish genes and mutations and those of humans will require understanding the syntenic correspondence between the zebrafish and human genomes. High throughput gene and EST mapping projects in zebrafish are now facilitating this goal. Map positions for 523 zebrafish genes and ESTs with predicted human orthologs reveal extensive contiguous blocks of synteny between the zebrafish and human genomes. Eighty percent of the genes and ESTs analyzed belong to conserved synteny groups (two or more genes linked in both...

10.1101/gr.144700 article EN cc-by-nc Genome Research 2000-09-01

Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the approximately 90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict approximately 11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during approximately 350 million years of evolution, they...

10.1126/science.1145406 article EN Science 2007-09-20

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware...

10.1101/gr.126599.111 article EN cc-by-nc Genome Research 2011-09-16

Twinscan is a new gene-structure prediction system that directly extends the probability model of Genscan, allowing it to exploit homology between two related genomes. Separate probability models are used for conservation in exons, introns, splice sites, and UTRs, reflecting the differences among their patterns of evolutionary conservation. Twinscan is specifically designed for the analysis of high-throughput genomic sequences containing an unknown number of genes. In experiments on mouse sequences, using homologous sequences from the human...

10.1093/bioinformatics/17.suppl_1.s140 article EN Bioinformatics 2001-06-01

Background: Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data....

10.1186/gb-2013-14-1-r10 article EN cc-by Genome Biology 2013-01-30

Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog of genes is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs) that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of phylogenetically diverse assemblies,...

10.1093/nar/gkn916 article EN cc-by-nc Nucleic Acids Research 2008-11-28

Lettuce (Lactuca sativa) is a major crop and a member of the large, highly successful Compositae family of flowering plants. Here we present a reference assembly for the species and family. This reference was generated using whole-genome shotgun Illumina reads plus in vitro proximity ligation data to create large superscaffolds; it was validated genetically, and superscaffolds were oriented in genetic bins ordered along nine chromosomal pseudomolecules. We identify several genomic features that may have contributed...

10.1038/ncomms14953 article EN cc-by Nature Communications 2017-04-12

Strand asymmetry in the distribution of guanines and cytosines, measured by GC skew, predisposes DNA sequences toward R-loop formation upon transcription. Previous work revealed that GC skew associates with a core set of unmethylated CpG island (CGI) promoters in the human genome. Here, we show that GC skew can distinguish four classes of promoters, including three types of CGI promoters, each associated with unique epigenetic and gene ontology signatures. In particular, we identify a strong and a weak class of CGI promoters, and these loci are enriched in distinct chromosomal...

10.1101/gr.158436.113 article EN cc-by-nc Genome Research 2013-07-18
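
For readers unfamiliar with the metric named in the abstract above: GC skew over a stretch of sequence is conventionally computed as (G - C) / (G + C), where G and C are base counts on one strand. The following is a minimal sketch of that standard definition only, not the authors' analysis pipeline; the function names `gc_skew` and `windowed_gc_skew`, and the default window and step sizes, are hypothetical choices made for illustration.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a DNA string; 0.0 if it has no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    total = g + c
    return (g - c) / total if total else 0.0


def windowed_gc_skew(seq: str, window: int = 100, step: int = 50) -> list[float]:
    """GC skew in sliding windows (window and step sizes are illustrative)."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # A sequence shorter than one window is scored once, as a whole.
    last_start = max(len(seq) - window + 1, 1)
    return [gc_skew(seq[i:i + window]) for i in range(0, last_start, step)]
```

Positive values indicate an excess of guanines over cytosines on the strand being scanned, and negative values the reverse.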

Tissue-specific DNA methylation is found at promoters, enhancers, and CpG islands but also over larger genomic regions. In most human tissues, the vast majority of the genome is highly methylated (>70%). Recently, sequencing of bisulfite-treated DNA (MethylC-seq) has revealed large partially methylated domains (PMDs) in some human cell lines. PMDs cover up to 40% of the genome and are associated with gene repression and inactive chromatin marks. However, to date, only cultured cells and cancers have shown evidence for PMDs. Here, we performed...

10.1073/pnas.1215145110 article EN Proceedings of the National Academy of Sciences 2013-03-25

Gene regulatory elements are central drivers of phenotypic variation and thus of critical importance towards understanding the genetics of complex traits. The Functional Annotation of Animal Genomes consortium was formed to collaboratively annotate the functional elements in animal genomes, starting with domesticated animals. Here we present an expansive collection of datasets from eight diverse tissues in three important agricultural species: chicken (Gallus gallus), pig (Sus scrofa), and cattle (Bos taurus)....

10.1038/s41467-021-22100-8 article EN cc-by Nature Communications 2021-03-23

Summary: Identifying and masking repetitive elements is usually the first step when analyzing vertebrate genomic sequence. Current repeat identification software is sensitive but slow, creating a costly bottleneck in large-scale analyses. We have developed MaskerAid, an enhancement to RepeatMasker that increases the speed of masking more than 30-fold at the most sensitive setting. Availability: On request from the authors (see http://sapiens.wustl.edu/MaskerAid). Contact: maskeraid@watson.wustl.edu These contributed...

10.1093/bioinformatics/16.11.1040 article EN Bioinformatics 2000-11-01

Introns that elevate mRNA accumulation have been found in a wide range of eukaryotes. However, not all introns affect gene expression, and direct testing is currently the only way to identify stimulatory introns. Our genome-wide analysis in Arabidopsis thaliana revealed that promoter-proximal introns as a group are compositionally distinct from distal introns, and that the degree to which an individual intron matches the promoter-proximal profile is a strong predictor of its ability to increase expression. We found that the sequences responsible for elevating expression...

10.1105/tpc.107.057190 article EN cc-by-nc The Plant Cell 2008-03-01

Sorghum bicolor is a close relative of maize and a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating raw coverage of less than 300 megabases of the 735-megabase genome....

10.1371/journal.pbio.0030013 article EN cc-by PLoS Biology 2004-12-27

In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive...

10.1186/1471-2164-11-420 article EN cc-by BMC Genomics 2010-07-07

Introns in a wide range of organisms, including plants, animals and fungi, are able to increase the expression of the gene in which they are contained. This process of intron-mediated enhancement (IME) is most thoroughly studied in Arabidopsis thaliana, where it has been shown that enhancing introns are typically located near the promoter and are compositionally distinct from downstream introns. In this study, we perform a comprehensive comparative analysis of several sequenced plant genomes. We find that enhancing sequences are conserved in multi-cellular...

10.1093/nar/gkr043 article EN Nucleic Acids Research 2011-03-22

Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factors are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps: (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotides with massively parallel technology and (iii) finding...

10.1093/nar/gkp802 article EN cc-by-nc Nucleic Acids Research 2009-10-20

Three alloherpesviruses are known to cause disease in cyprinid fish: cyprinid herpesviruses 1 and 3 (CyHV1 and CyHV3) in common carp and koi, and cyprinid herpesvirus 2 (CyHV2) in goldfish. We have determined the genome sequences of CyHV1 and CyHV2 and compared them with the published CyHV3 sequence. The CyHV1 and CyHV2 genomes are 291,144 and 290,304 bp, respectively, in size, and thus the CyHV3 genome, at 295,146 bp, remains the largest recorded among the herpesviruses. Each of the three genomes consists of a unique region flanked at each terminus by a sizeable direct repeat. CyHV1, CyHV2, and CyHV3 are predicted to contain 137,...

10.1128/jvi.03206-12 article EN Journal of Virology 2012-12-27