- Genomics and Rare Diseases
- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Genetic Associations and Epidemiology
- Machine Learning in Bioinformatics
- Cancer Genomics and Diagnostics
- RNA modifications and cancer
- Genomic variations and chromosomal abnormalities
- Nutrition, Genetics, and Disease
- RNA Research and Splicing
- Epigenetics and DNA Methylation
- BRCA gene mutations in cancer
- African Botany and Ecology Studies
- Probiotics and Fermented Foods
- Oil Palm Production and Sustainability
- Genomics and Chromatin Dynamics
- Advanced Biosensing Techniques and Applications
- Biomedical Text Mining and Ontologies
- Pharmaceutical industry and healthcare
- Biosensors and Analytical Detection
- Urban Agriculture and Sustainability
- Human-Animal Interaction Studies
- Advanced Proteomics Techniques and Applications
- Gene expression and cancer classification
- Chromosomal and Genetic Variations
Genome Institute of Singapore
2010-2018
University of Tartu
2014-2016
Agency for Science, Technology and Research
2013-2015
J. Craig Venter Institute
2007-2012
Fred Hutch Cancer Center
2001-2009
Illumina (United States)
2006-2007
Howard Hughes Medical Institute
2001
University of Washington
2000-2001
Shriners Hospitals for Children - Northern California
2000
Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool that uses sequence homology to predict whether a substitution affects protein function. SIFT, which sorts intolerant from tolerant substitutions, classifies substitutions as tolerated or deleterious. A higher proportion of substitutions predicted to be deleterious by SIFT gives an affected phenotype than substitutions predicted to be deleterious by substitution scoring matrices in three test cases....
The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect of coding variants on protein function. It was first introduced in 2001, with a corresponding website that provides users with predictions on their variants. Since its release, SIFT has become one of the standard tools for characterizing missense variation. We have updated SIFT’s genome-wide prediction tool since our last publication in 2009, and added new features to the insertion/deletion (indel) tool. We also show accuracy metrics...
Presented here is a genome sequence of an individual human. It was produced from ∼32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference...
A major interest in human genetics is to determine whether a nonsynonymous single-base nucleotide polymorphism (nsSNP) in a gene affects its protein product and, consequently, impacts the carrier's health. We used the SIFT (Sorting Intolerant From Tolerant) program to predict that 25% of 3084 nsSNPs from dbSNP, a public SNP database, would affect protein function. Some of the nsSNPs predicted to affect function were variants known to be associated with disease. Others were artifacts of SNP discovery. Two reports have indicated that there are thousands...
Background: Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and ABI SOLiD technologies for the same 260 kb in four individuals. Results: Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns...
The Estonian Biobank cohort is a volunteer-based sample of the Estonian resident adult population (aged ≥18 years). The current number of participants, close to 52000, represents a large proportion, 5%, of the Estonian adult population, making it ideally suited to population-based studies. General practitioners (GPs) and medical personnel in the special recruitment offices have recruited participants throughout the country. At baseline, the GPs performed a standardized health examination of the participants, who also donated blood samples for DNA, white...
There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's 'exome,' which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict...
The human naive T cell repertoire is the repository of a vast array of TCRs. However, the factors that shape their hierarchical distribution and relationship with the memory repertoire remain poorly understood. In this study, we used polychromatic flow cytometry to isolate highly pure memory and naive CD8(+) T cells, stringently defined with multiple phenotypic markers, and used deep sequencing to characterize corresponding portions of their respective TCR repertoires from four individuals. The extent of interindividual TCR sharing and the overlap between the memory and naive compartments within...
Indels in the coding regions of a gene can either cause frameshifts or amino acid insertions/deletions. Frameshifting indels are indels that have a length that is not divisible by 3 and subsequently cause frameshifts. Indels that have a length divisible by 3 cause amino acid insertions/deletions or block substitutions; we call these 3n indels. The new amino acid changes resulting from 3n indels could potentially affect protein function. Therefore, we construct a SIFT Indel prediction algorithm for 3n indels, which achieves 82% accuracy, 81% sensitivity, specificity, precision, 0.63 MCC, and 0.87 AUC by 10-fold...
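The length rule in the abstract above can be illustrated with a minimal Python sketch. It is not part of SIFT Indel: the function name `classify_indel` and the reference/alternate allele-string representation are assumptions made only for this example, and the sketch makes no prediction about the effect on protein function.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Classify a coding indel by the net number of bases gained or lost.

    Hypothetical helper for illustration only: `ref` and `alt` are the
    reference and alternate allele strings of a single variant.
    """
    net_length = abs(len(alt) - len(ref))
    if net_length == 0:
        # Equal-length alleles are substitutions, not indels.
        raise ValueError("ref and alt have equal length; not an indel")
    # A net change divisible by 3 keeps the reading frame ("3n indel");
    # any other net change shifts the frame.
    return "3n" if net_length % 3 == 0 else "frameshifting"


if __name__ == "__main__":
    print(classify_indel("A", "AT"))    # 1 base inserted
    print(classify_indel("ACGT", "A"))  # 3 bases deleted
    print(classify_indel("A", "ACGT"))  # 3 bases inserted
```

The sketch only separates the two classes named in the abstract and assumes the variant lies entirely within a coding region.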
Each human has approximately 50 to 280 frameshifting indels, yet their implications are unknown. We created SIFT Indel, a prediction method for frameshifting indels that has 84% accuracy. The percentage of frameshifting indels predicted to be gene-damaging is negatively correlated with allele frequency. We also show that although the first frameshifting indel in a gene causes loss of function, there is a tendency for the second frameshifting indel to compensate and restore protein function. SIFT Indel is available at http://sift-dna.org/www/SIFT_indels2.html
Domestication has had a strong impact on the development of modern societies. We sequenced 200 genomes of the chocolate plant Theobroma cacao L. to show for the first time to our knowledge that a single population, the Criollo population, underwent domestication ~3600 years ago (95% CI: 2481-13,806 years ago). We also show that during the process of domestication, there was selection for genes involved in the metabolism of the colored protectants anthocyanins and the stimulant theobromine, as well as disease resistance genes. Our analyses show that domesticated populations of T....
Database searching algorithms for proteins use scoring matrices based on average protein properties, and thus are dominated by globular proteins. However, since transmembrane regions of a protein are in a distinctly different environment than globular proteins, one would expect generalized substitution matrices to be inappropriate for transmembrane regions. We present the PHAT (predicted hydrophobic and transmembrane) matrix, which significantly outperforms generalized matrices and a previously published transmembrane matrix in database searches with transmembrane queries. We conclude that a better matrix can...
Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in diseases, such as nonsynonymous sites, the MHC region, copy number variant regions, and mitochondrial DNA. We estimate that these panels cover the majority of all common...
The International HapMap Consortium recently completed genotyping over 3.8 million single nucleotide polymorphisms (SNPs) in three major populations, and the results of studying patterns of linkage disequilibrium indicate that characterization of 300,000-500,000 tag SNPs is sufficient to provide good genomic coverage for linkage-disequilibrium-based association studies in many populations. These whole-genome association studies will be used to dissect the genetics of complex diseases and pharmacogenomic drug responses. As such,...
Background: Some health websites provide a public forum for consumers to post ratings and reviews on drugs. Drug reviews are easily accessible and comprehensible, unlike clinical trials and published literature. Because the public increasingly uses the Internet as a source of medical information, it is important to know whether such information is reliable.
Procedural guidelines for disclosure of incidental genomic information are lacking. We introduce a method and evaluate the impact of returning results to population biobank participants with 16p11.2 copy number variants, which are commonly associated with neurodevelopmental disorders and BMI imbalance. Of 7877 participants, 11 carriers were detected. Eight were informed of their carrier status and surveyed 11-17 months later. All demonstrated a preference for disclosure. Although two experienced worry, all five survey...