- Biomedical Text Mining and Ontologies
- Genomics and Phylogenetic Studies
- Semantic Web and Ontologies
- Scientific Computing and Data Management
- Genetics, Bioinformatics, and Biomedical Research
- Species Distribution and Climate Change
- Bioinformatics and Genomic Networks
- Gene expression and cancer classification
- Evolution and Paleontology Studies
- Research Data Management Practices
- Genetic diversity and population structure
- Biomedical and Engineering Education
- Data Analysis with R
- Cell Image Analysis Techniques
- Genetic Mapping and Diversity in Plants and Animals
- Evolution and Genetic Dynamics
- Mathematics, Computing, and Information Processing
- Philosophy and History of Science
- Molecular Biology Techniques and Applications
- Genomics and Chromatin Dynamics
- Identification and Quantification in Food
- Radiomics and Machine Learning in Medical Imaging
- Image Processing Techniques and Applications
- Medical and Biological Sciences
- Natural Language Processing Techniques
Duke University
2015-2024
University Hospital Carl Gustav Carus
2023
TU Dresden
2023
Center for Genomic Science
2015-2019
John Innes Centre
2019
National Evolutionary Synthesis Center
2007-2016
University of Chicago
2015
University of South Dakota
2015
James Hutton Institute
2014
Lawrence Berkeley National Laboratory
2014
The tissue-specific pattern of mRNA expression can indicate important clues about gene function. High-density oligonucleotide arrays offer the opportunity to examine patterns of gene expression on a genome scale. Toward this end, we have designed custom arrays that interrogate the expression of the vast majority of protein-encoding human and mouse genes and have used them to profile a panel of 79 human and 61 mouse tissues. The resulting data set provides the expression patterns for thousands of predicted genes, as well as known and poorly characterized genes, from mice and humans. We have explored this data set for global trends in gene expression,...
The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be...
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics...
In many domains the rapid generation of large amounts of data is fundamentally changing how research is done. The deluge of data presents great opportunities, but also challenges in managing, analyzing and sharing data. However, good training resources for researchers looking to develop skills that will enable them to be more effective and productive researchers are scarce, and there is little space in the existing curriculum for courses or additional lectures. To address this need we have developed an introductory two-day intensive workshop,...
In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard...
Rab GTPases and SNARE fusion proteins direct cargo trafficking through the exocytic and endocytic pathways of eukaryotic cells. We have used steady state mRNA expression profiling and computational hierarchical clustering methods to generate a global overview of the distribution of Rabs, SNAREs, and coat machinery components, as well as their respective adaptors, effectors, and regulators in 79 human and 61 mouse nonredundant tissues. We now show that this systems biology approach can be used to define building blocks for membrane...
The importance of data archiving, data sharing, and public access to data has received considerable attention. Awareness is growing among scientists that collaborative databases can facilitate these activities. We provide a detailed description of the collaborative life history database developed by our Working Group at the National Evolutionary Synthesis Center (NESCent) to address questions about life history patterns and the evolution of mortality and demographic variability in wild primates. Examples from each of the seven primate species included...
The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology....
Background: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or for integration with other biological knowledge....
The rich knowledge of morphological variation among organisms reported in the systematic literature has remained in free-text format, impractical for use in large-scale synthetic phylogenetic work. This noncomputable format has also precluded linkage to the large knowledgebase of genomic, genetic, developmental, and phenotype data in model organism databases. We have undertaken an effort to prototype a curated, ontology-based evolutionary morphology database that maps to these genetic databases...
The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in noncomputer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts...
The rich phenotypic diversity that characterizes the vertebrate skeleton results from evolutionary changes in the regulation of genes that drive development. Although relatively little is known about the genes that underlie skeletal variation among fish species, significant knowledge of genetics and development is available for zebrafish. Because developmental processes are highly conserved, this knowledge can be leveraged for understanding the evolution of skeletal diversity. We developed the Phenoscape Knowledgebase (KB; http://kb.phenoscape.org) to...
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining...
With the sequencing and assembly of the rat genome comes the difficult task of assigning functions to genes. Tissue localization of gene expression gives some information about the potential role of a gene in physiology. Various examples of the utility of multiple tissue gene expression data sets are illustrated here. First, we highlight their use in finding genes that might play an important role in a particular tissue on the basis of exclusive expression or coexpression with genes of known function. Second, we show how this data can be used to explain phenotypic differences between rat strains. Third,...
Synthetic science promises an unparalleled ability to find new meaning in old data, extant results, or previously unconnected methods and concepts, but pursuing synthesis can be a difficult and risky endeavor. Our experience as biologists, informaticians, and educators at the National Evolutionary Synthesis Center has affirmed that synthesis can yield major insights, but also revealed that technological hurdles, prevailing academic culture, and general confusion about the nature of synthesis can hamper its progress. By presenting our view of what...
The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost,...
Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes...
Background: A hierarchical taxonomy of organisms is a prerequisite for semantic integration of biodiversity data. Ideally, there would be a single, expansive, authoritative taxonomy that includes extinct and extant taxa, information on synonyms and common names, and monophyletic supraspecific taxa that reflect our current understanding of phylogenetic relationships. Description: As a step towards development of such a resource, and to enable large-scale integration of phenotypic data across vertebrates, we created the Vertebrate Taxonomy...
Challenges related to development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of software upon which their works are built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited...
Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch...
How should funding agencies enable researchers to explore high-risk but potentially high-reward science? One model that appears to work is the NSF-funded synthesis center, an incubator for community-led, innovative science.
Linking phenotypic with genotypic diversity has become a major requirement for basic and applied genome-centric biological research. To meet this need, a comprehensive database backend for efficiently storing, querying and analyzing large experimental data sets is necessary. Chado, a generic, modular, community-based database schema, is widely used in the biological community to store information associated with genome sequence data. To meet the need to also accommodate large-scale phenotyping and genotyping projects, a new Chado module called Natural...