- Bioinformatics and Genomic Networks
- Lysosomal Storage Disorders Research
- Biomedical Text Mining and Ontologies
- RNA regulation and disease
- Cytomegalovirus and herpesvirus research
- Machine Learning in Bioinformatics
- Genomics and Rare Diseases
- Computational Drug Discovery Methods
- Cellular transport and secretion
- Genomics and Phylogenetic Studies
- Genetics and Neurodevelopmental Disorders
- RNA modifications and cancer
- Trypanosoma species research and implications
- Genomic variations and chromosomal abnormalities
- Cancer Genomics and Diagnostics
- Algorithms and Data Compression
- Biochemical and Molecular Research
- Genomics and Chromatin Dynamics
- Advanced Proteomics Techniques and Applications
- Microbial Metabolic Engineering and Bioproduction
- Glycosylation and Glycoproteins Research
- Fractal and DNA sequence analysis
- Computability, Logic, AI Algorithms
- Carbohydrate Chemistry and Synthesis
- Chronic Lymphocytic Leukemia Research
BioMarin (United States)
2016-2023
Yale University
2014-2016
Whitney Museum of American Art
2015
Mayo Clinic
2015
Indiana University Bloomington
2008-2014
Miami University
2014
Indiana University
2008
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand...
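Community assessments like CAFA score each method by comparing its predicted terms against experimentally annotated terms. The snippet above does not name the metric. A widely used summary for this kind of evaluation is the maximum protein-centric F-measure over score thresholds (Fmax). The sketch below is a simplified illustration, not the official CAFA evaluation code. It assumes each prediction maps Gene Ontology term IDs to scores in [0, 1], that true term sets are non-empty, and that term sets are already propagated to their ancestors. The function name `fmax` and the dictionary layout are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         thresholds: Optional[Iterable[float]] = None) -> float:
    """Simplified protein-centric Fmax (hypothetical helper).

    predictions: protein -> {GO term -> score in [0, 1]}
    truth: protein -> non-empty set of experimentally annotated GO terms
    Assumes all term sets are already propagated to ancestor terms.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= tau}
            tp = len(predicted & true_terms)
            # Recall is averaged over every benchmark protein.
            recalls.append(tp / len(true_terms))
            # Precision is averaged only over proteins with at least one prediction.
            if predicted:
                precisions.append(tp / len(predicted))
        if not precisions:
            continue  # no predictions survive this threshold
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Averaging precision only over proteins that receive predictions, while averaging recall over all benchmark proteins, keeps a method from being penalized twice for abstaining on a protein.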
A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function....
A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the "ortholog conjecture"). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly...
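Testing the ortholog conjecture comes down to comparing the average functional similarity of ortholog pairs with that of paralog pairs. The snippet does not say which similarity measure the study used. The sketch below substitutes the Jaccard index over sets of Gene Ontology terms purely for illustration. The gene names and annotations are hypothetical toy data.

```python
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of shared GO terms between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_similarity(pairs: Iterable[Tuple[str, str]],
                    annotations: Dict[str, Set[str]]) -> float:
    """Average functional similarity over a collection of gene pairs."""
    return mean(jaccard(annotations[x], annotations[y]) for x, y in pairs)


# Hypothetical toy annotations: one human-mouse ortholog pair, one human paralog pair.
annotations = {
    "HUMAN_A": {"GO:1", "GO:2"},
    "MOUSE_A": {"GO:1", "GO:2"},
    "HUMAN_B": {"GO:1", "GO:3"},
}
orthologs = [("HUMAN_A", "MOUSE_A")]
paralogs = [("HUMAN_A", "HUMAN_B")]
print(mean_similarity(orthologs, annotations), mean_similarity(paralogs, annotations))
```

If the conjecture held in a real dataset, the ortholog average would exceed the paralog average. A careful test would also control for sequence divergence and annotation bias, which this toy comparison ignores.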
One of the most important tasks in modern bioinformatics is the development of computational tools that can be used to understand and treat human disease. To date, a variety of methods have been explored, and algorithms for candidate gene prioritization are gaining in their usefulness. Here, we propose an algorithm for detecting gene–disease associations based on the protein–protein interaction network, known gene–disease associations, protein sequence, and protein functional information at the molecular level. Our method, PhenoPred,...
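One ingredient named in the abstract is the protein–protein interaction network. The simplest network-based prioritization idea is "guilt by association": a candidate gene scores higher when more of its interaction partners are already linked to the disease. The sketch below shows only that baseline. It is not PhenoPred, which according to the abstract also combines sequence and functional information. The function names, gene IDs, and adjacency-dict layout are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def neighbor_score(gene: str,
                   network: Dict[str, Set[str]],
                   disease_genes: Set[str]) -> float:
    """Fraction of a gene's interaction partners already linked to the disease."""
    neighbors = network.get(gene, set())
    if not neighbors:
        return 0.0
    return len(neighbors & disease_genes) / len(neighbors)


def rank_candidates(candidates: Iterable[str],
                    network: Dict[str, Set[str]],
                    disease_genes: Set[str]) -> List[str]:
    """Order candidate genes from most to least supported by their neighbors."""
    return sorted(candidates,
                  key=lambda g: neighbor_score(g, network, disease_genes),
                  reverse=True)


# Hypothetical toy network stored as an adjacency dict.
network = {"G1": {"D1", "D2", "X"}, "G2": {"X", "Y"}}
print(rank_candidates(["G2", "G1"], network, {"D1", "D2"}))
```

Genes absent from the network score zero here. That is one reason methods like the one described add sequence and functional features rather than relying on the network alone.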
Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, attributes...
Motivation: The development of effective methods for the prediction of ontological annotations is an important goal in computational biology, with protein function prediction and disease gene prioritization gaining wide recognition. Although various algorithms have been proposed for these tasks, evaluating their performance is difficult owing to problems caused both by the structure of biomedical ontologies and by biased or incomplete experimental annotations of genes and gene products. Results: We propose an information-theoretic framework...
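The snippet stops before describing the framework. As an assumption-laden sketch of how an information-theoretic evaluation can work: give each ontology term an information content conditional on its parents, then score a prediction by the information it misses (remaining uncertainty) and the information it wrongly adds (misinformation). Those two quantities can be combined into a single distance. All names below are hypothetical. The code assumes term sets are already closed under ancestors and that conditional probabilities have been estimated from an annotation database.

```python
import math
from typing import Dict, Set


def information_accretion(p_given_parents: float) -> float:
    """Bits gained by learning a term, assuming its parents are already known."""
    return -math.log2(p_given_parents)


def remaining_uncertainty(true_terms: Set[str], predicted: Set[str],
                          ia: Dict[str, float]) -> float:
    # Information in the true annotation that the prediction missed.
    return sum(ia[t] for t in true_terms - predicted)


def misinformation(true_terms: Set[str], predicted: Set[str],
                   ia: Dict[str, float]) -> float:
    # Information in the prediction that the true annotation does not support.
    return sum(ia[t] for t in predicted - true_terms)


def semantic_distance(true_terms: Set[str], predicted: Set[str],
                      ia: Dict[str, float], k: float = 2.0) -> float:
    """Combine missed and spurious information into one k-norm distance."""
    ru = remaining_uncertainty(true_terms, predicted, ia)
    mi = misinformation(true_terms, predicted, ia)
    return (ru ** k + mi ** k) ** (1.0 / k)


# Hypothetical conditional probabilities for a four-term toy ontology.
ia = {"root": 0.0,
      "A": information_accretion(0.5),
      "B": information_accretion(0.25),
      "C": information_accretion(0.125)}
true_terms, predicted = {"root", "A", "B"}, {"root", "A", "C"}
print(semantic_distance(true_terms, predicted, ia))
```

Weighting each term by its conditional information means that missing or inventing a rare, specific term costs more than a common, general one. This addresses the ontology-structure problem the abstract mentions.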
Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find that deletion breakpoints have more nearby SNPs and indels than average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi–C interactions, histone marks and the substitution patterns of nucleotides near them, we find that the signature of non-allelic homologous recombination...
Significance: Pseudogenes have long been considered nonfunctional elements. However, recent studies have shown they can potentially regulate the expression of protein-coding genes. Capitalizing on the available functional-genomics data and the finished annotation of human, worm, and fly, we compared the pseudogene complements across the three phyla. We found that, in contrast to protein-coding genes, pseudogenes are highly lineage specific, reflecting genome history more so than the conservation of essential biological functions....
Background: Metachromatic leukodystrophy (MLD) is a lysosomal storage disorder caused by mutations in the arylsulfatase A gene (ARSA) and categorized into three subtypes according to age of onset. The functional effect of most ARSA mutants remains unknown; a better understanding of the genotype–phenotype relationship is required to support newborn screening (NBS) and guide treatment. Results: We collected a patient data set from the literature that relates disease severity to ARSA genotype in 489 individuals with MLD....
Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such as Gene Ontology (GO). However, questions remain about the possibility for development of accurate methods that can integrate disparate molecular data as well as about an unbiased evaluation of these methods. One important concern...
Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by utilizing 219 experimentally assayed missense VUS
GM1 gangliosidosis is a rare autosomal recessive genetic disorder caused by the disruption of the GLB1 gene that encodes β-galactosidase, a lysosomal hydrolase that removes β-linked galactose from the non-reducing end of glycans. Deficiency of this catabolic enzyme leads to the accumulation of GM1 and its asialo derivative GA1 in β-galactosidase-deficient patients and animal models. In addition to GM1 and GA1, there are other glycoconjugates that contain β-linked galactose whose metabolites are substrates for β-galactosidase. For example, a number of N-linked glycan...
Prioritizing genes for translation to therapeutics for common diseases has been challenging. Here, we propose an approach to identify drug targets with high probability of success by focusing on genes with both gain of function (GoF) and loss of function (LoF) mutations associated with opposing effects on phenotype (Bidirectional Effect Selected Targets, BEST). We find 98 BEST genes for a variety of indications. Drugs targeting those genes are 3.8-fold more likely to be approved than drugs targeting non-BEST genes. We focus on five genes (IGF1R, NPPC, NPR2, FGFR3, SHOX...
Using Compression to Identify Classes of Inauthentic Texts. Mehmet M. Dalkilic, Wyatt T. Clark, James C. Costello, and Predrag Radivojac. Proceedings of the 2006 SIAM International Conference on Data Mining (SDM), pp. 604–608. DOI: https://doi.org/10.1137/1.9781611972764.69. Recent events have made it clear that some kinds of technical texts,...
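The snippet does not describe the paper's method beyond its title. One well-known way to use compression for text classification is the normalized compression distance (NCD): two texts that share structure compress better together than apart. The sketch below is a generic NCD nearest-class classifier using Python's `zlib`, offered as an illustration of the general idea rather than a reconstruction of the paper's approach. The class labels and reference corpora are hypothetical.

```python
import zlib
from typing import Dict


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(text: bytes, references: Dict[str, bytes]) -> str:
    """Assign text to the reference class with the smallest NCD."""
    return min(references, key=lambda label: ncd(text, references[label]))


# Hypothetical reference corpora for two classes.
references = {
    "authentic": b"the protein binds the receptor and the receptor signals to the nucleus. " * 30,
    "generated": b"0192837465 qwxz 5647382910 zxwq " * 30,
}
query = b"the receptor signals to the nucleus and the protein binds the receptor. " * 3
print(classify(query, references))
```

With `zlib`, the 32 KB sliding window limits how much of a reference corpus the compressor can exploit. For realistic corpora, a compressor with a larger window or a per-class model would be a more sensible choice.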
Next-generation sequencing (NGS) technologies are yielding ever higher volumes of human genome sequence data. Given this large amount of data, it has become both a possibility and a priority to determine how disease-causing single nucleotide polymorphisms (SNPs) detected within gene regulatory regions (rSNPs) exert their effects on gene expression. Recently, several studies have explored whether disease-causing rSNPs have attributes that can distinguish them from those that are neutral, attaining moderate success at discriminating...
The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS IIIB). This challenge attracted 17 submissions from 10 groups. We observed that the top models were able to predict the impact of missense mutations with Pearson's...
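Challenges such as the NAGLU and ARSA assessments compare each submission's predicted enzyme activities with experimentally measured ones. The NAGLU abstract reports agreement using Pearson's correlation. The minimal sketch below computes Pearson's r from scratch. The function name and the example values are hypothetical, and the code makes no claim about the challenge's full evaluation protocol.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and experimentally measured activities."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mp = sum(predicted) / n
    mm = sum(measured) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sm = math.sqrt(sum((m - mm) ** 2 for m in measured))
    if sp == 0 or sm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sp * sm)


# Hypothetical predicted vs. measured relative activities for four variants.
print(pearson_r([0.1, 0.4, 0.8, 1.0], [0.0, 0.3, 0.9, 1.1]))
```

Pearson's r rewards linear agreement only. Assessors often report a rank correlation alongside it so that a monotone but nonlinear predictor is not penalized.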
Given the large and expanding quantity of publicly available sequencing data, it should be possible to extract incidence information for monogenic diseases from allele frequencies, provided one knows which mutations are causal. We tested this idea on a rare, monogenic, lysosomal storage disorder, Sanfilippo Type B (Mucopolysaccharidosis type IIIB). Sanfilippo Type B is caused by mutations in the gene encoding α-N-acetylglucosaminidase (NAGLU). There were 189 NAGLU missense variants found in the ExAC dataset that comprises roughly...
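The idea of extracting incidence from allele frequencies can be made concrete for an autosomal recessive disorder. Under Hardy–Weinberg equilibrium with random mating, let q be the combined frequency of all causal alleles. Affected individuals (homozygotes plus compound heterozygotes) then occur with frequency q². The sketch below assumes independent alleles and full penetrance. It also allows an optional per-variant probability of being causal, which is a hypothetical extension for variants whose pathogenicity is uncertain. The allele frequencies are made-up examples, not ExAC values.

```python
import math
from typing import Optional, Sequence


def recessive_incidence(allele_freqs: Sequence[float],
                        p_causal: Optional[Sequence[float]] = None) -> float:
    """Expected incidence of an autosomal recessive disease under Hardy-Weinberg.

    allele_freqs: population frequency of each candidate variant allele.
    p_causal: optional probability that each variant is causal (defaults to 1).
    """
    if p_causal is None:
        p_causal = [1.0] * len(allele_freqs)
    if len(p_causal) != len(allele_freqs):
        raise ValueError("allele_freqs and p_causal must have the same length")
    # Combined frequency of causal alleles; q**2 counts homozygotes and
    # compound heterozygotes (the 2*q_i*q_j cross terms) together.
    q = sum(f * p for f, p in zip(allele_freqs, p_causal))
    return q * q


incidence = recessive_incidence([0.001, 0.0005])  # hypothetical frequencies
print(incidence, round(1 / incidence))
```

For these made-up frequencies q = 0.0015, so the predicted incidence is 2.25 × 10⁻⁶, or roughly 1 in 444,000 births. The estimate is only as good as the list of variants judged causal, which is the caveat the abstract raises.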
While GWAS of common diseases has delivered thousands of novel genetic findings, prioritizing genes for translation to therapeutics has been challenging. Here, we propose an approach to resolve that issue by identifying genes that have both gain of function (GoF) and loss of function (LoF) mutations associated with opposing effects on phenotype (Bidirectional Effect Selected Targets, BEST). Bidirectionality is a desirable feature of the best targets because it implies a causal role in one direction modulating the target...