- Biomedical Text Mining and Ontologies
- Bioinformatics and Genomic Networks
- Genomics and Phylogenetic Studies
- Semantic Web and Ontologies
- Machine Learning in Bioinformatics
- Growth Hormone and Insulin-like Growth Factors
- RNA Research and Splicing
- Gene Expression and Cancer Classification
- Cancer, Hypoxia, and Metabolism
- Advanced Text Analysis Techniques
- Angiogenesis and VEGF in Cancer
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- RNA and Protein Synthesis Mechanisms
- Estrogen and Related Hormone Effects
- Congenital Heart Defects Research
- Big Data and Business Intelligence
- DNA Repair Mechanisms
- Data Visualization and Analytics
- Cancer Genomics and Diagnostics
- Zebrafish Biomedical Research Applications
- Reproductive Biology and Fertility
- Data Quality and Management
- Genomics and Chromatin Dynamics
- Sexual Differentiation and Disorders
- Biomedical and Engineering Education
Zhejiang Normal University, 2024
National Institutes of Health, 1990-2013
Georgetown University, 2004-2011
Georgetown University Medical Center, 2004-2011
Iowa State University, 2009
Eunice Kennedy Shriver National Institute of Child Health and Human Development, 1991-2002
The Protein Information Resource (PIR) is an integrated public resource of protein informatics. To facilitate the sensible propagation and standardization of protein annotation and the systematic detection of annotation errors, PIR has extended its superfamily concept and developed the SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions. The system adopts a network structure for protein classification from superfamily to subfamily levels. Protein family members are...
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve the coverage of experimentally validated data, a bibliography submission system is...
We have identified a novel exon 11 of the human prolactin receptor (hPRLR) gene that is distinct from its rodent counterparts and have demonstrated the presence of two short forms of the hPRLR (S1a and S1b), which are derived from alternative splicing of exons 10 and 11. S1a encodes 376 amino acids (aa) that contain partial exon 10 and a unique 39-aa C-terminal region encoded by exon 11. S1b encodes 288 aa that lack the entire exon 10 and contains 3 aa at the C terminus encoded by exon 11 using a shifted reading frame. These short forms, which were found in several normal tissues and in breast cancer cell lines, were expressed as...
BioThesaurus is a web-based system designed to map a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase. Currently covering more than two million proteins, BioThesaurus consists of over 2.8 million names extracted from multiple molecular biological databases according to the database cross-references in iProClass. The BioThesaurus web site allows the retrieval of synonymous names of given protein entries and the identification of protein entries sharing the same names. Availability: BioThesaurus is accessible for online searching. Contact: hfliu@umbc.edu Supplementary...
Biomedical ontologies are emerging as critical tools in genomic and proteomic research, where complex data in disparate resources need to be integrated. A number of ontologies describe properties that can be attributed to proteins. For example, protein functions are described by the Gene Ontology (GO) and human diseases by SNOMED CT or ICD10. There is, however, a gap in the current set of ontologies: one that describes the protein entities themselves and their relationships. We have designed the PRotein Ontology (PRO) to facilitate protein annotation and to guide new experiments. The...
Prolactin receptors (PRLRs) are widely expressed, and multiple mRNA transcripts encoding PRLRs are present in prolactin target tissues. The molecular basis for the control of PRLR gene expression is currently unknown. Analyses of the 5′-untranslated regions of PRLR mRNAs expressed in gonadal and non-gonadal tissues and their genomic organization revealed three alternative first exons designated as E11, E12, and E13. Each of these first exons is alternatively spliced to a common noncoding exon (exon 2, nucleotides −115 to −56) that precedes the third...
The expression of the prolactin receptor is under the control of two putative tissue-specific (PI, gonads; PII, liver) and one common (PIII) promoters (Hu, Z. Z., Zhuang, L., and Dufau, M. L. (1996) J. Biol. Chem. 271, 10242–10246). The three promoter regions were co-localized to the rat chromosomal locus 2q16, in the order 5′-PIII-PI-PII-3′. To investigate the mechanisms of gonad-specific utilization of PI, the promoter domain, regulatory cis-elements, and trans-factors were identified in gonadal cells. The promoter domain was localized to the 152-base pair 5′...
Three promoters are operative in the rat prolactin receptor gene as follows: promoter I (PI) and II (PII) are specific for the gonads and liver, respectively, and III (PIII) is common to several tissues. To investigate the mechanisms controlling the activity of promoter III, its regulatory elements and transcription factors were characterized in gonadal and non-gonadal cells. The TATA-less PIII promoter domain was localized to the region −437 to −179 (ATG +1) containing the 5′-flanking region and part of the non-coding first exon. Within the promoter domain, a functional...
Transcription of the prolactin receptor (PRLR) is under the control of multiple promoters. Following the recent demonstration of the human non-coding exon 1, hE1(N) (hE1(N1)), and the generic exon 1, hE1(3), we have identified their promoters and characterized four other novel exons 1 (hE1(N2-5)) that are alternatively spliced to a common exon 2 in tissues and breast cancer cells. Genomic regions containing these exons and their 5'-flanking and intronic sequences were determined, and the order of the exons was established in chromosome 5p14-13. Promoters utilized...
Functional analysis and interpretation of large-scale proteomics and gene expression data require effective use of bioinformatics tools and public knowledge resources coupled with expert-guided examination. An integrated bioinformatics approach was used to analyze cellular pathways in response to ionizing radiation. ATM, or ataxia-telangiectasia mutated, a serine-threonine protein kinase, plays critical roles in radiation responses, including cell cycle arrest and DNA repair. We analyzed radiation responsive pathways based on 2D-gel/MS...
Motivation: Our purpose is to develop a statistical modeling approach for cancer biomarker discovery and to provide new insights into early cancer detection. We propose the concept of dependence network, apply it to identifying cancer biomarkers, and study the difference between protein or gene samples from cancer and non-cancer subjects based on mass-spectrometry (MS) and microarray data. Results: Three MS and two gene microarray datasets are studied. Clear differences are observed in the dependence networks for cancer and non-cancer samples. Protein/gene features are examined three at one...
Interest in information extraction from the biomedical literature is motivated by the need to speed up the creation of structured databases representing the latest scientific knowledge about specific objects, such as proteins and genes. This paper addresses the issue of a lack of standard definition of the problem of protein name tagging. We describe the lessons learned in developing a set of guidelines and present the first set of inter-coder results, viewed as an upper bound on system performance. Problems coders face include: (a) the ambiguity of names...
Motivation: Attribute selection is a critical step in the development of document classification systems. As a standard practice, words are stemmed and the most informative ones are used as attributes in classification. Owing to the high complexity of biomedical terminology, general-purpose stemming algorithms are often conservative and could also remove informative stems. This can lead to accuracy reduction, especially when the number of labeled documents is small. To address this issue, we propose an algorithm that omits stemming and,...
Observing that many biomedical databases have been developed and maintained independently, their records referring to the same entities may have different sets of synonyms. Integration of the names pertaining to the same entity would provide a more comprehensive list of synonyms than each individual database. We assembled BioThesaurus, a thesaurus of names for proteins and their corresponding genes compiled from multiple databases for all UniProtKB records. In this study, the coverage and contribution of each database were assessed for several organisms. The result...
Embedding techniques have become essential components of large databases in the deep learning era. By encoding discrete entities, such as words, items, or graph nodes, into continuous vector spaces, embeddings facilitate more efficient storage, retrieval, and processing in large databases. Especially in the domain of recommender systems, millions of categorical features are encoded as unique embedding vectors, which facilitates the modeling of similarities and interactions among features. However, numerous embedding vectors can result...
Due to the heightened concern about bioterrorism and emerging/reemerging infectious diseases, a flood of molecular data about human pathogens has been generated and maintained in disparate databases. However, scientific findings regarding these pathogens and their host responses are buried in the growing volume of biomedical literature, and there is an urgent need to mine information pertaining to pathogenesis-related proteins, especially host-pathogen protein-protein interactions, from the literature. In this paper, we report our...
With more and more research dedicated to literature mining in the biomedical domain, more and more systems are available for people to choose from to build applications. In this study, we focus on one specific kind of task, i.e., detecting definitions of acronyms/abbreviations/symbols in text. The study was designed to answer the following questions: i) how well a system performs when provided with a large set of documents recently published, ii) what the coverage is of various knowledge bases in including acronyms/abbreviations/symbols as synonyms of their definitions, iii)...