eGenPub, a text mining system for extending computationally mapped bibliography for UniProt Knowledgebase by capturing centrality

UniProt Bibliography
DOI: 10.1093/database/bax081 Publication Date: 2017-10-11T23:12:55Z
ABSTRACT
UniProt Knowledgebase (UniProtKB) is a publicly available database with access to a vast amount of protein sequence and functional information. To widen the scope of the publications associated with a UniProtKB protein entry, UniProt has introduced the computationally mapped additional bibliography section, which includes literature collected from external sources. In this article, we describe a text mining system, eGenPub, which selects articles that are 'about' specific proteins and allows automatic identification of additional bibliography for given UniProtKB protein entries. Focusing on plant proteins initially, eGenPub utilizes a gene normalization tool called pGenN and a trained support vector machine model, which achieves a precision of 95.3%, to predict whether an article, based on its abstract, should be linked to a UniProtKB entry. We have conducted full-scale PubMed processing using eGenPub for eight common plant species. Altogether, 9025 articles are identified as relevant bibliography for 4752 UniProtKB entries, among which 5252 papers are not in the existing publication section of UniProtKB. These articles newly identified via eGenPub are being integrated into the UniProt production pipeline, and can be accessed from the UniProtKB entry publication view.
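To make the notion of an article being 'about' a protein more concrete, the following is a minimal sketch of how centrality signals might be computed from a title and abstract. It is not the eGenPub implementation: the abstract does not list the features used, so the four features below, the `CentralityFeatures` type and the `centrality_features` function are all hypothetical. Simple name matching also stands in for pGenN's gene normalization, and the support vector machine step is not reproduced.

```python
import re
from dataclasses import dataclass


@dataclass
class CentralityFeatures:
    """Hypothetical signals for how central a protein is to an article."""
    in_title: bool             # protein name appears in the title
    in_first_sentence: bool    # protein name appears in the first abstract sentence
    mention_count: int         # total name matches in the abstract
    sentence_fraction: float   # share of abstract sentences mentioning the protein


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break on whitespace after ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def centrality_features(title: str, abstract: str, names: list[str]) -> CentralityFeatures:
    """Compute illustrative centrality features for one protein.

    `names` holds the surface forms to match; in a real pipeline these
    would come from a gene normalization step rather than a fixed list.
    """
    if not names:
        raise ValueError("at least one protein name is required")
    # Case-insensitive, whole-word match on any of the given names.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    sentences = split_sentences(abstract)
    hits = [bool(pattern.search(s)) for s in sentences]
    return CentralityFeatures(
        in_title=bool(pattern.search(title)),
        in_first_sentence=bool(hits and hits[0]),
        mention_count=len(pattern.findall(abstract)),
        sentence_fraction=sum(hits) / len(hits) if hits else 0.0,
    )
```

A feature vector like this could then be passed to a classifier that decides whether to link the article to the entry. Which features actually separate central from incidental mentions would have to be established on annotated data.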