BC4GO: a full-text corpus for the BioCreative IV GO task
0301 basic medicine
03 medical and health sciences
Vocabulary, Controlled
Databases, Genetic
Computational Biology
Data Mining
Humans
Original Article
Molecular Sequence Annotation
Software
DOI: 10.1093/database/bau074
Publication Date: 2014-07-29T03:29:26Z
AUTHORS (15)
ABSTRACT
Gene function curation via Gene Ontology (GO) annotation is a common task among Model Organism Database groups. Owing to its manual nature, this task is considered one of the bottlenecks in literature curation. There have been many previous attempts at automatic identification of GO terms and supporting information from full text. However, few systems have delivered an accuracy that is comparable with humans. One recognized challenge in developing such systems is the lack of marked sentence-level evidence text that provides the basis for making GO annotations. We aim to create a corpus that includes the GO evidence text along with the three core elements of GO annotations: (i) a gene or gene product, (ii) a GO term and (iii) a GO evidence code. To ensure our results are consistent with real-life GO data, we recruited eight professional GO curators and asked them to follow their routine GO annotation protocols. Our annotators marked up more than 5000 text passages in 200 articles for 1356 distinct GO terms. For evidence sentence selection, the inter-annotator agreement (IAA) results are 9.3% (strict) and 42.7% (relaxed) in F1-measures. For GO term selection, the IAAs are 47% (strict) and 62.9% (hierarchical). Our corpus analysis further shows that abstracts contain ∼10% of relevant evidence sentences and 30% of distinct GO terms, while the Results/Experiment section has nearly 60% of relevant sentences and >70% of GO terms. Further, of those evidence sentences found in abstracts, less than one-third contain enough experimental detail to fulfill the three core criteria of a GO annotation. This result demonstrates the need for using full-text articles for text mining GO annotations. Through its use at the BioCreative IV GO (BC4GO) task, we expect our corpus to become a valuable resource for the BioNLP research community. Database URL: http://www.biocreative.org/resources/corpora/bc-iv-go-task-corpus/
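The abstract reports inter-annotator agreement as F1-measures under strict and relaxed matching but does not define the matching rules. The following is a minimal sketch of how such a pairwise F1 could be computed between two annotators' sentence selections. The function names, the `(article_id, sentence_index)` representation, and in particular the relaxed rule (same article, sentence index within one position) are assumptions made for illustration, not the criteria used by the corpus authors.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: one selected sentence = (article_id, sentence_index).
Sentence = Tuple[str, int]


def strict_match(x: Sentence, y: Sentence) -> bool:
    """Strict: both annotators picked exactly the same sentence."""
    return x == y


def relaxed_match(x: Sentence, y: Sentence) -> bool:
    """Relaxed (assumed rule): same article, indices at most one apart."""
    return x[0] == y[0] and abs(x[1] - y[1]) <= 1


def pairwise_f1(
    a: List[Sentence],
    b: List[Sentence],
    match: Callable[[Sentence, Sentence], bool],
) -> float:
    """F1 of annotator A's selections scored against annotator B's."""
    if not a or not b:
        return 0.0
    # Precision: fraction of A's sentences matched by some sentence of B.
    precision = sum(any(match(x, y) for y in b) for x in a) / len(a)
    # Recall: fraction of B's sentences matched by some sentence of A.
    recall = sum(any(match(x, y) for x in a) for y in b) / len(b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this example only.
annotator_a = [("PMC1", 3), ("PMC1", 10), ("PMC1", 20)]
annotator_b = [("PMC1", 3), ("PMC1", 11)]

strict_f1 = pairwise_f1(annotator_a, annotator_b, strict_match)
relaxed_f1 = pairwise_f1(annotator_a, annotator_b, relaxed_match)
```

On this toy data the relaxed score is higher than the strict one because the neighbouring sentences 10 and 11 count as agreement only under the relaxed rule, which mirrors the direction of the gap between the strict and relaxed figures reported in the abstract.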