A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC

Gold standard (test) Agreement
DOI: 10.1093/jamia/ocv037 Publication Date: 2015-05-07T00:31:51Z
ABSTRACT
Objective: To create a multilingual gold-standard corpus for biomedical concept recognition.
Materials and methods: We selected text units from different parallel corpora (Medline abstract titles, drug labels, patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations.
Results: The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The harmonized annotation set for each language performed as well as the best annotator for that language.
Discussion: The use of automatic preannotations, harmonized annotations, and parallel corpora helped keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques.
Conclusion: To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups being covered and the diversity of text genres that were annotated.
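The agreement figure reported in the abstract is a median F-score. As a rough illustration of how such a figure can be computed, the sketch below scores every pair of annotators and takes the median of the pairwise scores. It is not the authors' evaluation code: the matching criterion (an annotation counts as shared only if start offset, end offset, and concept identifier are all identical), the function names `f_score` and `median_pairwise_f`, and the toy annotations with placeholder concept identifiers are all assumptions made for this example.

```python
from itertools import combinations
from statistics import median


def f_score(a, b):
    """F-score between two sets of annotations.

    Each annotation is a (start, end, concept_id) tuple. Exact matching on
    all three fields is an assumption of this sketch, not a detail taken
    from the abstract.
    """
    if not a and not b:
        return 1.0  # two empty sets agree trivially
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)  # treat a as the reference, b as the response
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def median_pairwise_f(annotations):
    """Median F-score over all unordered pairs of annotators."""
    names = sorted(annotations)
    scores = [
        f_score(annotations[x], annotations[y])
        for x, y in combinations(names, 2)
    ]
    return median(scores)


# Invented toy data: three annotators, placeholder concept identifiers.
toy = {
    "annotator_1": {(0, 8, "C0000001"), (12, 20, "C0000002"), (25, 30, "C0000003")},
    "annotator_2": {(0, 8, "C0000001"), (12, 20, "C0000002")},
    "annotator_3": {(0, 8, "C0000001"), (25, 30, "C0000003"), (40, 45, "C0000004")},
}

if __name__ == "__main__":
    print(round(median_pairwise_f(toy), 3))
```

Because the F-score is symmetric in precision and recall, it does not matter which annotator of a pair is treated as the reference. The same `f_score` function can compare an individual annotator's set against the final gold-standard set.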