MeSH Up: effective MeSH text classification for improved document retrieval
Thesaurus
Vector space model
Controlled vocabulary
DOI:
10.1093/bioinformatics/btp249
Publication Date:
2009-04-18T00:34:24Z
AUTHORS (6)
ABSTRACT
Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared with a limited number of other systems.
Results: We compare the performance of six MeSH classification systems [MetaMap, EAGL, a language model-based and a vector space model-based approach, a K-Nearest Neighbor (KNN) approach and MTI] in terms of reproducing and complementing manual MeSH annotations. A KNN system clearly outperforms the other published approaches and scales well with large amounts of text using the full MeSH thesaurus. Our measurements demonstrate to what extent manual MeSH annotations can be reproduced and how they can be complemented by automatic annotations. We also show that a statistically significant improvement can be obtained in information retrieval (IR) when the text of a user's query is automatically annotated with MeSH concepts, compared with using the original textual query alone.
Conclusions: The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR. Furthermore, the automatic MeSH annotation system we propose is highly scalable and it generates improvements in IR comparable with those observed for manual annotations.
Contact: trieschn@ewi.utwente.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
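The abstract names a KNN approach to MeSH annotation without describing it. The following is a minimal sketch of the general idea only, not the authors' system: find the k already-annotated documents most similar to a new text and let them vote for concepts, weighted by similarity. The function names, the whitespace tokenizer, the raw term-count cosine similarity, the parameters k and top_n, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts (toy tokenizer: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_annotate(text, corpus, k=2, top_n=3):
    """Suggest concepts for `text` from its k most similar annotated documents.

    `corpus` is a list of (document_text, [concept, ...]) pairs.
    Each neighbor votes for its concepts with a weight equal to its similarity.
    """
    query = vectorize(text)
    neighbors = sorted(
        ((cosine(query, vectorize(doc)), labels) for doc, labels in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, labels in neighbors:
        for label in labels:
            votes[label] += similarity
    # Highest score first; ties broken alphabetically for a stable ranking.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, score in ranked[:top_n] if score > 0]


# Hypothetical, hand-made corpus; real systems would use manually annotated citations.
TOY_CORPUS = [
    ("insulin therapy in type 2 diabetes patients",
     ["Diabetes Mellitus, Type 2", "Insulin"]),
    ("glucose control and insulin resistance",
     ["Insulin Resistance", "Blood Glucose"]),
    ("gene expression in breast cancer tumors",
     ["Breast Neoplasms", "Gene Expression"]),
]

suggestions = knn_annotate("insulin dosing for diabetes", TOY_CORPUS)
```

The same function could in principle be applied to a user's query text to obtain concepts for query expansion, which is the retrieval setting the abstract evaluates; how the paper combines concepts with the textual query is not stated here.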
REFERENCES (25)
CITATIONS (72)