Biomedical term extraction: overview and a new methodology
BioNLP
information retrieval
Text Mining
02 engineering and technology
Biomedical Terminology Extraction
ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.3: Information Search and Retrieval
http://aims.fao.org/aos/agrovoc/c_24907
terminology
statistical method
Automatic Term Extraction
0202 electrical engineering, electronic engineering, information engineering
ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing/I.2.7.6: Text analysis
http://aims.fao.org/aos/agrovoc/c_3863
ACM: I.: Computing Methodologies/I.5: PATTERN RECOGNITION/I.5.4: Applications/I.5.4.2: Text processing
Natural Language Processing
U10 - Computer science, mathematics and statistics
[INFO.INFO-WB]Computer Science [cs]/Web
006
ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing/I.2.7.1: Language generation
methodology
Web Mining
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
C30 - Documentation and information
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
extraction
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
http://aims.fao.org/aos/agrovoc/c_12522
Graphs
http://aims.fao.org/aos/agrovoc/c_7377
http://aims.fao.org/aos/agrovoc/c_36910
DOI:
10.1007/s10791-015-9262-2
Publication Date:
2015-08-24T02:20:40Z
AUTHORS (4)
ABSTRACT
Terminology extraction is an essential task in domain knowledge acquisition and in information retrieval. It is also a mandatory first step in building or enriching terminologies and ontologies. Existing terminology extraction methods, as commonly proposed in the literature, combine linguistic and statistical aspects and address some, but not all, of the problems related to term extraction, e.g. noise, silence, low frequency, large corpora, and the complexity of the multi-word term extraction process. In contrast, we propose a cutting-edge methodology to extract and rank biomedical terms that covers all of these problems. This methodology offers several measures based on linguistic, statistical, graph, and web aspects. These measures extract and rank candidate terms with excellent precision: we demonstrate that they outperform previously reported precision results for automatic term extraction and work with different languages (English, French, and Spanish). We also demonstrate how using graphs and the web to assess the significance of a term candidate enables us to improve on those precision results. We evaluated our methodology on the biomedical GENIA and LabTestsOnline corpora and compared it with previously reported measures.
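To make the extract-then-rank workflow and its precision-based evaluation concrete, here is a minimal Python sketch. It is not the measures proposed in the article: the stopword filter, the toy score (frequency times term length in words), and all names (`STOPWORDS`, `candidate_terms`, `rank_candidates`, `precision_at_k`) are illustrative assumptions. Only the overall shape follows the abstract: generate candidate terms, rank them, then measure precision over the top of the ranked list against a reference terminology.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "of", "a", "an", "in", "and", "is", "to", "by", "for", "with"}


def candidate_terms(text, max_len=3):
    """Yield word n-grams (1..max_len) that neither start nor end with a stopword.

    This crude filter stands in for a real linguistic step such as
    part-of-speech pattern matching. Sentence boundaries are ignored.
    """
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def rank_candidates(documents, max_len=3):
    """Rank candidates by a toy statistical score: frequency * length in words.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter()
    for doc in documents:
        counts.update(candidate_terms(doc, max_len))
    scored = {term: freq * len(term.split()) for term, freq in counts.items()}
    return sorted(scored, key=lambda t: (-scored[t], t))


def precision_at_k(ranked, reference_terms, k):
    """Fraction of the top-k ranked candidates found in the reference terminology."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for t in top if t in reference_terms) / len(top)


if __name__ == "__main__":
    docs = [
        "Activation of the T cell receptor in human T cell lines.",
        "The T cell receptor is expressed by activated T cells.",
    ]
    reference = {"t cell", "t cell receptor"}  # hypothetical gold-standard terms
    ranked = rank_candidates(docs)
    print(ranked[:5])
    print(precision_at_k(ranked, reference, k=5))
```

A real pipeline would replace the toy score with an established termhood measure and the stopword filter with linguistic patterns. The precision-at-k function, however, reflects the usual way ranked term lists are compared against a reference terminology.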
REFERENCES (63)
CITATIONS (46)