A simple named entity extractor using AdaBoost
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
DOI: 10.3115/1119176.1119197
Publication Date: 2007-05-11T11:01:26Z
AUTHORS (3)
ABSTRACT
This paper presents a Named Entity Extraction (NEE) system for the CoNLL-2003 shared task competition. As in last year's edition (Carreras et al., 2002a), we have approached the task by treating the two main sub-tasks of the problem, recognition (NER) and classification (NEC), sequentially and independently, with separate modules. Both modules are machine-learning-based systems that use binary and multiclass AdaBoost classifiers. Named Entity recognition is performed as a greedy sequence tagging procedure under the well-known BIO labelling scheme. This tagging process uses three binary classifiers trained to be experts on the recognition of the B, I, and O labels, respectively. Named Entity classification is viewed as a 4-class classification problem (with LOC, PER, ORG, and MISC class labels), which is addressed directly with a multiclass learning algorithm. The system presented here is a replication, with some minor changes, of the system that obtained the best results in the CoNLL-2002 NEE task. Therefore, it can be considered a benchmark of the state-of-the-art technology for the current edition, and it will also allow comparisons between the training corpora of both editions.
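The recognize-then-classify pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three binary "expert" classifiers and the multiclass classifier are replaced by hypothetical hand-written toy scorers (`toy_b`, `toy_i`, `toy_o`, `toy_class_scorer`) that return real-valued confidences, standing in for the scores a trained AdaBoost model would output. The greedy decoder walks the sentence left to right and keeps, at each token, the label whose expert scores highest. It also assumes the usual BIO constraint that I may not follow O or open a sentence; the abstract does not say how inconsistent label sequences are handled.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A BIO scorer maps (tokens, position, previous label) to a confidence score.
BioScorer = Callable[[Sequence[str], int, str], float]
# A class scorer maps (tokens, (start, end) span, class label) to a confidence score.
ClassScorer = Callable[[Sequence[str], Tuple[int, int], str], float]

NE_CLASSES = ("LOC", "PER", "ORG", "MISC")


def greedy_bio_tag(tokens: Sequence[str], scorers: Dict[str, BioScorer]) -> List[str]:
    """NER step: tag tokens left to right, keeping the best admissible label."""
    tags: List[str] = []
    prev = "O"  # treat the sentence start like O, so the first tag cannot be I
    for i in range(len(tokens)):
        best_label, best_score = "O", float("-inf")
        for label in ("B", "I", "O"):
            if label == "I" and prev == "O":
                continue  # assumed BIO constraint: I must continue an open entity
            score = scorers[label](tokens, i, prev)
            if score > best_score:
                best_label, best_score = label, score
        tags.append(best_label)
        prev = best_label  # greedy: earlier decisions are never revised
    return tags


def bio_to_spans(tags: Sequence[str]) -> List[Tuple[int, int]]:
    """Convert BIO tags into half-open (start, end) entity spans."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag in ("B", "O") and start is not None:
            spans.append((start, i))  # a B or O closes the open entity
            start = None
        if tag == "B":
            start = i  # B opens a new entity; I simply extends the open one
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def classify_span(tokens: Sequence[str], span: Tuple[int, int], scorer: ClassScorer) -> str:
    """NEC step: pick the highest-scoring of the four entity classes."""
    return max(NE_CLASSES, key=lambda c: scorer(tokens, span, c))


def extract_entities(tokens: Sequence[str], bio_scorers: Dict[str, BioScorer],
                     class_scorer: ClassScorer) -> List[Tuple[str, str]]:
    """Run recognition, then classify each recognized span independently."""
    spans = bio_to_spans(greedy_bio_tag(tokens, bio_scorers))
    return [(" ".join(tokens[s:e]), classify_span(tokens, (s, e), class_scorer))
            for s, e in spans]


# Hypothetical toy experts based only on capitalization (stand-ins for AdaBoost).
def toy_b(tokens: Sequence[str], i: int, prev: str) -> float:
    return 1.0 if tokens[i][:1].isupper() and prev == "O" else -0.5


def toy_i(tokens: Sequence[str], i: int, prev: str) -> float:
    return 1.0 if tokens[i][:1].isupper() and prev in ("B", "I") else -1.0


def toy_o(tokens: Sequence[str], i: int, prev: str) -> float:
    return -1.0 if tokens[i][:1].isupper() else 1.0


TOY_BIO_SCORERS: Dict[str, BioScorer] = {"B": toy_b, "I": toy_i, "O": toy_o}


def toy_class_scorer(tokens: Sequence[str], span: Tuple[int, int], label: str) -> float:
    # Hypothetical keyword rules standing in for a multiclass classifier.
    words = tokens[span[0]:span[1]]
    if label == "ORG":
        return 2.0 if words[-1] in {"Corp", "Inc", "Ltd"} else 0.0
    if label == "LOC":
        return 2.0 if " ".join(words) in {"London", "Paris"} else 0.0
    if label == "PER":
        return 1.0 if len(words) == 2 else 0.0
    return 0.5  # MISC acts as a weak default


tokens = ["John", "Smith", "works", "for", "Acme", "Corp", "in", "London", "."]
print(extract_entities(tokens, TOY_BIO_SCORERS, toy_class_scorer))
# [('John Smith', 'PER'), ('Acme Corp', 'ORG'), ('London', 'LOC')]
```

Replacing the toy scorers with trained classifiers only changes the functions passed in; the decoding and span extraction stay the same, which mirrors the abstract's point that recognition and classification are handled by separate, independent modules.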