A simple named entity extractor using AdaBoost

0202 electrical engineering, electronic engineering, information engineering 02 engineering and technology
DOI: 10.3115/1119176.1119197 Publication Date: 2007-05-11T11:01:26Z
ABSTRACT
This paper presents a Named Entity Extraction (NEE) system for the CoNLL-2003 shared task competition. As in last year's edition (Carreras et al., 2002a), we approach the task by treating its two main sub-tasks, recognition (NER) and classification (NEC), sequentially and independently, with separate modules. Both modules are machine-learning-based systems that make use of binary and multiclass AdaBoost classifiers. Named Entity recognition is performed as a greedy sequence tagging procedure under the well-known BIO labelling scheme. This tagging process uses three binary classifiers, trained to be experts on recognizing the B, I, and O labels, respectively. Named Entity classification is viewed as a 4-class classification problem (with LOC, PER, ORG, and MISC class labels), which is addressed directly with a multiclass learning algorithm. The system presented here is a replication, with some minor changes, of the system that obtained the best results in the CoNLL-2002 NEE task. It can therefore be considered a benchmark of the state-of-the-art technology for the current edition, and it also allows comparisons between the training corpora of both editions.
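To make the recognition step concrete, the following is a minimal Python sketch of greedy left-to-right BIO tagging driven by three per-label scorers. It is an illustration only, not the authors' implementation. The abstract does not say how the three classifier outputs are combined, so the sketch assumes the highest-scoring label is chosen at each token, subject to the usual BIO constraint that I cannot follow O or open a sentence. The names `greedy_bio_tag`, `spans`, and the capitalisation-based toy scorers are hypothetical stand-ins for trained AdaBoost classifiers.

```python
from typing import Callable, Dict, List, Tuple

# A scorer sees the sentence, the current position, and the labels already
# assigned to the left, and returns a confidence for its own label.
# (Hypothetical interface; in the paper these would be AdaBoost classifiers.)
Scorer = Callable[[List[str], int, List[str]], float]


def greedy_bio_tag(tokens: List[str], scorers: Dict[str, Scorer]) -> List[str]:
    """Assign one of B, I, O to each token, left to right, never revising."""
    labels: List[str] = []
    for i in range(len(tokens)):
        prev = labels[-1] if labels else "O"
        # Assumed consistency constraint: I may only continue an entity.
        allowed = ("B", "O") if prev == "O" else ("B", "I", "O")
        scores = {lab: scorers[lab](tokens, i, labels) for lab in allowed}
        labels.append(max(allowed, key=lambda lab: scores[lab]))
    return labels


def spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Convert a BIO label sequence into half-open (start, end) entity spans."""
    out: List[Tuple[int, int]] = []
    start = None
    for i, lab in enumerate(labels):
        if lab != "I" and start is not None:
            out.append((start, i))
            start = None
        if lab == "B":
            start = i
    if start is not None:
        out.append((start, len(labels)))
    return out


# Toy scorers based only on capitalisation (assumption, for demonstration).
def _is_cap(tok: str) -> bool:
    return tok[:1].isupper()


def score_b(tokens: List[str], i: int, left: List[str]) -> float:
    prev = left[-1] if left else "O"
    return 1.0 if _is_cap(tokens[i]) and prev == "O" else -1.0


def score_i(tokens: List[str], i: int, left: List[str]) -> float:
    prev = left[-1] if left else "O"
    return 1.0 if _is_cap(tokens[i]) and prev in ("B", "I") else -1.0


def score_o(tokens: List[str], i: int, left: List[str]) -> float:
    return -0.5 if _is_cap(tokens[i]) else 1.0


TOY_SCORERS: Dict[str, Scorer] = {"B": score_b, "I": score_i, "O": score_o}
```

With the toy scorers, `greedy_bio_tag(["yesterday", "Xavier", "Carreras", "visited", "Barcelona", "."], TOY_SCORERS)` labels the two capitalised runs as entities, and `spans` turns the labels into token ranges. In the two-stage design described above, those ranges would then be handed to a separate 4-class classifier (LOC, PER, ORG, MISC).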