Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention

Pinyin
DOI: 10.1007/s40747-024-01753-0 Publication Date: 2025-01-04T09:33:36Z
ABSTRACT
Named Entity Recognition (NER) aims to identify entities with specific meanings and their boundaries in natural language texts. Due to the differences between the Chinese and English language families, Chinese NER faces challenges such as ambiguous word boundary delineation and semantic diversity. Previous studies on Chinese NER have focused on character and lexical information, neglecting a unique feature of Chinese: pinyin information. In this paper, we propose CPL-NER, which combines multiple kinds of feature information about Chinese characters through multiple embeddings to enhance semantic representation by introducing pinyin and dictionary information. For Chinese named entity recognition, pinyin information helps resolve the polyphonic character phenomenon, while lexicon information aids in addressing word segmentation ambiguities. Additionally, we innovatively designed the Pinyin-Lexicon Cross-Attention Mechanism (PLCA), which calculates attention scores for the various embeddings. This mechanism deeply integrates character, pinyin, and lexicon embeddings, generating character sequences enriched with semantic information. Finally, BiLSTM-CRF is employed for sequence modeling. Through this design, we can more comprehensively capture the semantic features of Chinese text, improving the model's ability to handle polyphonic characters and word segmentation ambiguities, thereby enhancing the recognition performance for Chinese named entities. We conducted experiments on four standard Chinese NER benchmark datasets, and the results show that our method outperforms most baselines, demonstrating the effectiveness of the proposed model.
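The abstract does not give the PLCA equations, so the following is only a minimal sketch of the general idea of cross-attention over several embeddings, not the authors' formulation. It assumes plain scaled dot-product attention in which each character embedding acts as a query over the pinyin embeddings and over the lexicon embeddings, and it assumes the results are fused by concatenation. The names `cross_attention` and `fuse`, the dimensions, and the random toy embeddings are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: every query row attends over all context rows.

    Returns the attended vectors (one per query) and the attention weights.
    """
    dim = len(queries[0])
    attended, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        mixed = [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        attended.append(mixed)
        all_weights.append(weights)
    return attended, all_weights


def fuse(char_emb, pinyin_emb, lexicon_emb):
    """Assumed fusion: concatenate each character vector with its two attended views."""
    pinyin_ctx, _ = cross_attention(char_emb, pinyin_emb)
    lexicon_ctx, _ = cross_attention(char_emb, lexicon_emb)
    # List "+" concatenates, so each fused row has length 3 * dim.
    return [c + p + l for c, p, l in zip(char_emb, pinyin_ctx, lexicon_ctx)]


def random_matrix(rng, rows, cols):
    """Toy stand-in for learned embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_chars, n_words, dim = 5, 3, 8
char_emb = random_matrix(rng, n_chars, dim)     # one vector per character
pinyin_emb = random_matrix(rng, n_chars, dim)   # one vector per character's pinyin
lexicon_emb = random_matrix(rng, n_words, dim)  # one vector per matched lexicon word

fused = fuse(char_emb, pinyin_emb, lexicon_emb)
_, pinyin_weights = cross_attention(char_emb, pinyin_emb)
print(len(fused), len(fused[0]))
```

In a full model, the fused per-character sequence would then be passed to a sequence labeler such as the BiLSTM-CRF named in the abstract; that stage is omitted here.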