Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
Pinyin
DOI:
10.1007/s40747-024-01753-0
Publication Date:
2025-01-04T09:33:36Z
AUTHORS (7)
ABSTRACT
Named Entity Recognition (NER) aims to identify entities with specific meanings and their boundaries in natural language texts. Due to the differences between the Chinese and English language families, Chinese NER faces challenges such as ambiguous word boundary delineation and semantic diversity. Previous studies on Chinese NER have focused on character and lexical information, neglecting a unique feature of Chinese: pinyin information. In this paper, we propose CPL-NER, which combines multiple kinds of information about Chinese characters at the embedding level, enhancing character representation by introducing pinyin and dictionary information. For Chinese named entity recognition, pinyin helps resolve the polyphonic phenomenon, while the lexicon aids in addressing segmentation ambiguities. Additionally, we innovatively designed the Pinyin-Lexicon Cross-Attention Mechanism (PLCA), which calculates attention scores between the various embeddings. This mechanism deeply integrates character, pinyin, and lexicon embeddings, generating enriched sequences. Finally, BiLSTM-CRF is employed for sequence modeling. Through this design, the model can more comprehensively capture the features of Chinese text, improving its ability to handle polyphonic and segmentation ambiguities and thereby enhancing recognition performance on Chinese entities. We conducted experiments on four standard benchmark datasets, and the results show that our method outperforms most baselines, demonstrating the effectiveness of the proposed model.
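The abstract does not give the exact formulation of PLCA, so the following is only a minimal sketch of the general idea it describes: character embeddings attend over pinyin and lexicon embeddings, and the results are combined into one enriched sequence. It assumes plain scaled dot-product attention and fusion by concatenation. The function names (`cross_attention`, `fuse`) and the toy 2-dimensional vectors are hypothetical, not taken from the paper. The BiLSTM-CRF layer that would consume the fused sequence is omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return attended

def fuse(char_emb, pinyin_emb, lexicon_emb):
    """Assumed fusion (not the paper's): concatenate each character vector
    with its pinyin-attended and lexicon-attended vectors."""
    from_pinyin = cross_attention(char_emb, pinyin_emb, pinyin_emb)
    from_lexicon = cross_attention(char_emb, lexicon_emb, lexicon_emb)
    return [c + p + l for c, p, l in zip(char_emb, from_pinyin, from_lexicon)]

# Toy input for the two characters of 重庆 (chóng qìng); 重 is polyphonic
# (zhòng / chóng). All vectors are made-up illustrative values.
char_emb = [[1.0, 0.0], [0.0, 1.0]]      # one vector per character
pinyin_emb = [[0.5, 0.5], [0.2, 0.8]]    # one vector per pinyin syllable
lexicon_emb = [[0.3, 0.7]]               # one matched lexicon word: 重庆

fused = fuse(char_emb, pinyin_emb, lexicon_emb)
print(len(fused), len(fused[0]))
```

Each output vector has three times the input dimension: the original character vector, a pinyin-weighted vector, and a lexicon-weighted vector. A learned model would add projection matrices for queries, keys, and values; they are left out here to keep the sketch short.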
REFERENCES (37)