Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark
FOS: Computer and information sciences
Computer Science - Computation and Language
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
Computation and Language (cs.CL)
01 natural sciences
0105 earth and related environmental sciences
DOI:
10.48550/arxiv.2207.13005
Publication Date:
2023-02-27
AUTHORS (5)
ABSTRACT
Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task.<br/>WSDM 2023<br/>
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES ()
CITATIONS ()
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....