Toxicity Classification in Ukrainian
DOI: 10.48550/arxiv.2404.17841
Publication Date: 2024-04-27
AUTHORS (4)
ABSTRACT
Toxicity detection remains a relevant task, especially in the context of developing safe and fair LMs. Nevertheless, labeled binary toxicity classification corpora are not available for all languages, which is understandable given the resource-intensive nature of the annotation process. Ukrainian, in particular, is among the languages lacking such resources. To our knowledge, there has been no existing toxicity classification corpus in Ukrainian. In this study, we aim to fill this gap by investigating cross-lingual knowledge transfer techniques and creating labeled corpora by: (i) translating from an English corpus, (ii) filtering toxic samples using keywords, and (iii) annotating with crowdsourcing. We compare LLM prompting and other approaches without fine-tuning, offering insights into the most robust and efficient baselines.
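As an illustration of step (ii), keyword-based filtering can be sketched as a whole-word lexicon match. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the tokenization rule, and the tiny placeholder lexicon are all hypothetical, and a real lexicon of Ukrainian toxic terms would need to be supplied.

```python
import re

# Hypothetical placeholder lexicon (assumption): a real setup would load
# a curated list of Ukrainian toxic keywords instead.
TOXIC_KEYWORDS = {"дурень", "ідіот"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def contains_toxic_keyword(text, keywords=TOXIC_KEYWORDS):
    """Return True if any whole-word token of the text is in the lexicon."""
    return any(token in keywords for token in tokenize(text))


def split_by_keywords(texts, keywords=TOXIC_KEYWORDS):
    """Split texts into (toxic candidates, remaining texts), keeping order."""
    toxic_candidates, remaining = [], []
    for text in texts:
        if contains_toxic_keyword(text, keywords):
            toxic_candidates.append(text)
        else:
            remaining.append(text)
    return toxic_candidates, remaining
```

Matching on whole tokens avoids flagging words that merely contain a keyword as a substring, at the cost of missing inflected forms, which matters for a morphologically rich language such as Ukrainian. Texts selected this way are only candidates; a labeling step such as the crowdsourced annotation in (iii) would still be needed to confirm them.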