FBK-DH at SemEval-2020 Task 12: Using Multi-channel BERT for Multilingual Offensive Language Detection
SemEval
Offensive
Language identification
Identification
DOI:
10.18653/v1/2020.semeval-1.201
Publication Date:
2021-10-20T02:33:20Z
AUTHORS (4)
ABSTRACT
In this paper we present our submission to sub-task A at SemEval 2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval2). For Danish, Turkish, Arabic and Greek, we develop an architecture based on transfer learning and relying on a two-channel BERT model, in which the English BERT and the multilingual one are combined after creating a machine-translated parallel corpus for each language in the task. For English, instead, we adopt a more standard, single-channel approach. We find that, in a multilingual scenario, with some languages having small training data, using parallel BERT models with machine-translated data can give systems more stability, especially when dealing with noisy data. The fact that machine translation on social media data may not be perfect does not hurt the overall classification performance.
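The two-channel idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two channels are combined, so the concatenation step, the linear-plus-sigmoid classifier, and every name below (`fuse_channels`, `offensive_probability`, the toy encoders) are assumptions made for illustration and not the authors' implementation. The toy encoders are stand-ins for the multilingual BERT and English BERT models, which would each return a pooled sentence vector.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical encoder interface: text in, pooled sentence vector out.
Encoder = Callable[[str], Vector]


def fuse_channels(original_text: str, translated_text: str,
                  multilingual_encoder: Encoder,
                  english_encoder: Encoder) -> Vector:
    """Encode the original text and its English machine translation
    in separate channels, then concatenate the two vectors.
    Fusion by concatenation is an assumption of this sketch."""
    return multilingual_encoder(original_text) + english_encoder(translated_text)


def offensive_probability(features: Sequence[float],
                          weights: Sequence[float],
                          bias: float) -> float:
    """Linear layer followed by a sigmoid, giving P(offensive)."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for the two BERT encoders (not real models).
def toy_multilingual_encoder(text: str) -> Vector:
    return [len(text) / 100.0, float(text.count("!"))]


def toy_english_encoder(text: str) -> Vector:
    return [float("idiot" in text.lower()), len(text.split()) / 10.0]


if __name__ == "__main__":
    # Danish input paired with its (assumed) English machine translation.
    features = fuse_channels("Du er en idiot!", "You are an idiot!",
                             toy_multilingual_encoder, toy_english_encoder)
    # Hand-picked weights for illustration; a real system would learn them.
    print(offensive_probability(features, [0.0, 0.5, 2.0, 0.0], -1.0))
```

In a real system each stand-in would be replaced by a fine-tuned BERT encoder, and the classifier weights would be trained jointly on the labelled data for each language.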