BNU-HKBU UIC NLP Team 2 at SemEval-2019 Task 6: Detecting Offensive Language Using BERT model

Keywords: SemEval, Offensive Language Identification
DOI: 10.18653/v1/s19-2099 Publication Date: 2019-07-21T17:29:51Z
ABSTRACT
In this study we deal with the problem of identifying and categorizing offensive language in social media. Our group, BNU-HKBU UIC NLP Team 2, uses supervised classification along with multiple versions of the data generated by different pre-processing methods. We then use the state-of-the-art model Bidirectional Encoder Representations from Transformers, or BERT (Devlin et al., 2018), to capture linguistic, syntactic and semantic features. Long-range dependencies between each part of a sentence can be captured by BERT's bidirectional encoder representations. The results show 85.12% accuracy and 80.57% F1 score in Subtask A (offensive language identification), 87.92% and 50% in Subtask B (categorization of offense types), and 69.95% and 50.47% in Subtask C (offense target identification). Analysis shows that distinguishing targeted from untargeted offenses is not a simple task. More work needs to be done on the unbalanced data in Subtask C. Some future directions are also discussed.
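The approach described above amounts to fine-tuning BERT as a sequence classifier for each subtask. A minimal sketch of that setup is below, using the Hugging Face transformers library (an assumption; the abstract does not name the authors' toolkit). A tiny randomly initialized config is used so the snippet runs without downloading weights; in practice one would load a pretrained checkpoint such as "bert-base-uncased" before fine-tuning on the task data.

```python
# Sketch only, not the authors' exact code: a BERT sequence classifier
# for Subtask A (offensive vs. not offensive), forward pass on toy input.
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny random config so this runs offline; real use would call
# BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=2,  # Subtask A: OFF vs. NOT
)
model = BertForSequenceClassification(config)
model.eval()

input_ids = torch.randint(0, 1000, (1, 16))   # one toy 16-token sequence
attention_mask = torch.ones_like(input_ids)   # no padding in this example
with torch.no_grad():
    out = model(input_ids=input_ids, attention_mask=attention_mask)

print(tuple(out.logits.shape))  # (1, 2): one score per class
```

Subtasks B and C would use the same architecture with a different number of labels; the bidirectional self-attention layers are what let the classifier relate distant tokens in the sentence.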