A Transformer based Multi-Task Learning Approach Leveraging Translated and Transliterated Data to Hate Speech Detection in Hindi

Devanagari Telugu Keyword spotting
DOI: 10.5121/csit.2022.121516 Publication Date: 2022-09-19T07:46:03Z
ABSTRACT
The increase in usage of the internet has also led to an increase in unsocial activities, and hate speech is one of them. Over the past few years, hate speech has been one of the biggest problems, and automated techniques need to be developed to detect it. This paper aims to use eight publicly available Hindi datasets and explore different deep neural network techniques to detect aggression, hate, abuse, etc. We experimented on multilingual bidirectional encoder representations from transformers (M-BERT) and multilingual representations for Indian languages (MuRIL) in four settings: (i) a single-task learning (STL) framework; (ii) transferring the encoder knowledge to a recurrent neural network (RNN); (iii) multi-task learning (MTL), where the eight Hindi datasets were jointly trained; and (iv) pre-training the encoder with English tweets translated to Devanagari script and the same Devanagari text transliterated to romanized Hindi, then fine-tuning it in MTL fashion. Experimental evaluation shows that cross-lingual information in MTL helps in improving the performance on all the datasets by a significant margin, hence outperforming the state-of-the-art approaches in terms of weighted F1 score. Qualitative and quantitative error analysis is also done to show the effects of the proposed approach.
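The abstract reports results as a weighted F1 score. As a minimal sketch of that metric (not the authors' evaluation code), the function below averages per-class F1 scores, weighting each class by its share of the true labels. The function name `weighted_f1` and the labels `"hate"` and `"none"` are illustrative assumptions, not taken from the paper or its datasets.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Illustrative helper (hypothetical name): each class's F1 is weighted
    by the fraction of true labels that belong to that class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0

    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true instances of this class that were missed

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        score += (n / total) * f1

    return score


# Hypothetical labels, for illustration only.
gold = ["hate", "hate", "none", "none", "none", "none"]
pred = ["hate", "none", "none", "none", "none", "hate"]
print(round(weighted_f1(gold, pred), 3))  # 0.667
```

Weighting by class support matters for hate speech data, where the non-offensive class usually dominates: a plain (macro) average would treat a rare class and a frequent class equally, while the weighted variant reflects the label distribution of each dataset.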