Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models

Code-mixing Merge (version control) Sentiment Analysis
DOI: 10.1371/journal.pone.0305657 Publication Date: 2024-07-17T17:32:42Z
ABSTRACT
Technological developments over the past few decades have changed the way people communicate, with platforms like social media and blogs becoming vital channels for international conversation. Even though hate speech is vigorously suppressed on social media, it is still a concern that needs to be constantly recognized and observed. The Arabic language poses particular difficulties in the detection of hate speech, despite the considerable efforts made in this area for English-language content. Arabic calls for particular consideration when it comes to hate speech detection because of its many dialects and linguistic nuances. Another degree of complication is added by the widespread practice of "code-mixing," in which users merge various languages smoothly. Recognizing this research vacuum, the study aims to close it by examining how well machine learning models built on a variety of features can detect hate speech, especially in Arabic tweets featuring code-mixing. Therefore, the objective is to assess and compare the effectiveness of different features and machine learning models on Arabic and code-mixing hate speech datasets. To achieve these objectives, the methodology used includes data collection, pre-processing, feature extraction, construction of classification models, and evaluation of the constructed models. The findings from the analysis revealed that the TF-IDF feature, when employed with the SGD model, attained the highest accuracy, reaching 98.21%. Subsequently, these results were contrasted with the outcomes of three existing studies, and the proposed method outperformed them, underscoring the significance of the method. Consequently, our study carries practical implications and serves as a foundational exploration in the realm of automated hate speech detection in text.
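To make the pipeline named in the abstract concrete (pre-processing, TF-IDF feature extraction, a classifier trained with stochastic gradient descent, and accuracy evaluation), here is a minimal standard-library sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the logistic loss, the learning rate and epoch count, and all function names are assumptions made for illustration. The six code-mixed "tweets" are invented, with `insult_a` and `insult_b` standing in as placeholder tokens for abusive terms.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Assumed pre-processing: lowercase and split on whitespace, which keeps
    # Arabic-script and Latin-script tokens in the same vocabulary.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(doc, idf):
    # Term frequency times IDF, L2-normalized; out-of-vocabulary tokens are dropped.
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def score(vec, weights, bias):
    # Linear decision function over a sparse feature vector.
    return bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())

def train_sgd(vectors, labels, epochs=200, lr=0.5, seed=0):
    # Linear classifier with logistic loss, updated one example at a time.
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = 1.0 / (1.0 + math.exp(-score(vectors[i], weights, bias)))
            grad = p - labels[i]  # derivative of the log loss w.r.t. the score
            for t, v in vectors[i].items():
                weights[t] = weights.get(t, 0.0) - lr * grad * v
            bias -= lr * grad
    return weights, bias

def predict(vec, weights, bias):
    # Label 1 = hateful, label 0 = not hateful.
    return 1 if score(vec, weights, bias) > 0 else 0

# Invented toy data: Arabic/English code-mixed strings with placeholder tokens.
texts = [
    "هذا الفريق insult_a سيء",
    "you are insult_b يا insult_a",
    "صباح الخير everyone",
    "شكرا thanks for the help",
    "insult_b هؤلاء الناس",
    "الجو جميل today",
]
labels = [1, 1, 0, 0, 1, 0]

idf = fit_idf(texts)
vectors = [tfidf(t, idf) for t in texts]
weights, bias = train_sgd(vectors, labels)
preds = [predict(v, weights, bias) for v in vectors]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the same six training examples and says nothing about the 98.21% reported in the abstract, which comes from the study's own datasets and evaluation. The abstract does not say which toolkit was used; in practice, a library such as scikit-learn offers `TfidfVectorizer` and `SGDClassifier` for the same two steps.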