ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model

Vietnamese
DOI: 10.48550/arxiv.2405.14141 Publication Date: 2024-05-22
ABSTRACT
Recent advancements in hate speech detection (HSD) in Vietnamese have made significant progress, primarily attributed to the emergence of transformer-based pre-trained language models, particularly those built on the BERT architecture. However, the need for specialized fine-tuned models has resulted in complexity and fragmentation when developing a multitasking HSD system. Moreover, most current methodologies focus on fine-tuning general pre-trained models trained on formal textual datasets like Wikipedia, which may not accurately capture human behavior on online platforms. In this research, we introduce ViHateT5, a T5-based model pre-trained on our proposed large-scale domain-specific dataset named VOZ-HSD. By harnessing the power of a text-to-text architecture, ViHateT5 can tackle multiple tasks using a unified model and achieves state-of-the-art performance across all standard HSD benchmarks in Vietnamese. Our experiments also underscore the significance of label distribution in pre-training data for model efficacy. For research purposes, we publicly release our experimental materials on GitHub, including the VOZ-HSD dataset, the pre-trained checkpoint, the unified HSD-multitask ViHateT5 model, and related source code.
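To illustrate what handling "multiple tasks using a unified model" means in a text-to-text setup, here is a minimal Python sketch under stated assumptions. Every task is cast as string-in, string-out, and a task prefix on the input tells a single model which task to perform. The prefixes, label strings, and helper names (`TASK_PREFIXES`, `to_text_to_text`, `build_mixture`) are hypothetical and not taken from the ViHateT5 release. The sketch covers only data formatting, not pre-training or fine-tuning.

```python
from dataclasses import dataclass

# Hypothetical task prefixes (assumption): the actual ViHateT5 prefixes may differ.
TASK_PREFIXES = {
    "hate_speech": "hate-speech-detection",
    "toxic_speech": "toxic-speech-detection",
    "hate_spans": "hate-spans-detection",
}


@dataclass
class Example:
    source: str  # text fed to the encoder
    target: str  # text the decoder is trained to generate


def to_text_to_text(task: str, text: str, label: str) -> Example:
    """Cast one labeled sample into a (source, target) string pair."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    # The prefix is the only signal that distinguishes tasks for the shared model.
    return Example(source=f"{TASK_PREFIXES[task]}: {text}", target=label)


def build_mixture(samples):
    """Merge (task, text, label) triples from several tasks into one training list."""
    return [to_text_to_text(task, text, label) for task, text, label in samples]
```

For example, `build_mixture([("hate_speech", "some comment", "CLEAN"), ("toxic_speech", "another comment", "NONE")])` yields two pairs that one sequence-to-sequence model can be trained on together, which is what lets a single checkpoint replace several task-specific classifiers. The label strings here are placeholders.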