Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation
Utterance
Open domain
Chatbot
DOI:
10.48550/arxiv.2309.09749
Publication Date:
2023-01-01
AUTHORS (5)
ABSTRACT
NSFW (Not Safe for Work) content, in the context of a dialogue, can have severe side effects on users of open-domain dialogue systems. However, research on detecting NSFW language, especially sexually explicit content, within a dialogue context has significantly lagged behind. To address this issue, we introduce CensorChat, a dialogue monitoring dataset aimed at NSFW dialogue detection. Leveraging knowledge distillation techniques involving GPT-4 and ChatGPT, this dataset offers a cost-effective means of constructing NSFW content detectors. The process entails collecting real-life human-machine interaction data and breaking it down into single utterances and single-turn dialogues, with the chatbot delivering the final utterance. ChatGPT is employed to annotate the unlabeled data, which then serves as the training set. Rationale validation and test sets are constructed using ChatGPT and GPT-4 as annotators, with a self-criticism strategy for resolving labeling discrepancies. A BERT model is fine-tuned as a text classifier on the pseudo-labeled data, and its performance is assessed. The study emphasizes the importance of AI systems prioritizing user safety and well-being in digital conversations while respecting freedom of expression. The proposed approach not only advances NSFW content detection but also aligns with evolving user protection needs in AI-driven dialogues.
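The data-preparation steps described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The names `Turn`, `Sample`, `split_dialogue`, and `resolve_label`, the `"user"`/`"bot"` speaker tags, the `"NSFW"`/`"SFW"` label strings, and the `critique` callback are all hypothetical. The sketch assumes that every utterance becomes a single-utterance sample and that a single-turn dialogue is a user utterance immediately followed by the chatbot's reply. The abstract does not say how the self-criticism strategy works, so it is modeled only as a caller-supplied function invoked when the ChatGPT and GPT-4 labels disagree.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical speaker tags and label strings; the abstract does not fix these.
USER, BOT = "user", "bot"
NSFW, SFW = "NSFW", "SFW"


@dataclass(frozen=True)
class Turn:
    speaker: str  # USER or BOT
    text: str


@dataclass(frozen=True)
class Sample:
    context: Optional[str]  # preceding user utterance, or None for a single utterance
    utterance: str          # the final utterance of the sample


def split_dialogue(turns: List[Turn]) -> List[Sample]:
    """Break one human-machine session into single utterances and single-turn dialogues.

    Assumption: every turn yields a single-utterance sample, and each chatbot
    turn that directly follows a user turn also yields a (user, bot) pair, so
    the chatbot always delivers the final utterance of a single-turn dialogue.
    """
    samples: List[Sample] = []
    for i, turn in enumerate(turns):
        samples.append(Sample(context=None, utterance=turn.text))
        if turn.speaker == BOT and i > 0 and turns[i - 1].speaker == USER:
            samples.append(Sample(context=turns[i - 1].text, utterance=turn.text))
    return samples


def resolve_label(
    sample: Sample,
    chatgpt_label: str,
    gpt4_label: str,
    critique: Callable[[Sample, str, str], str],
) -> str:
    """Keep a label both annotators agree on; otherwise defer to a critique step.

    `critique` is a hypothetical stand-in for the self-criticism strategy,
    e.g. a follow-up LLM query that re-examines the two conflicting labels.
    """
    if chatgpt_label == gpt4_label:
        return chatgpt_label
    return critique(sample, chatgpt_label, gpt4_label)
```

In a fuller pipeline, the `critique` callback would wrap a further LLM query, and the resolved labels would populate the validation and test sets, while ChatGPT's single labels form the training data for the BERT classifier. Both of those steps are outside this sketch.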