
Facilitating NSFW Text Detection in Open-Domain Dialogue Systems via Knowledge Distillation

Keywords: Utterance, Open domain, Chatbot

DOI: 10.48550/arxiv.2309.09749
Publication Date: 2023-01-01


AUTHORS (5)

Huachuan Qiu

Shuai Zhang

Hongliang He

Anqi Li

Zhenzhong Lan

ABSTRACT

NSFW (Not Safe for Work) content, in the context of a dialogue, can have severe side effects on users in open-domain dialogue systems. However, research on detecting NSFW language, especially sexually explicit content, within a dialogue context has significantly lagged behind. To address this issue, we introduce CensorChat, a dialogue monitoring dataset aimed at NSFW dialogue detection. Leveraging knowledge distillation techniques involving GPT-4 and ChatGPT, this dataset offers a cost-effective means of constructing NSFW content detectors. The process entails collecting real-life human-machine interaction data and breaking it down into single utterances and single-turn dialogues, with the chatbot delivering the final utterance. ChatGPT is employed to annotate unlabeled data, serving as a training set. Rationale validation and test sets are constructed using ChatGPT and GPT-4 as annotators, with a self-criticism strategy for resolving discrepancies in labeling. A BERT model is fine-tuned as a text classifier on pseudo-labeled data, and its performance is assessed. The study emphasizes the importance of AI systems prioritizing user safety and well-being in digital conversations while respecting freedom of expression. The proposed approach not only advances NSFW content detection but also aligns with evolving user protection needs in AI-driven dialogues.
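
The data-preparation step described in the abstract (breaking a conversation into single utterances and single-turn dialogues in which the chatbot delivers the final utterance) might be sketched as follows. This is a minimal illustration and not the authors' code: the `Turn` class, the `"user"`/`"bot"` speaker labels, and the `split_conversation` function are all hypothetical names introduced here, and the rule that a single-turn dialogue is a user utterance immediately followed by a chatbot reply is an assumption about how the split could work.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """One message in a conversation (hypothetical structure)."""
    speaker: str  # assumed labels: "user" or "bot"
    text: str


def split_conversation(turns: List[Turn]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a conversation into single utterances and single-turn dialogues.

    A single-turn dialogue is taken here to be a user utterance directly
    followed by a chatbot reply, so the chatbot always has the final utterance.
    """
    # Every message, regardless of speaker, counts as a single utterance.
    utterances = [t.text for t in turns]

    # Pair each user message with the chatbot reply that immediately follows it.
    single_turn_dialogues = [
        (prev.text, cur.text)
        for prev, cur in zip(turns, turns[1:])
        if prev.speaker == "user" and cur.speaker == "bot"
    ]
    return utterances, single_turn_dialogues


# Illustrative, made-up conversation.
conversation = [
    Turn("user", "Hi there."),
    Turn("bot", "Hello! How can I help?"),
    Turn("user", "Tell me a story."),
    Turn("bot", "Once upon a time..."),
]
utterances, dialogues = split_conversation(conversation)
```

Each resulting utterance or (user, chatbot) pair would then be an individual item to label, which matches the abstract's description of annotating unlabeled data at the utterance and single-turn level.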

