Text Detoxification using Large Pre-trained Neural Models
FOS: Computer and information sciences
Computation and Language (cs.CL)
Machine Learning (cs.LG)
DOI: 10.18653/v1/2021.emnlp-main.629
Publication Date: 2021-12-17
AUTHORS (7)
ABSTRACT
We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second method uses BERT to replace toxic words with their non-offensive synonyms. We make the method more flexible by enabling BERT to replace mask tokens with a variable number of words. Finally, we present the first large-scale comparative study of style transfer models on the task of toxicity removal. We compare our models with a number of methods for style transfer. The models are evaluated in a reference-free way using a combination of unsupervised style transfer metrics. Both methods we suggest yield new SOTA results.
Accepted to the EMNLP 2021 conference.
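The second method (masked-LM word replacement) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy toxic-word list (`TOXIC_WORDS`) and uses the Hugging Face `transformers` fill-mask pipeline with plain BERT, filtering its suggestions against that list. The paper's full method additionally conditions the model on style and lets a single mask expand into a variable number of words, both of which are omitted here.

```python
# Illustrative sketch of BERT-based toxic-word replacement (not the authors' code).
# Assumes the Hugging Face `transformers` library and a hand-made toy lexicon.
from transformers import pipeline

# Fill-mask pipeline with a plain BERT model; we only filter its suggestions
# against the toxic-word list rather than conditioning the model on style.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

TOXIC_WORDS = {"stupid", "idiot", "dumb"}  # toy lexicon for illustration only

def detoxify(sentence: str) -> str:
    """Replace each toxic word with BERT's highest-scoring non-toxic suggestion."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in TOXIC_WORDS:
            # Mask the toxic token and ask BERT for replacement candidates.
            masked = " ".join(tokens[:i] + [unmasker.tokenizer.mask_token] + tokens[i + 1:])
            for candidate in unmasker(masked, top_k=10):
                word = candidate["token_str"].strip()
                if word.lower() not in TOXIC_WORDS:
                    tokens[i] = word
                    break
    return " ".join(tokens)

print(detoxify("you are a stupid person"))
# e.g. "you are a good person" (actual output depends on the model)
```

In practice, a plain masked LM often proposes fluent but still negative substitutes; this is why the paper biases the substitution model toward the non-toxic style rather than relying on post-hoc filtering alone.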