When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails

FOS: Computer and information sciences; Computer Science - Computation and Language (cs.CL)
DOI: 10.48550/arxiv.2407.06323 Publication Date: 2024-07-08
ABSTRACT
Large language models (LLMs) have convincing performance in a variety of downstream tasks. However, these systems are prone to generating undesirable outputs such as harmful and biased text. In order to remedy such generations, the development of guardrail (or detector) models has gained traction. Motivated by findings from developing a detector for social bias, we adopt the notion of a use-mention distinction, which we identified as a primary source of under-performance in preliminary versions of our bias detector. Armed with this information, we describe a fully extensible and reproducible synthetic data generation pipeline that leverages taxonomy-driven instructions to create targeted and labeled data. Using this pipeline, we generate over 300K unique contrastive samples and provide extensive experiments to systematically evaluate performance on a suite of open datasets. We show that our method achieves competitive performance at a fraction of the compute cost and offers insight into iteratively building efficient and capable models. Warning: This paper contains examples of text which are toxic, biased, and potentially harmful.
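The abstract describes generating contrastive samples from taxonomy-driven instructions, built around the use-mention distinction (a statement that *uses* biased language versus one that merely *mentions* or discusses it). The sketch below is a minimal illustration of that idea; the taxonomy, templates, and function names are illustrative assumptions, not the paper's actual pipeline, which the abstract says is instruction-driven rather than template-driven.

```python
# Illustrative sketch (assumed, not from the paper): pairing a "use" of a
# biased claim with a "mention" of the same claim yields contrastive samples
# with opposite labels for training a bias detector.

# Hypothetical two-level taxonomy: category -> target groups.
TAXONOMY = {
    "social_bias": ["group A", "group B"],
}

# A "use" asserts the biased claim; a "mention" discusses it without endorsing it.
USE_TEMPLATE = "People from {group} are inherently untrustworthy."
MENTION_TEMPLATE = (
    'The claim that "people from {group} are untrustworthy" is a harmful stereotype.'
)

def generate_contrastive_pairs(taxonomy):
    """Return (text, label) samples: label 1 = harmful use, 0 = benign mention."""
    samples = []
    for category, groups in taxonomy.items():
        for group in groups:
            samples.append((USE_TEMPLATE.format(group=group), 1))
            samples.append((MENTION_TEMPLATE.format(group=group), 0))
    return samples

pairs = generate_contrastive_pairs(TAXONOMY)
```

In the paper's actual pipeline, an instruction-following LLM would stand in for the fixed templates, allowing the taxonomy to drive much more varied generations at scale.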