Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework

Keywords: Debiasing, Causality
DOI: 10.48550/arxiv.2403.08743
Publication Date: 2024-03-13
ABSTRACT
Large language models (LLMs) can easily generate biased and discriminative responses. As LLMs tap into consequential decision-making (e.g., hiring and healthcare), it is of crucial importance to develop strategies to mitigate these biases. This paper focuses on social bias, tackling the association between demographic information and LLM outputs. We propose a causality-guided debiasing framework that utilizes causal understandings of (1) the data-generating process of the training corpus fed to LLMs, and (2) the internal reasoning process of LLM inference, to guide the design of prompts for debiasing LLM outputs through selection mechanisms. Our framework unifies existing de-biasing prompting approaches, such as inhibitive instructions and in-context contrastive examples, and sheds light on new ways of debiasing by encouraging bias-free reasoning. Strong empirical performance on real-world datasets demonstrates that our framework provides principled guidelines for debiasing LLM outputs, even with only black-box access.
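The two prompting approaches the abstract names, inhibitive instructions and in-context contrastive examples, can be sketched as prompt construction. The sketch below is illustrative only: the instruction wording, the example pairs, and the `build_debiasing_prompt` helper are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of two debiasing prompting strategies mentioned in
# the abstract. All prompt text and names here are illustrative.

# Inhibitive instruction: tells the model not to use demographic attributes.
INHIBITIVE_INSTRUCTION = (
    "Answer based only on task-relevant qualifications. "
    "Do not let demographic attributes (e.g., gender, race, age) "
    "influence your response."
)

# In-context contrastive examples: identical content that differs only in
# the demographic attribute, both mapped to the same answer, demonstrating
# that the attribute should not change the output.
CONTRASTIVE_EXAMPLES = [
    ("Candidate A (female), 5 years of Python experience. Qualified?", "Yes"),
    ("Candidate B (male), 5 years of Python experience. Qualified?", "Yes"),
]

def build_debiasing_prompt(query: str) -> str:
    """Compose the instruction, contrastive demonstrations, and the query
    into a single prompt string for a black-box LLM."""
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in CONTRASTIVE_EXAMPLES)
    return f"{INHIBITIVE_INSTRUCTION}\n\n{demos}\n\nQ: {query}\nA:"

prompt = build_debiasing_prompt(
    "Candidate C (age 58), 5 years of Python experience. Qualified?"
)
print(prompt)
```

Because the framework assumes only black-box access, interventions like this operate purely at the prompt level: the composed string would be sent to any chat or completion API unchanged.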