- Topic Modeling
- Natural Language Processing Techniques
- Scientific Computing and Data Management
- Explainable Artificial Intelligence (XAI)
- Ethics and Social Impacts of AI
- Semantic Web and Ontologies
- Adversarial Robustness in Machine Learning
- Speech and dialogue systems
- Computational and Text Analysis Methods
- Big Data Technologies and Applications
- Particle Detector Development and Performance
- Hate Speech and Cyberbullying Detection
- Software Reliability and Analysis Research
- Online Learning and Analytics
- Logic, programming, and type systems
- Knowledge Management and Technology
- Distributed systems and fault tolerance
- Transportation Safety and Impact Analysis
- Geotechnical and Geomechanical Engineering
- Psychology of Moral and Emotional Judgment
Cambridge Scientific (United States)
2024
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels...
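As an illustration only (not the detector library described in this abstract), a compact guardrail of this kind might be wired in front of an LLM response as sketched below; the model id and the "HARMFUL" label are placeholders for whatever a small fine-tuned classifier would expose.

```python
# Minimal sketch of a detector-style guardrail: a compact classifier
# labels an LLM response before it is shown to the user.
# Assumes the `transformers` library; the model id below is a placeholder.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="path/to/compact-harm-detector",  # placeholder: any small fine-tuned classifier
)

def screen(response: str, threshold: float = 0.5) -> dict:
    """Return the detector label and whether the response should be blocked."""
    result = detector(response)[0]  # e.g. {"label": "HARMFUL", "score": 0.93}
    flagged = result["label"] == "HARMFUL" and result["score"] >= threshold
    return {"label": result["label"], "score": result["score"], "blocked": flagged}

print(screen("Example model output to be screened."))
```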
Current datasets for unwanted social bias auditing are limited to studying protected demographic features such as race and gender. In this work, we introduce a comprehensive benchmark that is meant to capture the amplification of bias, via stigmas, in generative language models. Taking inspiration from social science research, we start with a documented list of 93 US-centric stigmas and curate a question-answering (QA) dataset which involves simple social situations. Our benchmark, SocialStigmaQA, contains roughly 10K...
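The benchmark's actual prompts and stigma list are not reproduced here; purely as a sketch, templated QA prompts built from a stigma list could be scored against a model's yes/no answers as shown below. The template, example attributes, and `generate` callable are placeholders, not the dataset's contents.

```python
# Illustrative sketch (not the actual SocialStigmaQA data or prompts):
# build templated yes/no questions from a list of stigmas and tally how
# often a model's answer reflects the biased choice.  `generate` is a
# stand-in for any text-generation call.
from typing import Callable

TEMPLATE = "My new roommate {attribute}. Should I look for someone else? Answer yes or no."
STIGMAS = ["was recently unemployed", "is undergoing treatment for an illness"]  # placeholders

def biased_answer_rate(generate: Callable[[str], str]) -> float:
    biased = 0
    for attribute in STIGMAS:
        prompt = TEMPLATE.format(attribute=attribute)
        answer = generate(prompt).strip().lower()
        if answer.startswith("yes"):  # "yes" is the biased choice in this toy template
            biased += 1
    return biased / len(STIGMAS)

# Example with a dummy generator that always answers "no":
print(biased_answer_rate(lambda prompt: "No."))
```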
Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to be aligned with multiple values is of paramount importance. In this paper, we propose a system that does contextual value alignment based on aggregation. Here, aggregation is defined as the process of integrating a subset of LLM responses that are...
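The aggregation procedure is only summarized in this abstract; the sketch below is an assumed illustration that, for brevity, selects a single response from several value-aligned agents by scoring each value against the context with a crude keyword-overlap relevance function, rather than integrating a subset of responses as the paper describes.

```python
# Toy sketch of contextual selection over multiple value-aligned agents.
# Each agent is a function from prompt -> response; `relevance` is a
# placeholder scorer (crude keyword overlap with the context).
from typing import Callable, Dict

def relevance(context: str, value: str) -> float:
    context_words = set(context.lower().split())
    value_words = set(value.lower().split())
    return len(context_words & value_words) / max(len(value_words), 1)

def aggregate(prompt: str, context: str,
              agents: Dict[str, Callable[[str], str]]) -> str:
    """Weight each value-aligned agent by how relevant its value is to the
    context, then return the response of the highest-weighted agent."""
    weights = {value: relevance(context, value) for value in agents}
    best_value = max(weights, key=weights.get)
    return agents[best_value](prompt)

agents = {
    "be honest": lambda p: "Honest answer to: " + p,
    "be kind":   lambda p: "Kind answer to: " + p,
}
print(aggregate("How do I give feedback?", "I want to be kind but clear", agents))
```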
Motivated by the growing importance of reducing unfairness in ML predictions, Fair-ML researchers have presented an extensive suite of algorithmic 'fairness-enhancing' remedies. Most existing algorithms, however, are agnostic to the sources of the observed unfairness. As a result, the literature currently lacks guiding frameworks to specify the conditions under which each intervention can potentially alleviate the underpinning cause. To close this gap, we scrutinize the underlying biases (e.g., in the training data or design...
The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and to orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors,...
Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on their extension to generative language models. To address the challenges of text output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers for mapping text to real numbers and investigate multiple possibilities. To handle long inputs, we take a multi-level approach, proceeding from coarser levels...
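The scalarizers are described only at a high level in this abstract; as an assumed illustration (not MExGen's implementation), the sketch below maps the output for each perturbed input to a real number via similarity to the original output, which is the kind of scalar target a LIME/SHAP-style attribution method needs.

```python
# Sketch of a "scalarizer" for perturbation-based attribution over a
# generative model: map text outputs to real numbers so that methods in
# the LIME/SHAP family can work with scalar targets.  The similarity
# measure and leave-one-out masking scheme are illustrative choices only.
from difflib import SequenceMatcher
from typing import Callable, List

def scalarize(reference: str, candidate: str) -> float:
    """Similarity of a perturbed-input output to the original output."""
    return SequenceMatcher(None, reference, candidate).ratio()

def token_attributions(generate: Callable[[str], str], text: str) -> List[float]:
    """Leave-one-token-out attribution using the scalarizer as the target."""
    tokens = text.split()
    reference = generate(text)
    scores = []
    for i in range(len(tokens)):
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append(1.0 - scalarize(reference, generate(perturbed)))
    return scores

# Dummy generator that echoes its input, just to exercise the code path:
print(token_attributions(lambda s: s.upper(), "the quick brown fox"))
```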
Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to the violation of one or more conversational principles. By drawing upon extensive research from both the social science and AI communities, we propose a set of maxims -- quantity, quality, relevance, manner, benevolence, and transparency -- for describing effective human-AI conversation. We first justify the applicability of the first four maxims (from Grice)...
Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests. As LLMs are increasingly used as programmer assistants, we study whether gains on existing benchmarks translate to gains in programmer productivity when coding with LLMs, including time spent coding. In addition, we investigate whether utility and preference metrics might be proxies for LLM helpfulness, such as acceptance or copy rates. To do...
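Acceptance and copy rates are named here as candidate proxy metrics; the sketch below simply makes them concrete by computing both from a hypothetical interaction log. The log schema is an assumption, not the study's instrumentation.

```python
# Illustrative computation of acceptance rate and copy rate from a
# hypothetical log of assistant suggestions.
from dataclasses import dataclass
from typing import List

@dataclass
class SuggestionEvent:
    accepted: bool        # did the programmer accept the suggestion?
    chars_suggested: int  # length of the suggestion
    chars_copied: int     # characters from the suggestion kept in the final code

def proxy_metrics(events: List[SuggestionEvent]) -> dict:
    accepted = sum(e.accepted for e in events)
    suggested = sum(e.chars_suggested for e in events)
    copied = sum(e.chars_copied for e in events)
    return {
        "acceptance_rate": accepted / len(events),
        "copy_rate": copied / suggested if suggested else 0.0,
    }

log = [SuggestionEvent(True, 120, 100), SuggestionEvent(False, 80, 0)]
print(proxy_metrics(log))
```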
Large language models (LLMs) have shown convincing performance in a variety of downstream tasks. However, these systems are prone to generating undesirable outputs such as harmful and biased text. In order to remedy such generations, the development of guardrail (or detector) models has gained traction. Motivated by findings from developing a detector for social bias, we adopt the notion of the use-mention distinction -- which we identified as a primary source of under-performance in preliminary versions of our bias detector. Armed with this...
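To make the use-mention distinction concrete (the heuristic below is a toy illustration, not the paper's method), quoting or reporting a flagged phrase can be treated as a mention, while an unquoted occurrence is treated as a use.

```python
# Toy illustration of the use-mention distinction for a detector:
# "mentioning" a harmful phrase (quoting or reporting it) should not be
# flagged the same way as "using" it.  The quote/report heuristic here
# is deliberately simplistic.
import re

REPORTING_CUES = ("said", "wrote", "claimed", "quoted", "reported")

def classify_occurrence(sentence: str, phrase: str) -> str:
    if phrase.lower() not in sentence.lower():
        return "absent"
    quoted = re.search(r'"[^"]*' + re.escape(phrase) + r'[^"]*"',
                       sentence, flags=re.IGNORECASE)
    reported = any(cue in sentence.lower() for cue in REPORTING_CUES)
    return "mention" if (quoted or reported) else "use"

print(classify_occurrence('The article said "group X is lazy" was posted online.', "group X is lazy"))
print(classify_occurrence('Group X is lazy.', "group X is lazy"))
```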
In contemporary society, the integration of artificial intelligence (AI) systems into various aspects of daily life raises significant ethical concerns. One critical aspect is to ensure that AI systems align with the moral values of end-users. To this end, we introduce the Contextual Moral Value Alignment System, ComVas. Unlike traditional systems, which have predefined values, ComVas empowers users to dynamically select and customize their desired values, thereby guiding the system's decision-making process. Through a user-friendly interface, individuals...
Large language models (LLMs) have been rapidly adopted, as showcased by ChatGPT's overnight popularity, and are integrated in products used by millions of people every day, such as search engines and productivity suites. Yet the societal impact of LLMs, encompassing both benefits and harms, is not well understood. Inspired by cybersecurity practices, red-teaming is emerging as a technique to uncover model vulnerabilities. Despite increasing attention from industry, academia, and government centered around this practice, such efforts still...
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach leverages the use of scalable synthetic data...
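The synthetic-data pipeline is not specified in detail in this abstract; as an assumed sketch only, value statements extracted from unstructured text could be turned into instruction-style training pairs as below, where `extract_values` and `generate` are placeholders for an extraction step and an LLM call.

```python
# Assumed sketch: derive synthetic alignment examples from value statements
# found in unstructured text.  `extract_values` and `generate` are
# placeholders, not the paper's pipeline.
from typing import Callable, List, Dict

def extract_values(document: str) -> List[str]:
    # Placeholder extraction: treat lines starting with "We value" as value statements.
    return [line.strip() for line in document.splitlines()
            if line.strip().lower().startswith("we value")]

def synthesize_pairs(document: str, generate: Callable[[str], str]) -> List[Dict[str, str]]:
    pairs = []
    for value in extract_values(document):
        prompt = f"Write a user request that tests the principle: '{value}'."
        pairs.append({"value": value,
                      "instruction": generate(prompt),
                      "target": generate(f"Respond in line with: '{value}'.")})
    return pairs

doc = "We value transparency in all decisions.\nUnrelated paragraph."
print(synthesize_pairs(doc, lambda p: f"<synthetic text for: {p}>"))
```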
LLMs have shown remarkable capabilities, but precisely controlling their response behavior remains challenging. Existing activation steering methods alter LLM behavior indiscriminately, limiting their practical applicability in settings where selective responses are essential, such as content moderation or domain-specific assistants. In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold steering based on the input context. Our method...
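CAST itself is only named in this abstract; the sketch below shows the general shape of conditional steering on a toy module: a forward hook adds a steering vector only when the activation's projection onto a condition direction crosses a threshold. The module, vectors, and threshold are illustrative stand-ins, not the paper's method.

```python
# Toy sketch of conditional activation steering with a PyTorch forward hook.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 16
layer = nn.Linear(hidden, hidden)

condition_vec = torch.randn(hidden)       # direction that signals "steer here"
steering_vec = 0.5 * torch.randn(hidden)  # behavior vector added when the condition fires
threshold = 0.0

def conditional_steering_hook(module, inputs, output):
    # Project each activation onto the condition direction ...
    similarity = output @ condition_vec / condition_vec.norm()
    # ... and add the steering vector only where the condition is met.
    mask = (similarity > threshold).float().unsqueeze(-1)
    return output + mask * steering_vec

handle = layer.register_forward_hook(conditional_steering_hook)

x = torch.randn(4, hidden)
print(layer(x).shape)  # steering applied row-wise, only where the condition fires
handle.remove()
```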
We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance for retrieval-augmented generation (RAG)....
Ensuring trustworthiness in machine learning (ML) models is a multi-dimensional task. In addition to the traditional notion of predictive performance, other notions such as privacy, fairness, robustness to distribution shift, adversarial robustness, interpretability, explainability, and uncertainty quantification are important considerations to evaluate and improve (if deficient). However, these sub-disciplines or 'pillars' of trustworthiness have largely developed independently, which has limited us from understanding...