- Topic Modeling
- Natural Language Processing Techniques
- Scientific Computing and Data Management
- Explainable Artificial Intelligence (XAI)
- Ethics and Social Impacts of AI
- Semantic Web and Ontologies
- Adversarial Robustness in Machine Learning
- Speech and dialogue systems
- Computational and Text Analysis Methods
- Big Data Technologies and Applications
- Particle Detector Development and Performance
- Hate Speech and Cyberbullying Detection
- Software Reliability and Analysis Research
- Online Learning and Analytics
- Logic, programming, and type systems
- Knowledge Management and Technology
- Distributed systems and fault tolerance
- Transportation Safety and Impact Analysis
- Geotechnical and Geomechanical Engineering
- Psychology of Moral and Emotional Judgment
Cambridge Scientific (United States)
2024
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels...
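As an illustration only (not the detector library described in this abstract), a compact guardrail of this kind might be wired in front of an LLM response as sketched below; the model id and the "HARMFUL" label are placeholders for whatever a small fine-tuned classifier would expose.

```python
# Minimal sketch of a detector-style guardrail: a compact classifier
# labels an LLM response before it is shown to the user.
# Assumes the `transformers` library; the model id below is a placeholder.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="path/to/compact-harm-detector",  # placeholder: any small fine-tuned classifier
)

def screen(response: str, threshold: float = 0.5) -> dict:
    """Return the detector label and whether the response should be blocked."""
    result = detector(response)[0]  # e.g. {"label": "HARMFUL", "score": 0.93}
    flagged = result["label"] == "HARMFUL" and result["score"] >= threshold
    return {"label": result["label"], "score": result["score"], "blocked": flagged}

print(screen("Example model output to be screened."))
```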
Current datasets for unwanted social bias auditing are limited to studying protected demographic features such as race and gender. In this work, we introduce a comprehensive benchmark that is meant to capture the amplification of bias, via stigmas, in generative language models. Taking inspiration from social science research, we start with a documented list of 93 US-centric stigmas and curate a question-answering (QA) dataset which involves simple social situations. Our benchmark, SocialStigmaQA, contains roughly 10K...
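The benchmark's actual prompts and stigma list are not reproduced here; purely as a sketch, templated QA prompts built from a stigma list could be scored against a model's yes/no answers as shown below. The template, example attributes, and `generate` callable are placeholders, not the dataset's contents.

```python
# Illustrative sketch (not the actual SocialStigmaQA data or prompts):
# build templated yes/no questions from a list of stigmas and tally how
# often a model's answer reflects the biased choice.  `generate` is a
# stand-in for any text-generation call.
from typing import Callable

TEMPLATE = "My new roommate {attribute}. Should I look for someone else? Answer yes or no."
STIGMAS = ["was recently unemployed", "is undergoing treatment for an illness"]  # placeholders

def biased_answer_rate(generate: Callable[[str], str]) -> float:
    biased = 0
    for attribute in STIGMAS:
        prompt = TEMPLATE.format(attribute=attribute)
        answer = generate(prompt).strip().lower()
        if answer.startswith("yes"):  # "yes" is the biased choice in this toy template
            biased += 1
    return biased / len(STIGMAS)

# Example with a dummy generator that always answers "no":
print(biased_answer_rate(lambda prompt: "No."))
```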
Developing value-aligned AI agents is a complex undertaking and an ongoing challenge in the field of AI. Specifically within the domain of Large Language Models (LLMs), the capability to consolidate multiple independently trained dialogue agents, each aligned with a distinct moral value, into a unified system that can adapt to be aligned with multiple values is of paramount importance. In this paper, we propose a system that does contextual value alignment based on aggregation. Here, aggregation is defined as the process of integrating a subset of LLM responses that are...
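The aggregation procedure is only summarized in this abstract; the sketch below is an assumed illustration that, for brevity, selects a single response from several value-aligned agents by scoring each value against the context with a crude keyword-overlap relevance function, rather than integrating a subset of responses as the paper describes.

```python
# Toy sketch of contextual selection over multiple value-aligned agents.
# Each agent is a function from prompt -> response; `relevance` is a
# placeholder scorer (crude keyword overlap with the context).
from typing import Callable, Dict

def relevance(context: str, value: str) -> float:
    context_words = set(context.lower().split())
    value_words = set(value.lower().split())
    return len(context_words & value_words) / max(len(value_words), 1)

def aggregate(prompt: str, context: str,
              agents: Dict[str, Callable[[str], str]]) -> str:
    """Weight each value-aligned agent by how relevant its value is to the
    context, then return the response of the highest-weighted agent."""
    weights = {value: relevance(context, value) for value in agents}
    best_value = max(weights, key=weights.get)
    return agents[best_value](prompt)

agents = {
    "be honest": lambda p: "Honest answer to: " + p,
    "be kind":   lambda p: "Kind answer to: " + p,
}
print(aggregate("How do I give feedback?", "I want to be kind but clear", agents))
```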
Motivated by the growing importance of reducing unfairness in ML predictions, Fair-ML researchers have presented an extensive suite of algorithmic 'fairness-enhancing' remedies. Most existing algorithms, however, are agnostic to the sources of the observed unfairness. As a result, the literature currently lacks guiding frameworks to specify the conditions under which each intervention can potentially alleviate the underpinning cause. To close this gap, we scrutinize the underlying biases (e.g., in the training data or design...
The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and to orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors,...
Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on their extension to generative language models. To address the challenges of text output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers for mapping text to real numbers and investigate multiple possibilities. To handle long inputs, we take a multi-level approach, proceeding from coarser levels...
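The scalarizers are described only at a high level in this abstract; as an assumed illustration (not MExGen's implementation), the sketch below maps the output for each perturbed input to a real number via similarity to the original output, which is the kind of scalar target a LIME/SHAP-style attribution method needs.

```python
# Sketch of a "scalarizer" for perturbation-based attribution over a
# generative model: map text outputs to real numbers so that methods in
# the LIME/SHAP family can work with scalar targets.  The similarity
# measure and leave-one-out masking scheme are illustrative choices only.
from difflib import SequenceMatcher
from typing import Callable, List

def scalarize(reference: str, candidate: str) -> float:
    """Similarity of a perturbed-input output to the original output."""
    return SequenceMatcher(None, reference, candidate).ratio()

def token_attributions(generate: Callable[[str], str], text: str) -> List[float]:
    """Leave-one-token-out attribution using the scalarizer as the target."""
    tokens = text.split()
    reference = generate(text)
    scores = []
    for i in range(len(tokens)):
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append(1.0 - scalarize(reference, generate(perturbed)))
    return scores

# Dummy generator that echoes its input, just to exercise the code path:
print(token_attributions(lambda s: s.upper(), "the quick brown fox"))
```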
Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to the violation of one or more conversational principles. By drawing upon extensive research from both the social science and AI communities, we propose a set of maxims -- quantity, quality, relevance, manner, benevolence, and transparency -- for describing effective human-AI conversation. We first justify the applicability of the first four maxims (from Grice)...
Evaluation of large language models (LLMs) for code has primarily relied on static benchmarks, including HumanEval (Chen et al., 2021), which measure the ability of LLMs to generate complete code that passes unit tests. As LLMs are increasingly used as programmer assistants, we study whether gains on existing benchmarks translate to gains in programmer productivity when coding with LLMs, including time spent coding. In addition, we investigate whether utility and preference metrics might be proxies for LLM helpfulness, such as acceptance or copy rates. To do...
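Acceptance and copy rates are named here as candidate proxy metrics; the sketch below simply makes them concrete by computing both from a hypothetical interaction log. The log schema is an assumption, not the study's instrumentation.

```python
# Illustrative computation of acceptance rate and copy rate from a
# hypothetical log of assistant suggestions.
from dataclasses import dataclass
from typing import List

@dataclass
class SuggestionEvent:
    accepted: bool        # did the programmer accept the suggestion?
    chars_suggested: int  # length of the suggestion
    chars_copied: int     # characters from the suggestion kept in the final code

def proxy_metrics(events: List[SuggestionEvent]) -> dict:
    accepted = sum(e.accepted for e in events)
    suggested = sum(e.chars_suggested for e in events)
    copied = sum(e.chars_copied for e in events)
    return {
        "acceptance_rate": accepted / len(events),
        "copy_rate": copied / suggested if suggested else 0.0,
    }

log = [SuggestionEvent(True, 120, 100), SuggestionEvent(False, 80, 0)]
print(proxy_metrics(log))
```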
Large language models (LLMs) have shown convincing performance in a variety of downstream tasks. However, these systems are prone to generating undesirable outputs such as harmful and biased text. In order to remedy such generations, the development of guardrail (or detector) models has gained traction. Motivated by findings from developing a detector for social bias, we adopt the notion of the use-mention distinction -- which we identified as a primary source of under-performance in preliminary versions of our bias detector. Armed with this...
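To make the use-mention distinction concrete (the heuristic below is a toy illustration, not the paper's method), quoting or reporting a flagged phrase can be treated as a mention, while an unquoted occurrence is treated as a use.

```python
# Toy illustration of the use-mention distinction for a detector:
# "mentioning" a harmful phrase (quoting or reporting it) should not be
# flagged the same way as "using" it.  The quote/report heuristic here
# is deliberately simplistic.
import re

REPORTING_CUES = ("said", "wrote", "claimed", "quoted", "reported")

def classify_occurrence(sentence: str, phrase: str) -> str:
    if phrase.lower() not in sentence.lower():
        return "absent"
    quoted = re.search(r'"[^"]*' + re.escape(phrase) + r'[^"]*"',
                       sentence, flags=re.IGNORECASE)
    reported = any(cue in sentence.lower() for cue in REPORTING_CUES)
    return "mention" if (quoted or reported) else "use"

print(classify_occurrence('The article said "group X is lazy" was posted online.', "group X is lazy"))
print(classify_occurrence('Group X is lazy.', "group X is lazy"))
```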
In contemporary society, the integration of artificial intelligence (AI) systems into various aspects of daily life raises significant ethical concerns. One critical aspect is to ensure that AI systems align with the moral values of end-users. To this end, we introduce the Contextual Moral Value Alignment System, ComVas. Unlike traditional systems, which have predefined values, ComVas empowers users to dynamically select and customize their desired values, thereby guiding the system's decision-making process. Through a user-friendly interface, individuals...
Large language models (LLMs) have been rapidly adopted, as showcased by ChatGPT's overnight popularity, and are integrated in products used by millions of people every day, such as search engines and productivity suites. Yet the societal impact of LLMs, encompassing both benefits and harms, is not well understood. Inspired by cybersecurity practices, red-teaming is emerging as a technique to uncover model vulnerabilities. Despite increasing attention from industry, academia, and government centered around this practice, such efforts still...
Aligning large language models (LLMs) to value systems has emerged as a significant area of research within the fields of AI and NLP. Currently, this alignment process relies on the availability of high-quality supervised and preference data, which can be both time-consuming and expensive to curate or annotate. In this paper, we introduce a systematic end-to-end methodology for aligning LLMs to the implicit and explicit values represented in unstructured text data. Our proposed approach leverages the use of scalable synthetic data...
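The synthetic-data pipeline is not specified in detail in this abstract; as an assumed sketch only, value statements extracted from unstructured text could be turned into instruction-style training pairs as below, where `extract_values` and `generate` are placeholders for an extraction step and an LLM call.

```python
# Assumed sketch: derive synthetic alignment examples from value statements
# found in unstructured text.  `extract_values` and `generate` are
# placeholders, not the paper's pipeline.
from typing import Callable, List, Dict

def extract_values(document: str) -> List[str]:
    # Placeholder extraction: treat lines starting with "We value" as value statements.
    return [line.strip() for line in document.splitlines()
            if line.strip().lower().startswith("we value")]

def synthesize_pairs(document: str, generate: Callable[[str], str]) -> List[Dict[str, str]]:
    pairs = []
    for value in extract_values(document):
        prompt = f"Write a user request that tests the principle: '{value}'."
        pairs.append({"value": value,
                      "instruction": generate(prompt),
                      "target": generate(f"Respond in line with: '{value}'.")})
    return pairs

doc = "We value transparency in all decisions.\nUnrelated paragraph."
print(synthesize_pairs(doc, lambda p: f"<synthetic text for: {p}>"))
```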
LLMs have shown remarkable capabilities, but precisely controlling their response behavior remains challenging. Existing activation steering methods alter LLM behavior indiscriminately, limiting their practical applicability in settings where selective responses are essential, such as content moderation or domain-specific assistants. In this paper, we propose Conditional Activation Steering (CAST), which analyzes LLM activation patterns during inference to selectively apply or withhold steering based on the input context. Our method...
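CAST itself is only named in this abstract; the sketch below shows the general shape of conditional steering on a toy module: a forward hook adds a steering vector only when the activation's projection onto a condition direction crosses a threshold. The module, vectors, and threshold are illustrative stand-ins, not the paper's method.

```python
# Toy sketch of conditional activation steering with a PyTorch forward hook.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 16
layer = nn.Linear(hidden, hidden)

condition_vec = torch.randn(hidden)       # direction that signals "steer here"
steering_vec = 0.5 * torch.randn(hidden)  # behavior vector added when the condition fires
threshold = 0.0

def conditional_steering_hook(module, inputs, output):
    # Project each activation onto the condition direction ...
    similarity = output @ condition_vec / condition_vec.norm()
    # ... and add the steering vector only where the condition is met.
    mask = (similarity > threshold).float().unsqueeze(-1)
    return output + mask * steering_vec

handle = layer.register_forward_hook(conditional_steering_hook)

x = torch.randn(4, hidden)
print(layer(x).shape)  # steering applied row-wise, only where the condition fires
handle.remove()
```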
We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance for retrieval-augmented generation (RAG)....
Ensuring trustworthiness in machine learning (ML) models is a multi-dimensional task. In addition to the traditional notion of predictive performance, other notions such as privacy, fairness, robustness to distribution shift, adversarial robustness, interpretability, explainability, and uncertainty quantification are important considerations to evaluate and improve (if deficient). However, these sub-disciplines or 'pillars' of trustworthiness have largely developed independently, which has limited us from understanding...