- Explainable Artificial Intelligence (XAI)
- Reinforcement Learning in Robotics
- Remote-Sensing Image Classification
- Topic Modeling
- Neural Networks and Applications
- Auction Theory and Applications
- Human-Automation Interaction and Safety
- Experimental Behavioral Economics Studies
- Natural Language Processing Techniques
- Remote Sensing in Agriculture
- Robot Manipulation and Learning
- Domain Adaptation and Few-Shot Learning
- Ethics and Social Impacts of AI
- Multimodal Machine Learning Applications
- Mobile Crowdsensing and Crowdsourcing
- Decision-Making and Behavioral Economics
- Semantic Web and Ontologies
- Digital Transformation in Industry
- Machine Learning and Algorithms
- Free Will and Agency
- Speech and Dialogue Systems
- Constraint Satisfaction and Optimization
- Cognitive Science and Mapping
- Robotics and Automated Systems
- Species Distribution and Climate Change
IIT@MIT, 2024
University of California, Berkeley, 2023
Massachusetts Institute of Technology, 2022
Microsoft (United States), 2020
Microsoft Research (United Kingdom), 2019-2020
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure...
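As a rough illustration of the pipeline's first stage, here is a minimal sketch of reward modeling from pairwise preferences under a Bradley-Terry likelihood; the feature vectors, data, and dimensions are toy stand-ins for a learned text encoder, not the systems surveyed in the paper.

```python
# Minimal sketch of the RLHF reward-modeling step: fit a linear reward
# r(x) = w.x so preferred responses score higher, assuming the standard
# Bradley-Terry likelihood. All names and data here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def fit_reward_model(chosen, rejected, lr=0.1, steps=500):
    """Gradient descent on the negative log-likelihood -log sigmoid(r_c - r_r)."""
    w = np.zeros(chosen.shape[1])
    for _ in range(steps):
        margin = (chosen - rejected) @ w           # r(chosen) - r(rejected)
        p = 1.0 / (1.0 + np.exp(margin))           # prob. the preference is violated
        grad = -(p[:, None] * (chosen - rejected)).mean(axis=0)
        w -= lr * grad
    return w

# Toy preference data: "chosen" responses score higher on the first feature.
chosen = rng.normal(0, 1, (256, 4)) + np.array([1.0, 0, 0, 0])
rejected = rng.normal(0, 1, (256, 4))
w = fit_reward_model(chosen, rejected)
print("learned reward weights:", w.round(2))
# In full RLHF, the policy (the LLM) would then be optimized against this
# learned reward, typically with PPO plus a KL penalty to the pretrained model.
```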
To act in the world, robots rely on a representation of salient task aspects: for example, to carry a coffee mug, a robot may consider movement efficiency or mug orientation in its behavior. However, if we want robots to act for and with people, their representations must not be just functional but also reflective of what humans care about, i.e., they must be aligned. We observe that current learning approaches suffer from representation misalignment, where the robot's learned representation does not capture the human's representation. We suggest that because humans are the ultimate...
Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For...
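As one concrete example of a measurement tool used in this literature, the sketch below computes linear centered kernel alignment (CKA) between two representations of the same stimuli; the random matrices are illustrative stand-ins for, say, model activations and neural recordings.

```python
# Linear CKA: a common similarity measure between two representations of the
# same n stimuli. A value of 1.0 means identical geometry up to rotation/scale.
import numpy as np

def linear_cka(X, Y):
    """CKA between X (n x d1) and Y (n x d2), both centered per feature."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))                    # system A's representation
R, _ = np.linalg.qr(rng.normal(size=(32, 32)))    # random rotation
print(linear_cka(X, X @ R))                       # ~1.0: same geometry, rotated
print(linear_cka(X, rng.normal(size=(100, 32))))  # near 0: unrelated systems
```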
In AI-assisted decision-making, effective hybrid (human-AI) teamwork depends not solely on AI performance, but also on the AI's impact on human decision-making. While prior work studies the effects of model accuracy on humans, we endeavour here to investigate the complex dynamics of how both a model's predictive performance and bias may transfer to humans in a recommendation-aided decision task. We consider the domain of ML-assisted hiring, where humans, operating in a constrained selection setting, can choose whether they wish to utilize...
Learning from demonstrations is a common way for users to teach robots, but it is prone to spurious feature correlations. Recent work constructs state abstractions, i.e. visual representations containing task-relevant features, from language as a way to perform more generalizable learning. However, these abstractions also depend on a user's preference for what matters in a task, which may be hard to describe or infeasible to exhaustively specify using language alone. How do we construct abstractions that capture these latent preferences? We observe that how...
Although systematic biases in decision-making are widely documented, the ways in which they emerge from different sources are less understood. We present a controlled experimental platform to study gender bias in hiring by decoupling the effect of the world distribution (the gender breakdown of candidates in a specific profession) from the effect of human decision-making. We explore the effectiveness of representation criteria, a fixed proportional display of candidates, as an intervention strategy for bias mitigation by conducting experiments measuring...
We propose incorporating human labelers in a model fine-tuning system that provides immediate user feedback. In our framework, humans can interactively query model predictions on unlabeled data, choose which data to label, and see the resulting effect on the model's predictions. This bi-directional feedback loop allows humans to learn how the model responds to new data. Our hypothesis is that this rich feedback helps users build mental models that enable them to better account for the biases they introduce into the model. We implement this framework for fine-tuning high-resolution land cover segmentation models and compare human-selected points to points selected using standard active learning methods...
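A minimal sketch of the loop described above, with a toy dataset and an uncertainty-sampling selection rule standing in for both the human's choices and the active learning baseline; nothing here is the paper's actual system.

```python
# Query-label-retrain loop: predictions are surfaced, a point is chosen for
# labeling, the model updates immediately, and the effect is shown back.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
labeled = list(range(10))
pool = [i for i in range(500) if i not in labeled]

model = LogisticRegression().fit(X[labeled], y[labeled])
for round_ in range(5):
    probs = model.predict_proba(X[pool])[:, 1]
    # Uncertainty-sampling stand-in: query the point the model is least sure about.
    pick = pool[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(pick)
    pool.remove(pick)
    model.fit(X[labeled], y[labeled])       # immediate retraining...
    acc = model.score(X, y)                 # ...whose effect the user would see
    print(f"round {round_}: accuracy after update = {acc:.3f}")
```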
We describe a framework for using natural language to design state abstractions for imitation learning. Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed representations, which can surface important features of an environment and hide irrelevant ones. These representations are typically manually specified, or derived from other labor-intensive labeling procedures. Our method, LGA (language-guided abstraction), uses a combination of natural language supervision...
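A hedged sketch of the underlying idea: given a task description, keep the features judged task-relevant and hide the rest. A real implementation would query a language model; the toy relevance lookup and feature names below are assumptions for illustration.

```python
# Language-guided abstraction, toy version: build an observation in which
# features irrelevant to the described task are masked out.
import numpy as np

def relevant(feature: str, task: str) -> bool:
    """Stand-in for an LM query like 'Is <feature> relevant to <task>?'"""
    keywords = {"carry the mug upright": {"mug_position", "mug_orientation",
                                          "arm_joint_angles"}}
    return feature in keywords.get(task, set())

def abstract_state(state: dict, task: str) -> np.ndarray:
    """Abstracted observation: irrelevant features are zeroed (hidden)."""
    return np.concatenate([np.asarray(v) if relevant(k, task) else np.zeros_like(v)
                           for k, v in state.items()])

state = {"mug_position": [0.2, 0.1, 0.3], "mug_orientation": [0.0, 0.0, 1.0],
         "table_color": [0.8, 0.2, 0.2], "arm_joint_angles": [0.1] * 7}
print(abstract_state(state, "carry the mug upright"))  # table_color masked
```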
Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human communication, we study how to extract fine-grained data regarding why an example is preferred that is useful for learning more accurate reward models. We propose to enrich binary preference queries to ask both (1) which features of a given example are preferable, in addition to (2) comparisons between examples themselves. We derive...
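To make the two query types concrete, the sketch below fits one linear reward model from both example-level and feature-level preference signals using Bradley-Terry-style gradient updates; the synthetic data and simulated annotator are illustrative, not the paper's derivation.

```python
# Enriched preference learning, toy version: both whole-example comparisons
# and per-feature judgments update the same linear reward w.
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])      # hidden human reward (illustrative)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update(w, diff, lr=0.05):
    """One ascent step on log sigmoid(w . diff), i.e. 'diff is preferred'."""
    return w + lr * (1 - sigmoid(w @ diff)) * diff

w = np.zeros(3)
for _ in range(2000):
    a, b = rng.normal(size=3), rng.normal(size=3)
    # (1) example-level query: which example is preferred overall?
    diff = (a - b) if true_w @ a > true_w @ b else (b - a)
    w = update(w, diff)
    # (2) feature-level query: for one feature, which value is preferable?
    i = rng.integers(3)
    e = np.zeros(3)
    e[i] = np.sign(true_w[i]) * abs(a[i] - b[i])
    w = update(w, e)

print("recovered direction:", (w / np.linalg.norm(w)).round(2))
print("true direction:     ", (true_w / np.linalg.norm(true_w)).round(2))
```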
Computationally intensive decoding procedures, including search, reranking, and self-critique, can improve the quality of language model (LM) outputs in problems spanning code generation, numerical reasoning, and dialog. Existing work typically applies the same decoding procedure to every input to an LM. But not all inputs require the same amount of computation to process. Can we allocate computation adaptively, using more resources to answer questions whose answers will be harder to compute? We present an approach that predicts the distribution...
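The sketch below illustrates the budget-allocation intuition with a toy best-of-n solver: given the same total sample budget, routing more samples to inputs predicted to be hard can raise aggregate accuracy. The difficulty labels, success probabilities, and allocation rule are all assumptions.

```python
# Adaptive computation allocation, toy version: hard questions get more
# best-of-n decoding attempts than easy ones, at a fixed total budget.
import numpy as np

rng = np.random.default_rng(0)
P_SUCCESS = {"easy": 0.95, "hard": 0.25}     # per-attempt success probability

def best_of_n(kind, n):
    """Succeed if any of n independent decoding attempts succeeds."""
    return bool((rng.random(n) < P_SUCCESS[kind]).any())

questions = ["easy"] * 100 + ["hard"] * 100

# Uniform policy: every question gets 3 samples (600 total).
uniform = np.mean([best_of_n(q, 3) for q in questions])

# Adaptive policy: a difficulty predictor (assumed accurate here) routes the
# same 600-sample budget as 1 sample per easy question, 5 per hard question.
alloc = {"easy": 1, "hard": 5}
adaptive = np.mean([best_of_n(q, alloc[q]) for q in questions])

print(f"uniform accuracy:  {uniform:.2f}")
print(f"adaptive accuracy: {adaptive:.2f}")
```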
Many approaches to robot learning begin by inferring a reward function from a set of human demonstrations. To learn a good reward, it is necessary to determine which features of the environment are relevant before determining how these features should be used to compute reward. End-to-end methods for joint feature and reward learning (e.g., using deep networks or program synthesis techniques) often yield brittle reward functions that are sensitive to spurious state features. By contrast, humans can often generalizably learn from a small number of demonstrations...
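A minimal sketch of the staged alternative this abstract motivates: fix the feature map first, then fit only the weights that combine those features into a reward, here via a max-entropy (Boltzmann) demonstration likelihood. The trajectories and features are toy.

```python
# Staged reward learning: hand-chosen features, then a fitted linear reward.
import numpy as np

rng = np.random.default_rng(0)

def features(traj):
    """Task-relevant features chosen BEFORE reward fitting: path length and
    proximity to an obstacle at (0.5, 0.5)."""
    length = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    proximity = np.exp(-np.linalg.norm(traj - np.array([0.5, 0.5]), axis=1)).sum()
    return np.array([length, proximity])

true_w = np.array([-1.0, -2.0])          # humans dislike long, close paths

# Candidate trajectories; the "demonstration" is the best one under true_w.
trajs = [np.cumsum(rng.normal(0, 0.1, (20, 2)), axis=0) for _ in range(50)]
phis = np.array([features(t) for t in trajs])
demo = int(np.argmax(phis @ true_w))

# Gradient ascent on the Boltzmann likelihood p(demo) = softmax(phis @ w)[demo].
w = np.zeros(2)
for _ in range(500):
    p = np.exp(phis @ w - (phis @ w).max())
    p /= p.sum()
    w += 0.05 * (phis[demo] - p @ phis)  # grad of log-likelihood
print("recovered weight direction:", (w / np.linalg.norm(w)).round(2))
```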
We introduce Constrained Human-AI Cooperation (CHAIC), an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an agent equipped with egocentric observations to assist a human who may be operating under physical constraints, e.g., unable to reach high places or confined to a wheelchair, in performing common household or outdoor tasks as efficiently as possible. To achieve this, a successful helper agent must: (1) infer the human's intents by...
Policies often fail due to distribution shift, i.e., changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task should be performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key...
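A hedged sketch of the mechanism: once a user flags a concept as task-irrelevant (here, two "background" columns of a toy observation), training data is augmented by re-randomizing that concept, making the learned model invariant to it under shift. The dataset and the flagged concept are invented for illustration.

```python
# Personalized augmentation: randomize user-flagged irrelevant features so the
# model stops relying on them, then evaluate under a background shift.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
task_signal = rng.normal(size=(n, 3))                     # features that matter
y = (task_signal[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
background = rng.normal(size=(n, 2)) + 2.0 * y[:, None]   # spuriously tracks label
X = np.hstack([task_signal, background])

def augment_out(X, cols, k=4):
    """Replicate each row k times with the user-flagged columns re-randomized."""
    reps = np.repeat(X, k, axis=0)
    reps[:, cols] = rng.normal(size=(len(reps), len(cols)))
    return reps

plain = LogisticRegression().fit(X, y)
robust = LogisticRegression().fit(augment_out(X, [3, 4]), np.repeat(y, 4))

# Deployment: the background concept shifts and no longer tracks the label.
X_test = X.copy()
X_test[:, 3:] = rng.normal(size=(n, 2))
print("plain model under shift:    ", plain.score(X_test, y))
print("augmented model under shift:", robust.score(X_test, y))
```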
Fair decision-making in criminal justice relies on the recognition and incorporation of infinite shades of grey. In this paper, we detail how algorithmic risk assessment tools are counteractive to fair legal proceedings in social institutions where the desired states of the world are contested both ethically and practically. We provide a normative framework for assessing judicial decision-making, one that does not seek the elimination of human bias from decision-making, as fairness efforts currently focus on, but instead centers on sophisticating...
AI's rapid growth has been felt acutely by scholarly venues, leading to growing pains within the peer review process. These challenges largely center on the inability of specific subareas to identify and evaluate work that is appropriate according to criteria relevant to each subcommunity, as determined by the stakeholders of that subarea. We set forth a proposal that re-focuses efforts within these subcommunities through a decentralization of reviewing and publication. Through this re-centering effort, we hope to encourage each subarea to confront...
As robots are increasingly deployed in real-world scenarios, a key question is how to best transfer knowledge learned in one environment to another, where shifting constraints and human preferences render adaptation challenging. A central challenge remains that, often, it is difficult (perhaps even impossible) to capture the full complexity of the deployment environment, and therefore of the desired tasks, at training time. Consequently, the representation, or abstraction, of the tasks the human hopes for the robot to perform may be misaligned...
Neural networks often learn task-specific latent representations that fail to generalize to novel settings or tasks. Conversely, humans learn discrete representations (i.e., concepts or words) at a variety of abstraction levels (e.g., "bird" vs. "sparrow") and deploy the appropriate abstraction based on the task. Inspired by this, we train neural models to generate a spectrum of discrete representations, and control the complexity of the representations (roughly, how many bits are allocated for encoding inputs) by tuning the entropy of the distribution over representations. In finetuning...
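The sketch below illustrates the entropy knob with soft assignments of toy data to a discrete codebook: lowering the assignment temperature lowers the entropy of the code distribution, i.e. fewer bits are spent encoding inputs. The codebook and data are stand-ins for the trained models in the paper.

```python
# Entropy-controlled discrete encoding, toy version: soft assignments over K
# codebook vectors; temperature tunes how many bits the code distribution uses.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + rng.choice([-3, 3], size=(200, 1))  # two clusters
codebook = rng.normal(size=(8, 2)) * 3                              # K = 8 codes

def assign(X, codebook, temperature):
    """Softmax over codes; low temperature -> low-entropy, coarse encoding."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

for temp in (0.1, 10.0):
    marginal = assign(X, codebook, temp).mean(axis=0)
    bits = -(marginal * np.log2(marginal + 1e-12)).sum()
    print(f"temperature {temp}: ~{bits:.2f} bits allocated across codes")
```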