Dylan Hadfield-Menell

ORCID: 0000-0002-6168-4763
Research Areas
  • Reinforcement Learning in Robotics
  • Adversarial Robustness in Machine Learning
  • Auction Theory and Applications
  • Robot Manipulation and Learning
  • Ethics and Social Impacts of AI
  • Advanced Bandit Algorithms Research
  • Explainable Artificial Intelligence (XAI)
  • Experimental Behavioral Economics Studies
  • AI-based Problem Solving and Planning
  • Machine Learning and Algorithms
  • Topic Modeling
  • Game Theory and Applications
  • Anomaly Detection Techniques and Applications
  • Decision-Making and Behavioral Economics
  • Robotic Path Planning Algorithms
  • Law, Economics, and Judicial Systems
  • Computability, Logic, AI Algorithms
  • Evolutionary Game Theory and Cooperation
  • Complex Systems and Decision Making
  • Domain Adaptation and Few-Shot Learning
  • Data Stream Mining Techniques
  • Multi-Agent Systems and Negotiation
  • Artificial Intelligence in Healthcare and Education
  • Recommender Systems and Techniques
  • Blood donation and transfusion practices

Massachusetts Institute of Technology
2013-2024

Vassar College
2024

University of California, Berkeley
2014-2022

IIT@MIT
2022

Moscow Institute of Thermal Technology
2021

Berkeley College
2019

University of New Mexico
2015

Mind Research Network
2015

Intel (United States)
2013

For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human's reward function, but the robot does not initially know what this is. In contrast...

10.48550/arxiv.1606.03137 preprint EN other-oa arXiv (Cornell University) 2016-01-01
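The game structure this abstract describes can be made concrete in a few lines. Below is a minimal sketch, not the paper's formulation: a one-shot instance with two candidate reward vectors, a Boltzmann-rational human, and a robot that updates a posterior from one observed human action and then maximizes expected human reward under that belief. The three-action space and all numbers are invented for illustration.

    import numpy as np

    # Hypothetical one-shot CIRL instance: the human knows which of two reward
    # vectors is true; the robot starts with a uniform prior, observes one
    # (noisily rational) human action, and then best-responds to its posterior.
    THETAS = np.array([[1.0, 0.0, 0.2],    # candidate rewards over 3 actions
                       [0.0, 1.0, 0.2]])
    PRIOR = np.array([0.5, 0.5])
    BETA = 3.0  # human rationality; both agents share the human's reward

    def human_policy(theta):
        """Boltzmann-rational human: P(a) proportional to exp(beta * R_theta(a))."""
        logits = BETA * theta
        p = np.exp(logits - logits.max())
        return p / p.sum()

    def robot_posterior(a_human):
        likelihood = np.array([human_policy(t)[a_human] for t in THETAS])
        post = PRIOR * likelihood
        return post / post.sum()

    rng = np.random.default_rng(0)
    true_theta = 0
    a_h = rng.choice(3, p=human_policy(THETAS[true_theta]))
    post = robot_posterior(a_h)
    # The robot maximizes *expected human reward* under its belief.
    a_r = np.argmax(post @ THETAS)
    print(f"human action={a_h}, posterior={post.round(2)}, robot action={a_r}")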

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement it in practice; and (3) propose auditing and disclosure...

10.48550/arxiv.2307.15217 preprint EN cc-by arXiv (Cornell University) 2023-01-01
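As a toy illustration of the reward-modeling step at the core of RLHF, the sketch below fits a linear reward function to synthetic preference pairs using the Bradley-Terry objective P(a preferred over b) = sigmoid(r(a) - r(b)). The features, data, and learning rate are all invented; real reward models are finetuned LLM heads, not linear probes.

    import numpy as np

    # Minimal reward-modeling sketch: fit a scalar reward r_w(x) = w @ x so
    # that preferred responses score higher than rejected ones.
    rng = np.random.default_rng(0)
    X_pref = rng.normal(size=(256, 8))        # features of preferred responses
    X_rej = rng.normal(size=(256, 8)) - 0.5   # features of rejected responses
    w = np.zeros(8)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(200):  # gradient ascent on the preference log-likelihood
        margin = X_pref @ w - X_rej @ w
        grad = ((1 - sigmoid(margin))[:, None] * (X_pref - X_rej)).mean(axis=0)
        w += 0.5 * grad

    acc = (X_pref @ w > X_rej @ w).mean()
    print(f"reward model ranks the preferred response first {acc:.0%} of the time")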

The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus...

10.1109/satml54575.2023.00039 article EN 2023-02-01
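Since the abstract centers on "inner" interpretability, here is a minimal sketch of its basic primitive: reading intermediate activations out of a network with a forward hook. The toy MLP stands in for a real DNN; the survey covers far richer techniques built on this kind of access.

    import torch
    import torch.nn as nn

    # Record hidden activations with a forward hook; the cached states can
    # then be probed, clustered, or ablated.
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    cache = {}

    def save_activation(name):
        def hook(module, inputs, output):
            cache[name] = output.detach()
        return hook

    model[1].register_forward_hook(save_activation("relu_out"))

    x = torch.randn(4, 16)
    logits = model(x)
    print(cache["relu_out"].shape)  # torch.Size([4, 32])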

Recommender systems are the algorithms which select, filter, and personalize content across many of the world's largest platforms and apps. As such, their positive and negative effects on individuals and societies have been extensively theorized and studied. Our overarching question is how to ensure that recommender systems enact the values of those they serve. Addressing this question in a principled fashion requires technical knowledge of recommender design and operation, and also critically depends on insights from diverse fields including social science, ethics,...

10.1145/3632297 article EN ACM Transactions on Recommender Systems 2023-11-13

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning....

10.1145/3630106.3659037 article EN cc-by 2024 ACM Conference on Fairness, Accountability, and Transparency 2024-06-03
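One concrete way to see why white-box access strengthens an audit: gradients enable first-order attacks such as FGSM, which pure query access does not directly support. The sketch below is illustrative only, with a toy linear model standing in for an audited system.

    import torch
    import torch.nn as nn

    # With gradient access, an auditor can take a single FGSM step to probe
    # worst-case behavior; a black-box auditor could only guess-and-query.
    model = nn.Sequential(nn.Linear(10, 2))
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(1, 10, requires_grad=True)
    y = torch.tensor([0])

    loss = loss_fn(model(x), y)
    loss.backward()

    eps = 0.1
    x_adv = x + eps * x.grad.sign()  # one FGSM step, enabled by white-box access
    print(model(x).argmax().item(), model(x_adv).argmax().item())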

Tasks in mobile manipulation planning often require thousands of individual motions to complete. Such tasks require reasoning about complex goals as well as the feasibility of movements in configuration space. In discrete representations, planning complexity is exponential in the length of the plan. In manipulation, the parameters for an action draw from a continuous space, so we must also cope with an infinite branching factor. Task and motion planning (TAMP) methods integrate logical search over high-level actions with geometric reasoning to address this challenge....

10.1109/icra.2016.7487165 article EN 2016-05-01
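The interplay the abstract describes, logical search over high-level actions plus geometric feasibility checks over continuous parameters, can be caricatured in a few lines. Everything below (action names, the sampler, the feasibility test) is hypothetical; real TAMP systems use actual motion planners and collision checkers.

    import random

    SKELETONS = [["pick(A)", "place(A)"], ["push(A)", "pick(A)", "place(A)"]]

    def sample_params(action):
        return random.uniform(0, 1)  # stand-in for a grasp/placement pose

    def feasible(action, theta):
        return theta > 0.3  # stand-in for a collision / kinematics check

    def refine(skeleton, tries=20):
        """Backtracking refinement: bind continuous parameters to each step."""
        for _ in range(tries):
            bindings = [(a, sample_params(a)) for a in skeleton]
            if all(feasible(a, th) for a, th in bindings):
                return bindings
        return None

    for skel in SKELETONS:  # logical search over plan skeletons
        plan = refine(skel)
        if plan:
            print("feasible plan:", plan)
            break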

How do societies learn and maintain social norms? Here we use multiagent reinforcement learning to investigate the learning dynamics of enforcement and compliance behaviors. Artificial agents populate a foraging environment and need to learn to avoid a poisonous berry. Agents learn to avoid eating poisonous berries better when doing so is taboo, meaning the behavior is punished by other agents. The taboo helps overcome a credit assignment problem in discovering delayed health effects. Critically, introducing an additional taboo, which results in punishment for...

10.1073/pnas.2106028118 article EN cc-by-nc-nd Proceedings of the National Academy of Sciences 2022-01-12

Adversarial examples are a pervasive phenomenon of machine learning models where seemingly imperceptible perturbations to the input lead to misclassifications for otherwise statistically accurate models. We propose a geometric framework, drawing on tools from the manifold reconstruction literature, to analyze the high-dimensional geometry of adversarial examples. In particular, we highlight the importance of codimension: for low-dimensional data manifolds embedded in high-dimensional space there are many directions off the manifold in which to construct...

10.48550/arxiv.1811.00525 preprint EN other-oa arXiv (Cornell University) 2018-01-01
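The codimension point can be demonstrated numerically. In the sketch below the "data manifold" is a 2-dimensional subspace of R^100, so there are 98 independent directions in which a tiny perturbation immediately leaves the manifold; this is a deliberately simplified stand-in for the paper's manifold-reconstruction framework.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 2
    basis = np.linalg.qr(rng.normal(size=(n, k)))[0]   # on-manifold directions
    x = basis @ rng.normal(size=k)                     # a point on the manifold

    step = rng.normal(size=n)
    step -= basis @ (basis.T @ step)                   # project out the manifold
    step *= 0.01 / np.linalg.norm(step)                # imperceptibly small

    x_adv = x + step
    off_manifold_dist = np.linalg.norm(x_adv - basis @ (basis.T @ x_adv))
    print(f"codimension = {n - k}, off-manifold distance = {off_manifold_dist:.4f}")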

It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching them off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct, but because a rational agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow...

10.24963/ijcai.2017/32 article EN 2017-07-28
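The central incentive argument can be reproduced with a short Monte Carlo computation. Under an assumed Gaussian belief over the action's utility u and a rational human who blocks the action exactly when u < 0, deferring weakly dominates both acting and shutting down; the distribution and payoffs below are illustrative, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    u_samples = rng.normal(loc=0.2, scale=1.0, size=100_000)  # robot's belief over u

    ev_act = u_samples.mean()                  # act now, bypassing the human
    ev_off = 0.0                               # switch self off
    # Defer: a rational human lets the action through only when u > 0.
    ev_defer = np.maximum(u_samples, 0.0).mean()

    print(f"E[act]={ev_act:.3f}  E[off]={ev_off:.3f}  E[defer]={ev_defer:.3f}")
    # E[defer] >= max(E[act], E[off]): the incentive to preserve the off
    # switch comes purely from the robot's uncertainty about u.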

All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the...

10.48550/arxiv.2501.08618 preprint EN arXiv (Cornell University) 2025-01-15

Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document information about currently deployed agentic systems. For each system that meets the criteria for inclusion in the index, we document the system's components (e.g., base model, reasoning...

10.48550/arxiv.2502.01635 preprint EN arXiv (Cornell University) 2025-02-03
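To make the idea of a structured entry concrete, here is a hypothetical record schema in the spirit of what the index documents. The field names are illustrative guesses, not the AI Agent Index's actual schema.

    from dataclasses import dataclass, field

    @dataclass
    class AgentIndexEntry:
        name: str
        developer: str
        base_model: str            # underlying foundation model
        reasoning_scaffold: str    # e.g., planner or tool-use loop
        intended_uses: list[str] = field(default_factory=list)
        safety_features: list[str] = field(default_factory=list)

    entry = AgentIndexEntry(
        name="ExampleAgent",                    # hypothetical system
        developer="Example Labs",
        base_model="example-llm-v1",
        reasoning_scaffold="ReAct-style tool loop",
        intended_uses=["code assistance"],
        safety_features=["sandboxed execution", "human approval for actions"],
    )
    print(entry)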

Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks. Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system. However, a fundamental limitation of this approach is that the harmfulness identified during any particular evaluation can only lower bound the model's worst-possible-case behavior. As a complementary method for eliciting harmful behaviors, we propose evaluating LLMs...

10.48550/arxiv.2502.05209 preprint EN arXiv (Cornell University) 2025-02-03

Nations across the world are working to govern AI. However, from a technical perspective, there is uncertainty and disagreement on the best way to do this. Meanwhile, recent debates over AI regulation have led to calls for "evidence-based policy" which emphasize holding regulatory action to a high evidentiary standard. Evidence is of irreplaceable value to policymaking. However, too high an evidentiary standard can lead to systematic neglect of certain risks. In historical policy debates (e.g., over tobacco ca. 1965 and fossil fuels ca. 1985) this rhetoric also...

10.48550/arxiv.2502.09618 preprint EN arXiv (Cornell University) 2025-02-13

Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn how they act directly. But it can also model the human as an agent, and rely on a "theory of mind" to guide or bias the learning (grey box). We contribute a characterization of the performance of these methods under the optimistic case of having an ideal theory of mind, as well as different scenarios in which the assumptions...

10.1109/hri.2019.8673256 article EN 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI) 2019-03-01
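The grey-box option can be sketched concretely: assume a noisily rational human, with P(a | theta) proportional to exp(beta * R_theta(a)), and infer the goal theta from observed actions. Everything below (the two candidate goals, beta, the data) is synthetic; a black-box approach would instead fit the action distribution directly.

    import numpy as np

    THETAS = np.array([[1.0, 0.0], [0.0, 1.0]])  # two candidate human goals
    BETA = 2.0

    def policy(theta):
        """Theory-of-mind model: Boltzmann-rational action distribution."""
        logits = BETA * theta
        p = np.exp(logits - logits.max())
        return p / p.sum()

    rng = np.random.default_rng(0)
    true_theta = THETAS[0]
    actions = rng.choice(2, size=50, p=policy(true_theta))

    # Grey-box inference: posterior over goals given the assumed human model.
    log_post = np.array([np.log(policy(t))[actions].sum() for t in THETAS])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    print(f"P(goal 0 | data) = {post[0]:.3f}")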

The execution of long-horizon tasks under uncertainty is a fundamental challenge in robotics. Recent approaches have made headway on these tasks with an integration of task and motion planning. In this paper, we present Interfaced Belief Space Planning (IBSP): a modular approach to task and motion planning in belief space. We use a task-independent interface layer to combine an off-the-shelf classical planner with motion planning and inference. We determinize the problem under the maximum likelihood observation assumption to obtain a deterministic representation where...

10.1109/iros.2015.7354079 article EN 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2015-09-01

Our goal is to enable robots to time their motion in a way that is purposefully expressive of their internal states, making them more transparent to people. We start by investigating what types of states motion timing is capable of expressing, focusing on robot manipulation and keeping the path constant while systematically varying the timing. We find that users naturally pick up on certain properties of the robot (like confidence), of the motion (like naturalness), or of the task (like the weight of the object being carried). We then conduct a hypothesis-driven experiment to tease out the directions...

10.1145/2909824.3020221 preprint EN 2017-03-01

We suggest that the analysis of incomplete contracting developed by law and economics researchers can provide a useful framework for understanding the AI alignment problem and help to generate a systematic approach to finding solutions. We first provide an overview of the incomplete contracting literature and explore parallels between this work and AI alignment. As we emphasize, misalignment between principal and agent is a core focus of economic analysis. We highlight some technical results from the economics literature on incomplete contracts that may provide insights for alignment researchers. Our core contribution, however, is to bring to bear...

10.1145/3306618.3314250 article EN 2019-01-27

As governments and industry turn to increased use of automated decision systems, it becomes essential to consider how closely such systems can reproduce human judgment. We identify a core potential failure, finding that annotators label objects differently depending on whether they are being asked a factual question or a normative question. This challenges a natural assumption maintained in many standard machine-learning (ML) data acquisition procedures: that there is no difference between predicting the...

10.1126/sciadv.abq0701 article EN cc-by-nc Science Advances 2023-05-10

Recent work [1], [2] has shown promising results in enabling robotic manipulation of deformable objects through learning from demonstrations. Their method computes a registration from the training scene to the test scene, and then applies an extrapolation of this registration to the gripper motion to obtain the gripper motion for the test scene. The warping cost of scene-to-scene registrations is used to determine the nearest neighbor in a set of training demonstrations. Then, once the demonstration has been generalized to the new situation, they apply trajectory optimization [3] to plan robot motions that will track the predicted...

10.1109/iros.2014.6943185 article EN 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems 2014-09-01
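The transfer recipe, registering the training scene to the test scene and applying the same transform to the demonstrated gripper motion, can be sketched with a rigid (Procrustes) registration standing in for the non-rigid registration the cited work uses. All data below is synthetic.

    import numpy as np

    def fit_rigid(src, dst):
        """Least-squares rotation+translation mapping src points onto dst (Kabsch)."""
        mu_s, mu_d = src.mean(0), dst.mean(0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
        R = (U @ Vt).T
        if np.linalg.det(R) < 0:  # guard against reflections
            Vt[-1] *= -1
            R = (U @ Vt).T
        return R, mu_d - R @ mu_s

    rng = np.random.default_rng(0)
    train_scene = rng.normal(size=(30, 2))
    theta = 0.4
    R_true = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])
    test_scene = train_scene @ R_true.T + np.array([1.0, -0.5])

    R, t = fit_rigid(train_scene, test_scene)
    demo_traj = rng.normal(size=(10, 2))      # demonstrated gripper waypoints
    new_traj = demo_traj @ R.T + t            # transferred to the test scene
    print(np.allclose(R, R_true, atol=1e-6))  # True: registration recovered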

Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes optimization of the correctly specified reward function, then correction is futile. For example, a robotic factory assistant could break expensive equipment due to a reward misspecification; even if the designers immediately correct the reward function, the damage is done. To mitigate this risk, we introduce an approach that balances optimization of the primary reward function with...

10.1145/3375627.3375851 preprint EN 2020-02-05
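The balancing idea can be sketched in the spirit of attainable utility preservation: optimize the possibly misspecified primary reward minus a penalty on changes in how well auxiliary goals could still be achieved. The numbers and the single auxiliary goal below are placeholders, not the paper's experiments.

    LAMBDA = 0.5  # regularization strength on side effects

    def aup_reward(primary_reward, q_aux_before, q_aux_after):
        """Penalize shifts in attainable utility for auxiliary reward functions."""
        penalty = sum(abs(b - a) for b, a in zip(q_aux_before, q_aux_after))
        return primary_reward - LAMBDA * penalty

    # An action earning primary reward 1.0 that destroys the ability to pursue
    # auxiliary goals (attainable value drops from 2.0 to 0.0) scores lower
    # than a conservative action that leaves them intact.
    print(aup_reward(1.0, [2.0], [0.0]))  # 0.0
    print(aup_reward(0.8, [2.0], [2.0]))  # 0.8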

Learning the preferences implicit in the choices humans make is a well-studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, where a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through...

10.1109/hri.2019.8673234 article EN 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI) 2019-03-01
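The core difficulty, that a learning human's choices are not noisily optimal for fixed preferences, is easy to simulate. The sketch below only models the human side (an epsilon-greedy learner whose early choices reflect exploration, not preference); the robot-side assistance policies the paper compares are not reproduced here.

    import numpy as np

    rng = np.random.default_rng(0)
    true_means = np.array([0.3, 0.7])
    counts = np.zeros(2)
    estimates = np.zeros(2)

    for t in range(500):
        eps = 1.0 / (t + 1)  # the human explores less as they learn
        arm = rng.integers(2) if rng.random() < eps else int(np.argmax(estimates))
        reward = rng.normal(true_means[arm], 0.5)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    print(f"human's learned estimates: {estimates.round(2)} (true: {true_means})")
    # Early choices carry information about the human's learning process, not
    # just their preferences; this is the inference problem the robot faces.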