- Reinforcement Learning in Robotics
- Adversarial Robustness in Machine Learning
- Auction Theory and Applications
- Robot Manipulation and Learning
- Ethics and Social Impacts of AI
- Advanced Bandit Algorithms Research
- Explainable Artificial Intelligence (XAI)
- Experimental Behavioral Economics Studies
- AI-based Problem Solving and Planning
- Machine Learning and Algorithms
- Topic Modeling
- Game Theory and Applications
- Anomaly Detection Techniques and Applications
- Decision-Making and Behavioral Economics
- Robotic Path Planning Algorithms
- Law, Economics, and Judicial Systems
- Computability, Logic, AI Algorithms
- Evolutionary Game Theory and Cooperation
- Complex Systems and Decision Making
- Domain Adaptation and Few-Shot Learning
- Data Stream Mining Techniques
- Multi-Agent Systems and Negotiation
- Artificial Intelligence in Healthcare and Education
- Recommender Systems and Techniques
- Blood donation and transfusion practices
- Massachusetts Institute of Technology (2013-2024)
- Vassar College (2024)
- University of California, Berkeley (2014-2022)
- IIT@MIT (2022)
- Moscow Institute of Thermal Technology (2021)
- Berkeley College (2019)
- University of New Mexico (2015)
- Mind Research Network (2015)
- Intel (United States) (2013)
For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human's reward function, but the robot does not initially know what this is. In contrast...
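The cooperative, partial-information structure described above can be sketched in miniature: a robot that does not know the human's reward function maintains a belief over reward hypotheses, updates it from an observed human action, and then acts to maximize expected human reward. The two-hypothesis setup and the Boltzmann-rational human model are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Two reward hypotheses over two outcomes; the robot starts uncertain
# about which one describes the human.
REWARD_HYPOTHESES = np.array([[1.0, 0.0],   # theta_0: human prefers outcome 0
                              [0.0, 1.0]])  # theta_1: human prefers outcome 1

def human_action(theta_idx, beta=5.0):
    """Boltzmann-rational human: picks an outcome with prob ∝ exp(beta * reward)."""
    r = REWARD_HYPOTHESES[theta_idx]
    p = np.exp(beta * r) / np.exp(beta * r).sum()
    return np.random.choice(2, p=p)

def robot_posterior(prior, observed_action, beta=5.0):
    """Bayesian update over reward hypotheses given the human's action."""
    likelihood = np.array([
        np.exp(beta * REWARD_HYPOTHESES[k][observed_action])
        / np.exp(beta * REWARD_HYPOTHESES[k]).sum()
        for k in range(2)
    ])
    post = prior * likelihood
    return post / post.sum()

def robot_action(posterior):
    """Robot maximizes expected human reward under its current belief."""
    expected = posterior @ REWARD_HYPOTHESES
    return int(np.argmax(expected))

np.random.seed(0)
prior = np.array([0.5, 0.5])
a_h = human_action(theta_idx=1)      # true preference is outcome 1
post = robot_posterior(prior, a_h)
print(robot_action(post))
```

The key CIRL feature this preserves is that both agents share the human's reward, but only the human knows it; the robot's behavior is driven entirely by its posterior.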
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure...
The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus...
Recommender systems are the algorithms which select, filter, and personalize content across many of the world's largest platforms and apps. As such, their positive and negative effects on individuals and on societies have been extensively theorized and studied. Our overarching question is how to ensure that recommender systems enact the values of the individuals and societies that they serve. Addressing this question in a principled fashion requires technical knowledge of recommender design and operation, but also critically depends on insights from diverse fields including social science, ethics,...
External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning....
Tasks in mobile manipulation planning often require thousands of individual motions to complete. Such tasks require reasoning about complex goals as well as the feasibility of movements in configuration space. In discrete representations, the complexity of planning is exponential in the length of the plan. In manipulation, the parameters for an action draw from a continuous space, so we must also cope with an infinite branching factor. Task and motion planning (TAMP) methods integrate logical search over high-level actions with geometric reasoning to address this challenge....
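The integration described above can be sketched as a loop in which a discrete search proposes high-level action sequences and each symbolic action must then be bound to a continuous parameter (here, a grasp angle) that passes a geometric feasibility test. The action names, the feasibility model, and the sampling-based refinement are illustrative placeholders, not any specific TAMP algorithm.

```python
import random

random.seed(0)

def feasible(action, grasp_angle):
    """Stand-in for a collision/kinematics check on a continuous parameter."""
    return abs(grasp_angle) < 0.5 if action == "pick" else True

def refine(plan, samples=50):
    """Try to bind each symbolic action to a feasible continuous parameter."""
    bindings = []
    for action in plan:
        for _ in range(samples):
            theta = random.uniform(-3.14, 3.14)
            if feasible(action, theta):
                bindings.append((action, theta))
                break
        else:
            return None            # refinement failed: prune this plan
    return bindings

# Candidate plans from the (here, trivial) logical search, shortest first.
candidate_plans = [["pick", "place"], ["push", "pick", "place"]]
for plan in candidate_plans:
    result = refine(plan)
    if result is not None:
        print([a for a, _ in result])
        break
```

The infinite branching factor shows up in `refine`: the continuous parameter cannot be enumerated, only sampled, and a plan that is logically valid can still die at the geometric level.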
How do societies learn and maintain social norms? Here we use multiagent reinforcement learning to investigate the learning dynamics of enforcement and compliance behaviors. Artificial agents populate a foraging environment and need to learn to avoid a poisonous berry. Agents learn to avoid eating poisonous berries better when doing so is taboo, meaning the behavior is punished by other agents. The taboo helps overcome a credit assignment problem in discovering the berries' delayed health effects. Critically, introducing an additional taboo, which results in punishment for...
Adversarial examples are a pervasive phenomenon of machine learning models where seemingly imperceptible perturbations to the input lead to misclassifications for otherwise statistically accurate models. We propose a geometric framework, drawing on tools from the manifold reconstruction literature, to analyze the high-dimensional geometry of adversarial examples. In particular, we highlight the importance of codimension: for low-dimensional data manifolds embedded in high-dimensional space, there are many directions off the manifold in which to construct...
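The codimension point can be illustrated numerically: data on a k-dimensional manifold in R^n leaves n - k off-manifold directions, and a classifier with small spurious weights in those directions (as trained models typically have) can be flipped by a per-coordinate step far smaller than the on-manifold change needed to cross the true decision boundary. All numbers are synthetic; this is a sketch of the geometric intuition, not the paper's construction.

```python
import numpy as np

n, k = 50, 1
codim = n - k                                  # 49 off-manifold directions
rng = np.random.default_rng(0)

w = np.zeros(n)
w[0] = 1.0                                     # true signal direction
w[1:] = 0.05 * rng.standard_normal(codim)      # small spurious weights

x = np.zeros(n)
x[0] = 0.5                                     # confidently classified point
margin = w @ x                                 # flipping on-manifold needs a 0.5 step

# Per-coordinate step just large enough to flip the sign, taken only in the
# off-manifold coordinates; it shrinks as the codimension grows.
eps = 1.1 * margin / np.abs(w[1:]).sum()
delta = np.zeros(n)
delta[1:] = -eps * np.sign(w[1:])

print(round(eps, 3), np.sign(w @ (x + delta)))
```

By construction `w @ (x + delta) = -0.1 * margin`, so the prediction flips, and the required per-coordinate budget `eps` falls roughly like 1/codim: more off-manifold directions make small adversarial perturbations easier to find.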
It is clear that one of the primary tools we can use to mitigate the potential risk from a misbehaving AI system is the ability to turn the system off. As the capabilities of AI systems improve, it is important to ensure that such systems do not adopt subgoals that prevent a human from switching them off. This is a challenge because many formulations of rational agents create strong incentives for self-preservation. This is not caused by a built-in instinct; rather, an agent will maximize expected utility and cannot achieve whatever objective it has been given if it is dead. Our goal is to study the incentives an agent has to allow...
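A small Monte Carlo sketch shows why uncertainty matters here: a robot that can act now, switch itself off, or wait for a human (who blocks the action exactly when its utility U is negative) weakly prefers waiting whenever it is uncertain about U. The Gaussian belief over U is an illustrative assumption, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
U_samples = rng.normal(loc=0.3, scale=1.0, size=100_000)  # robot's belief over U

ev_act = U_samples.mean()                      # act now: get U regardless
ev_off = 0.0                                   # switch off: get 0
ev_wait = np.maximum(U_samples, 0.0).mean()    # rational human blocks when U < 0

print(f"act={ev_act:.3f} off={ev_off:.3f} wait={ev_wait:.3f}")
```

Since `max(U, 0) >= U` and `max(U, 0) >= 0` pointwise, waiting weakly dominates both alternatives under any belief; the gap (the value of deferring) grows with the robot's uncertainty and vanishes only when the robot is certain about U.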
All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the...
Leading AI developers and startups are increasingly deploying agentic systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document information about currently deployed agentic systems. For each system that meets the criteria for inclusion in the index, we document the system's components (e.g., base model, reasoning...
Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks. Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system. However, a fundamental limitation of this approach is that the harmfulness of the behaviors identified during any particular evaluation can only lower bound the model's worst-possible-case behavior. As a complementary method for eliciting harmful behaviors, we propose evaluating LLMs...
Nations across the world are working to govern AI. However, from a technical perspective, there is uncertainty and disagreement on the best way to do this. Meanwhile, recent debates over AI regulation have led to calls for "evidence-based policy," which emphasize holding regulatory action to a high evidentiary standard. Evidence is of irreplaceable value to policymaking. However, holding regulation to too high an evidentiary standard can lead to systematic neglect of certain risks. In historical policy debates (e.g., over tobacco ca. 1965 and fossil fuels ca. 1985), such rhetoric has also...
Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn how they act directly. But it can also model the human as an agent, and rely on a "theory of mind" to guide or bias the learning (grey box). We contribute a characterization of the performance of these methods under the optimistic case of having an ideal theory of mind, as well as different scenarios in which the assumptions...
The execution of long-horizon tasks under uncertainty is a fundamental challenge in robotics. Recent approaches have made headway on these problems through an integration of task and motion planning. In this paper, we present Interfaced Belief Space Planning (IBSP): a modular approach to task and motion planning in belief space. We use a task-independent interface layer to combine an off-the-shelf classical planner with motion planning and inference. We determinize the problem under the maximum likelihood observation assumption to obtain a deterministic representation where...
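The maximum likelihood observation determinization can be sketched in one dimension: each future observation is replaced by its most likely value under the current belief, which turns the stochastic belief-update into a deterministic one that a classical planner can reason over. The Gaussian belief and Kalman-style update are illustrative assumptions, not IBSP's full machinery.

```python
def kalman_update(mean, var, obs, obs_var):
    """Standard 1-D Gaussian belief update given an observation."""
    k = var / (var + obs_var)
    return mean + k * (obs - mean), (1 - k) * var

mean, var = 2.0, 1.0          # belief over, e.g., an object's position
obs_var = 0.5                 # sensor noise

# MLO determinization: assume we will observe exactly the belief mean.
mlo_obs = mean
new_mean, new_var = kalman_update(mean, var, mlo_obs, obs_var)
print(new_mean, round(new_var, 3))
```

Under the MLO assumption the mean is unchanged and only the variance contracts, so "take an observation" becomes a deterministic action with a predictable information gain, which is exactly what lets an off-the-shelf classical planner operate on the determinized problem.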
Our goal is to enable robots to time their motion in a way that is purposefully expressive of their internal states, making them more transparent to people. We start by investigating what types of states motion timing is capable of expressing, focusing on robot manipulation and keeping the path constant while systematically varying timing. We find that users naturally pick up on certain properties of the robot (like confidence), of the motion (like naturalness), or of the task (like the weight of the object being carried). We then conduct a hypothesis-driven experiment to tease out the directions...
We suggest that the analysis of incomplete contracting developed by law and economics researchers can provide a useful framework for understanding the AI alignment problem and help to generate a systematic approach to finding solutions. We first provide an overview of the incomplete contracting literature and explore parallels between this work and AI alignment. As we emphasize, misalignment between principal and agent is a core focus of economic analysis. We highlight some technical results from the economics of incomplete contracts that may provide insights for alignment researchers. Our core contribution, however, is to bring to bear...
As governments and industry turn to increased use of automated decision systems, it becomes essential to consider how closely such systems can reproduce human judgment. We identify a core potential failure, finding that annotators label objects differently depending on whether they are being asked a factual question or a normative question. This challenges a natural assumption maintained in many standard machine-learning (ML) data acquisition procedures: that there is no difference between predicting the...
Recent work [1], [2] has shown promising results in enabling robotic manipulation of deformable objects through learning from demonstrations. Their method computes a registration from the training scene to the test scene, and then applies an extrapolation of this registration to the gripper motion in the training scene to obtain the gripper motion for the test scene. The warping cost of scene-to-scene registrations is used to determine the nearest neighbor from a set of training demonstrations. Then, once the demonstration has been generalized to the new situation, they apply trajectory optimization [3] to plan robot motions that will track the predicted...
Reward functions are easy to misspecify; although designers can make corrections after observing mistakes, an agent pursuing a misspecified reward function can irreversibly change the state of its environment. If that change precludes optimization of the correctly specified reward function, then correction is futile. For example, a robotic factory assistant could break expensive equipment due to a reward misspecification; even if the designers immediately correct the reward function, the damage is done. To mitigate this risk, we introduce an approach that balances optimization of the primary reward function with...
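One way to make this balance concrete is an attainable-utility-style penalty: the agent's effective reward at a state-action pair is the primary reward minus a scaled measure of how much the action changes its attainable value for auxiliary reward functions, relative to doing nothing. The Q-values below are toy numbers and the scenario is invented for illustration; only the penalty's general form follows the attainable utility preservation idea.

```python
def aup_reward(r_primary, q_aux_action, q_aux_noop, lam=0.1):
    """Primary reward minus a penalty on shifts in attainable auxiliary value."""
    penalty = sum(abs(qa - qn) for qa, qn in zip(q_aux_action, q_aux_noop))
    return r_primary - lam * penalty

# "Break the vase to reach the goal faster" vs. "go around it":
q_noop = [5.0, 3.0]                           # attainable auxiliary values if idle
print(aup_reward(1.0, [0.5, 0.2], q_noop))    # destructive shortcut
print(aup_reward(0.8, [4.9, 3.0], q_noop))    # conservative route
```

The destructive shortcut earns more primary reward but destroys most of the attainable auxiliary value, so after the penalty the conservative route scores higher; irreversible side effects are discouraged even though the primary reward never mentions them.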
Learning the preferences implicit in the choices humans make is a well-studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, where a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the robot does not know the reward function but can learn it through...
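The setting can be simulated in miniature: a human who is still learning plays a two-armed bandit, and the robot never sees rewards, only the human's pulls, from which it infers the preferred arm. The epsilon-greedy human model with decaying exploration, and the robot's "trust late pulls" heuristic, are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(2)
true_means = [0.2, 0.8]            # unknown to the robot; the human learns them

est, counts = [0.0, 0.0], [0, 0]   # the human's running value estimates
pulls = []
for t in range(500):
    eps = 1.0 / (t + 1) ** 0.5     # the human explores less over time
    arm = rng.integers(2) if rng.random() < eps else int(np.argmax(est))
    reward = rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    est[arm] += (reward - est[arm]) / counts[arm]   # incremental mean
    pulls.append(int(arm))

# Robot's inference from behavior alone: late pulls reflect learned preference,
# whereas early pulls are dominated by exploration and would mislead it.
robot_choice = int(np.bincount(pulls[-100:], minlength=2).argmax())
print(robot_choice)
```

The point of the setup is visible in the data: an inference rule that treated all pulls as noisily optimal would be confused by the exploratory early rounds, while conditioning on the human's learning process recovers the true preference.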