Dorsa Sadigh

ORCID: 0000-0002-7802-9183
Research Areas
  • Reinforcement Learning in Robotics
  • Robot Manipulation and Learning
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Formal Methods in Verification
  • Machine Learning and Algorithms
  • Social Robot Interaction and HRI
  • Natural Language Processing Techniques
  • Traffic control and management
  • Human Pose and Action Recognition
  • Autonomous Vehicle Technology and Safety
  • Transportation and Mobility Innovations
  • Domain Adaptation and Few-Shot Learning
  • Transportation Planning and Optimization
  • Adversarial Robustness in Machine Learning
  • Robotic Path Planning Algorithms
  • Optimization and Search Problems
  • Human-Automation Interaction and Safety
  • Explainable Artificial Intelligence (XAI)
  • Data Stream Mining Techniques
  • Speech and dialogue systems
  • Multi-Agent Systems and Negotiation
  • Machine Learning and Data Classification
  • Advanced Bandit Algorithms Research
  • AI-based Problem Solving and Planning

Stanford University
2018-2025

DeepMind (United Kingdom)
2024

Google (United States)
2023-2024

University of California, Berkeley
2011-2023

Laboratoire d'Informatique de Paris-Nord
2020-2023

Princeton University
2021

Corvallis Environmental Center
2020

American Institute of Aeronautics and Astronautics
2020

École Polytechnique
2020

Applied Mathematics (United States)
2020

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) to their technical principles (e.g., model architectures, training procedures, data, systems, ...

10.48550/arxiv.2108.07258 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Traditionally, autonomous cars make predictions about other drivers' future trajectories, and plan to stay out of their way. This tends to result in defensive and opaque behaviors. Our key insight is that an autonomous car's actions will actually affect what other drivers do in response, whether the car is aware of it or not. Our thesis is that we can leverage these responses to plan more efficient and communicative behaviors. We model the interaction between a human driver and an autonomous car as a dynamical system, in which the robot's actions have immediate consequences on the state of the car, but also...

10.15607/rss.2016.xii.029 article EN 2016-06-27

We present a counterexample-guided inductive synthesis approach to controller synthesis for cyber-physical systems subject to signal temporal logic (STL) specifications, operating in potentially adversarial nondeterministic environments. We encode STL specifications as mixed integer-linear constraints on the variables of a discrete-time model of the system and environment dynamics, and solve a series of optimization problems to yield a satisfying control sequence. We demonstrate how the scheme can be used in a receding horizon fashion...

10.1145/2728606.2728628 article EN 2015-04-14
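The MILP encoding mentioned above builds on the quantitative ("robustness") semantics of STL, which can be evaluated directly on a discrete-time signal. The sketch below only evaluates robustness, not the integer-linear encoding itself; the function names and toy signal are illustrative, not from the paper:

```python
# Minimal sketch of quantitative STL semantics over a discrete-time signal.
# A predicate mu(x) > 0 means "satisfied"; the sign of the robustness value
# indicates satisfaction, and its magnitude measures the margin.

def robustness_predicate(signal, t, mu):
    """Robustness of an atomic predicate mu at time t."""
    return mu(signal[t])

def robustness_always(signal, t, interval, mu):
    """Robustness of G_[a,b] (mu > 0): worst case over the window."""
    a, b = interval
    return min(mu(signal[k]) for k in range(t + a, min(t + b + 1, len(signal))))

def robustness_eventually(signal, t, interval, mu):
    """Robustness of F_[a,b] (mu > 0): best case over the window."""
    a, b = interval
    return max(mu(signal[k]) for k in range(t + a, min(t + b + 1, len(signal))))

# Example: the signal must stay above 0.5 during steps 0..4.
x = [1.0, 0.9, 0.8, 0.7, 0.6]
rho = robustness_always(x, 0, (0, 4), lambda v: v - 0.5)
# rho is positive (about 0.1), so the specification is satisfied.
```

In the MILP formulation, each min/max above becomes a set of integer-linear constraints over the trajectory variables, which is what lets a solver search for a satisfying control sequence.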

Our goal is to efficiently learn reward functions encoding a human's preferences for how a dynamical system should act. There are two challenges with this. First, in many problems it is difficult for people to provide demonstrations of the desired trajectory (like a high-DOF robot arm motion or an aggressive driving maneuver), or even to assign how much numerical reward an action should get. We build on work in label ranking and propose to learn from preferences (or comparisons) instead: the person provides a relative preference between two trajectories. Second,...

10.15607/rss.2017.xiii.053 article EN 2017-07-12
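The comparison-based setup described above is commonly modeled with a Bradley-Terry likelihood over trajectory features. The following sketch assumes a linear reward R(ξ) = w·φ(ξ) and synthetic, noiseless comparisons; the names, learning rate, and toy data are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

# Preference-based reward learning sketch: a comparison "A preferred to B"
# is modeled with P(A > B) = sigmoid(w . (phi(A) - phi(B))), and w is fit
# by maximizing the log-likelihood of the observed comparisons.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learn_weights(feature_diffs, lr=0.5, steps=2000):
    """Gradient ascent on the preference log-likelihood.

    feature_diffs: rows of phi(preferred) - phi(rejected)."""
    d = np.asarray(feature_diffs)
    w = np.zeros(d.shape[1])
    for _ in range(steps):
        p = sigmoid(d @ w)                    # prob. the preferred one wins
        w += lr * d.T @ (1.0 - p) / len(d)    # log-likelihood gradient
    return w

# Toy data: the true preference favors feature 0 and penalizes feature 1.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
phis = rng.normal(size=(200, 2, 2))  # 200 queries, two trajectories each
diffs = []
for a, b in phis:
    better, worse = (a, b) if true_w @ a > true_w @ b else (b, a)
    diffs.append(better - worse)
w = learn_weights(diffs)
# The learned w is expected to match the signs of true_w.
```

The paper's contribution is the *active* part, choosing which pair of trajectories to query next; the likelihood model above is only the passive learning core.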

The actions of an autonomous vehicle on the road affect and are affected by those of other drivers, whether overtaking, negotiating a merge, or avoiding an accident. This mutual dependence, best captured by dynamic game theory, creates a strong coupling between the vehicle's planning and its predictions of other drivers' behavior, and constitutes an open problem with direct implications on the safety and viability of autonomous driving technology. Unfortunately, dynamic games are too computationally demanding to meet real-time constraints in continuous state...

10.1109/icra.2019.8794007 article EN 2019 International Conference on Robotics and Automation (ICRA) 2019-05-01

Much of the estimation of human internal state (goal, intentions, activities, preferences, etc.) is passive: an algorithm observes human actions and updates its estimate of human state. In this work, we embrace the fact that robot actions affect what humans do, and leverage it to improve state estimation. We enable robots to do active information gathering, by planning actions that probe the user in order to clarify their internal state. For instance, an autonomous car will plan to nudge into a human driver's lane to test the human's driving style. Results from a simulation study suggest that active information gathering...

10.1109/iros.2016.7759036 article EN 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2016-10-01

Making AI more trustworthy with a formal methods-based approach to system verification and validation.

10.1145/3503914 article EN Communications of the ACM 2022-06-21

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure...

10.48550/arxiv.2307.15217 preprint EN cc-by arXiv (Cornell University) 2023-01-01

10.1109/cvpr52733.2024.01370 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Safe control of dynamical systems that satisfy temporal invariants expressing various safety properties is a challenging problem that has drawn the attention of many researchers. However, making the assumption that such invariants are deterministic is far from reality. For example, a robotic system might employ a camera sensor and a machine-learned classifier to identify obstacles. Consequently, the safety properties the controller must satisfy will be a function of the sensor data and the associated classifier. We propose a framework for achieving safe control. At the heart of our approach is a new...

10.15607/rss.2016.xii.017 article EN 2016-06-27

We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that incorporates a deterministic Rabin automaton generated from the desired LTL property. The reward function of the product MDP is defined from the acceptance condition of the automaton. This construction allows us to apply techniques from learning theory to the problem of synthesis for LTL specifications even when the transition probabilities are not known a priori. We prove that our method is guaranteed to find...

10.1109/cdc.2014.7039527 article EN 2014-12-01
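The product construction described above can be illustrated in miniature. The sketch below substitutes a two-state deterministic automaton for the simple reachability property "eventually goal" in place of a full Rabin automaton, and uses tabular Q-learning; the MDP, state labels, and hyperparameters are invented for illustration and are not from the paper:

```python
import random

# Product-MDP sketch: the agent learns a policy over (mdp_state,
# automaton_state) pairs, and reward is granted only when the automaton
# reaches its accepting state -- mirroring how the paper derives reward
# from the automaton's acceptance condition.

MDP_STATES = [0, 1, 2, 3]   # a line graph; state 3 carries the label "goal"
ACTIONS = [-1, +1]          # move left / right (clipped at the ends)

def mdp_step(s, a):
    return max(0, min(3, s + a))

def label_of(s):
    return "goal" if s == 3 else "other"

def automaton_step(q, label):
    # q0 --"goal"--> q1, with q1 accepting and absorbing.
    return 1 if (q == 1 or label == "goal") else 0

def q_learning(episodes=500, alpha=0.5, gamma=0.95, eps=0.2):
    Q = {(s, q, a): 0.0 for s in MDP_STATES for q in (0, 1) for a in ACTIONS}
    rng = random.Random(0)
    for _ in range(episodes):
        s, q = 0, 0
        for _ in range(20):
            a = rng.choice(ACTIONS) if rng.random() < eps else \
                max(ACTIONS, key=lambda a: Q[(s, q, a)])
            s2 = mdp_step(s, a)
            q2 = automaton_step(q, label_of(s2))
            r = 1.0 if (q == 0 and q2 == 1) else 0.0  # acceptance reward
            best = max(Q[(s2, q2, b)] for b in ACTIONS)
            Q[(s, q, a)] += alpha * (r + gamma * best - Q[(s, q, a)])
            s, q = s2, q2
    return Q

Q = q_learning()
# The learned policy at the start state should move right, toward the goal.
assert Q[(0, 0, +1)] > Q[(0, 0, -1)]
```

For general LTL the automaton has Rabin acceptance pairs rather than a single absorbing accepting state, which is what makes the full construction and its guarantees nontrivial.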

Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics however, IRL often struggles because it is difficult to get high-quality demonstrations; conversely, preference-based learning is very inefficient since it attempts to learn a continuous, high-dimensional function from binary feedback. We...

10.15607/rss.2019.xv.023 article EN 2019-06-22

Imitation learning algorithms can be used to learn a policy from expert demonstrations without access to a reward signal. However, most existing approaches are not applicable in multi-agent settings due to the existence of multiple (Nash) equilibria and non-stationary environments. We propose a new framework for multi-agent imitation learning in general Markov games, where we build upon a generalized notion of inverse reinforcement learning. We further introduce a practical multi-agent actor-critic algorithm with good empirical performance. Our...

10.48550/arxiv.1807.09936 preprint EN other-oa arXiv (Cornell University) 2018-01-01

In this paper, we present a planning framework that uses a combination of implicit (robot motion) and explicit (visual/audio/haptic feedback) communication during mobile robot navigation. First, we developed a model that approximates both continuous movements and discrete behavior modes in human navigation, considering the effects of robot actions on human decision making. The human is modeled as an optimal agent, with a reward function obtained through inverse reinforcement learning. Second, a planner uses this model to generate communicative actions that maximize...

10.1109/tro.2020.2964824 article EN IEEE Transactions on Robotics 2020-01-23

Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains...

10.1177/02783649211041652 article EN The International Journal of Robotics Research 2021-08-28

Despite the advances in the autonomous driving domain, autonomous vehicles (AVs) are still inefficient and limited in terms of cooperating with each other or coordinating with vehicles operated by humans. A group of AVs and human-driven vehicles (HVs) that work together to optimize an altruistic social utility can co-exist seamlessly and assure safety and efficiency on the road. Achieving this mission without explicit coordination among agents is challenging, mainly due to the difficulty of predicting the behavior of humans with heterogeneous preferences in mixed-autonomy...

10.1109/tits.2022.3207872 article EN IEEE Transactions on Intelligent Transportation Systems 2022-09-29

Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface...

10.48550/arxiv.2306.08647 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Large, high-capacity models trained on diverse datasets have shown remarkable successes in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots,...

10.48550/arxiv.2310.08864 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but they remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex...

10.48550/arxiv.2307.01928 preprint EN cc-by arXiv (Cornell University) 2023-01-01
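The conformal-prediction step behind KnowNo can be sketched as split conformal calibration over a planner's next-step options: calibrate a score threshold on held-out examples, then ask for help whenever more than one option clears it. The scoring scheme, option names, and epsilon below are assumptions for illustration, not the paper's exact procedure:

```python
import math

# Split conformal prediction sketch: given a calibration set of
# (option confidences, true option) pairs, compute a threshold qhat so
# that, with probability >= 1 - epsilon, the true option lands in the
# prediction set on a new example.

def conformal_threshold(calibration, epsilon):
    """calibration: list of (confidences_dict, true_label). Returns qhat."""
    # Nonconformity score: 1 - confidence assigned to the true label.
    scores = sorted(1.0 - conf[true] for conf, true in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - epsilon))  # conformal quantile index
    return scores[min(k, n) - 1]

def prediction_set(confidences, qhat):
    """Keep every option whose nonconformity 1 - conf is within qhat."""
    return {opt for opt, c in confidences.items() if 1.0 - c <= qhat}

# Toy calibration data (hypothetical planner confidences).
calibration = [
    ({"pick cup": 0.9, "pick bowl": 0.1}, "pick cup"),
    ({"pick cup": 0.6, "pick bowl": 0.4}, "pick cup"),
    ({"pick cup": 0.45, "pick bowl": 0.55}, "pick cup"),
    ({"pick cup": 0.7, "pick bowl": 0.3}, "pick cup"),
]
qhat = conformal_threshold(calibration, epsilon=0.25)

# At deployment, an ambiguous query yields a multi-option set.
opts = prediction_set({"pick cup": 0.55, "pick bowl": 0.45}, qhat)
ask_for_help = len(opts) > 1  # ambiguous -> trigger a clarification query
```

The statistical guarantee comes entirely from the calibration quantile: no assumption is made about how well-calibrated the planner's raw confidences are.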

People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate such expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich...

10.1145/3610977.3634999 preprint EN other-oa 2024-03-10