- Organ Donation and Transplantation
- Reinforcement Learning in Robotics
- Blood donation and transfusion practices
- Transportation and Mobility Innovations
- Explainable Artificial Intelligence (XAI)
- Privacy-Preserving Technologies in Data
- Ethics and Social Impacts of AI
- Software Reliability and Analysis Research
- Economic and Environmental Valuation
- Adversarial Robustness in Machine Learning
- Bayesian Modeling and Causal Inference
- Cryptography and Data Security
- Blockchain Technology Applications and Security
University of California, Berkeley
2020-2021
Berkeley College
2020
Duke University
2018
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; (3) propose auditing and disclosure...
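The reward-modeling step at the core of RLHF can be made concrete with a short sketch. The snippet below implements the Bradley-Terry preference loss commonly used to train reward models on human comparison data; the `reward_model` callable, feature vectors, and shapes are illustrative assumptions, not details from the paper:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """Bradley-Terry preference loss.

    Pushes the scalar reward of the human-preferred completion above
    that of the rejected one by maximizing log sigmoid(r_chosen - r_rejected).
    """
    r_chosen = reward_model(chosen)      # shape: (batch,)
    r_rejected = reward_model(rejected)  # shape: (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: a linear "reward model" over hand-made feature vectors.
reward_model = lambda x: x @ torch.tensor([0.5, -0.2, 1.0])
chosen = torch.tensor([[1.0, 0.0, 2.0]])    # preferred completion's features
rejected = torch.tensor([[0.0, 1.0, 0.5]])  # rejected completion's features
print(preference_loss(reward_model, chosen, rejected))
```

Minimizing this loss fits a scalar reward whose differences reproduce the human's pairwise preferences; the fitted model then supplies the training signal for policy finetuning.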
The efficient and fair allocation of limited resources is a classical problem in economics and computer science. In kidney exchanges, a central market maker allocates living donors to patients who need an organ. Patients and donors in kidney exchanges are prioritized using ad-hoc weights decided on by committee and then fed into an allocation algorithm that determines who gets what, and who does not. In this paper, we provide an end-to-end methodology for estimating weights of individual participant profiles in a kidney exchange. We first elicit from human subjects a list of patient...
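The idea of priority weights fed into an allocation algorithm can be illustrated with a toy example. The sketch below greedily selects disjoint two-way swaps by summed priority weight; the profiles, weights, and compatibility edges are invented for illustration, and real exchanges also handle longer cycles and chains, typically with integer programming rather than a greedy pass:

```python
from itertools import combinations

# Hypothetical patient-donor pairs with committee-style priority weights.
pairs = {
    "p1": {"weight": 3.0},  # higher weight = higher priority
    "p2": {"weight": 1.0},
    "p3": {"weight": 2.0},
    "p4": {"weight": 2.5},
}
# (a, b) present means a's donor is compatible with b's patient.
compatible = {("p1", "p2"), ("p2", "p1"), ("p3", "p4"), ("p4", "p3"),
              ("p1", "p3"), ("p3", "p1")}

def best_two_way_swaps(pairs, compatible):
    """Greedy max-weight selection of disjoint two-way swaps.

    A swap (a, b) is feasible when both directed edges exist; its value
    is the sum of the two participants' priority weights.
    """
    swaps = [(a, b) for a, b in combinations(pairs, 2)
             if (a, b) in compatible and (b, a) in compatible]
    swaps.sort(key=lambda s: pairs[s[0]]["weight"] + pairs[s[1]]["weight"],
               reverse=True)
    matched, chosen = set(), []
    for a, b in swaps:
        if a not in matched and b not in matched:
            chosen.append((a, b))
            matched.update((a, b))
    return chosen

print(best_two_way_swaps(pairs, compatible))  # [('p1', 'p3')]
```

Changing the weights changes who is matched, which is exactly why the paper's question of where those weights come from matters.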
Specifying reward functions for robots that operate in environments without a natural reward signal can be challenging, and incorrectly specified rewards can incentivise degenerate or dangerous behavior. A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback, like demonstrations or corrections. To interpret this feedback, robots treat it as an approximately optimal choice the person makes from a choice set, like the set of possible trajectories they could have demonstrated or corrections they could have made. In this work, we introduce the idea...
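The "approximately optimal choice from a choice set" model is typically formalized as Boltzmann rationality: the person picks trajectory ξ with probability proportional to exp(β·R(ξ)). A minimal sketch of reward inference under this model, assuming linear rewards over trajectory features and a small discrete hypothesis space (all numbers and names here are invented):

```python
import numpy as np

# One feature vector per trajectory in the choice set.
features = np.array([
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
])
# Candidate reward hypotheses: R_theta(xi) = theta . phi(xi).
candidate_thetas = [np.array([1.0, 0.0]),
                    np.array([0.0, 1.0]),
                    np.array([0.5, 0.5])]
beta = 5.0          # rationality coefficient (higher = more optimal human)
chosen_index = 0    # the trajectory the human actually demonstrated

def likelihood(theta):
    """P(chosen | theta): softmax over rewards of the whole choice set."""
    logits = beta * (features @ theta)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[chosen_index]

# Bayesian update from a uniform prior over the reward hypotheses.
posterior = np.array([likelihood(th) for th in candidate_thetas])
posterior /= posterior.sum()
print(posterior)  # mass shifts toward thetas that rank the demo highly
```

Because the likelihood normalizes over the whole choice set, the inferred reward depends on which alternatives the robot assumes were available, which is why misspecifying that set matters.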