- Reinforcement Learning in Robotics
- Sports Analytics and Performance
- Artificial Intelligence in Games
- Game Theory and Applications
- Anomaly Detection Techniques and Applications
- Advanced Bandit Algorithms Research
- Bayesian Modeling and Causal Inference
- Formal Methods in Verification
- Auction Theory and Applications
- Optimization and Search Problems
- Experimental Behavioral Economics Studies
- Multi-Agent Systems and Negotiation
- Robotic Path Planning Algorithms
- Fault Detection and Control Systems
- Data Stream Mining Techniques
- Human Pose and Action Recognition
- Evolutionary Game Theory and Cooperation
- Target Tracking and Data Fusion in Sensor Networks
- Autonomous Vehicle Technology and Safety
- Gaussian Processes and Bayesian Inference
- Modular Robots and Swarm Intelligence
- Multimodal Machine Learning Applications
- Sports Performance and Training
- Big Data and Business Intelligence
- Logic, Reasoning, and Knowledge
Google (United States)
2019-2024
DeepMind (United Kingdom)
2020-2022
Massachusetts Institute of Technology
2014-2020
Decision Systems (United States)
2015-2019
We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are...
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially- and fully-observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both...
Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for the problems they can apply to. This learning-to-teach problem has inherent...
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing the new scientific challenges involved in the analysis of both players' and coordinated teams' behaviors. Research...
Learning to combine motor control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed via instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has...
We introduce α-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, the type of interactions (beyond dyadic), and the type of games (symmetric and asymmetric). Current models are fundamentally limited in one or more...
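The evolutionary-ranking idea can be illustrated with a heavily simplified, single-population sketch (an assumption for illustration; the paper's method uses multiple populations and the MCC solution concept): build a Markov chain over strategies whose invasion probabilities are a logistic function of empirical payoff differences, then read rankings off the chain's long-run distribution.

```python
import numpy as np

def evo_rank_sketch(M, alpha=10.0, noise=0.05):
    """Hedged single-population ranking sketch. M[i, j] is the
    empirical payoff of strategy i against strategy j."""
    n = M.shape[0]
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # Chance that a population at strategy i is taken over
                # by strategy j: logistic in j's payoff advantage, with
                # selection intensity alpha plus a small mutation rate.
                fix = 1.0 / (1.0 + np.exp(-alpha * (M[j, i] - M[i, j])))
                T[i, j] = (noise + (1.0 - noise) * fix) / (n - 1)
        T[i, i] = 1.0 - T[i].sum()      # valid row-stochastic matrix
    pi = np.full(n, 1.0 / n)
    for _ in range(5000):               # power iteration -> stationary dist.
        pi = pi @ T
    return pi                           # ranking mass per strategy

# Rock-paper-scissors is fully cyclic, so all strategies rank equally;
# a strategy that beats every other collects almost all the mass.
rps = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])
pi_rps = evo_rank_sketch(rps)
dom = np.array([[0., 1., 1.], [-1., 0., 0.], [-1., 0., 0.]])
pi_dom = evo_rank_sketch(dom)
```

The stationary distribution plays the role of a ranking score: mass concentrates on strategies that resist invasion under the chain's dynamics.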
The focus of this paper is on solving multi-robot planning problems in continuous spaces with partial observability. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow for a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous problems, this paper extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making...
This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability, given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of high-level representations that are natural for multi-robot problems and that facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP...
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings,...
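One of the named special cases is easy to make concrete: a minimal fictitious-play sketch (an illustration, not the paper's algorithm) in a two-player zero-sum matrix game, where each player repeatedly best-responds to the opponent's empirical mixture and the time-averaged strategies approach a Nash equilibrium.

```python
import numpy as np

def fictitious_play(A, iters=20000):
    """Fictitious play in a zero-sum matrix game. A is the row
    player's payoff matrix; the column player receives -A."""
    n, m = A.shape
    row_counts, col_counts = np.zeros(n), np.zeros(m)
    row_counts[0] = 1.0     # arbitrary initial pure strategies
    col_counts[0] = 1.0
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        col_mix = col_counts / col_counts.sum()
        row_mix = row_counts / row_counts.sum()
        row_counts[np.argmax(A @ col_mix)] += 1
        col_counts[np.argmin(row_mix @ A)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

# Rock-paper-scissors: the unique Nash equilibrium is uniform, and the
# empirical averages drift toward it even though play itself cycles.
rps = np.array([[0., -1., 1.],
                [1., 0., -1.],
                [-1., 1., 0.]])
row_avg, col_avg = fictitious_play(rps)
```

PSRO generalizes this template by replacing the exact best response with a reinforcement-learning "oracle" and the empirical mixture with a meta-strategy over a growing policy population.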
Multiagent reinforcement learning (MARL) algorithms have been demonstrated on complex tasks that require the coordination of a team of multiple agents to complete. Existing works have focused on sharing information between agents via centralized critics to stabilize learning or through communication to improve performance, but do not generally consider how information can be shared to address the curse of dimensionality in MARL. We posit that a multiagent problem can be decomposed into a multi-task problem where each agent explores a subset of the state space instead of exploring...
Planning, control, perception, and learning are current research challenges in multirobot systems. The transition dynamics of the robots may be unknown or stochastic, making it difficult to select the best action each robot must take at a given time. The observation model, a function of the robots' sensor systems, may be noisy or partial, meaning that deterministic knowledge of the team's state is often impossible to attain. Moreover, actions can have an associated success rate and/or probabilistic completion time. Robots designed...
This paper investigates the geometrical properties of real world games (e.g. Tic-Tac-Toe, Go, StarCraft II). We hypothesise that their geometrical structure resembles a spinning top, with the upright axis representing transitive strength, and the radial axis representing the non-transitive dimension, which corresponds to the number of cycles that exist at a particular transitive strength. We prove the existence of this geometry for a wide class of real world games, exposing their temporal nature. Additionally, we show that this unique structure also has consequences for learning - it clarifies why populations...
Safety certification of data-driven control techniques remains a major open problem. This work investigates backward reachability as a framework for providing collision avoidance guarantees for systems controlled by neural network (NN) policies. Because NNs are typically not invertible, existing methods conservatively assume a domain over which to relax the NN, which causes loose over-approximations of the set of states that could lead the system into an obstacle (i.e., backprojection (BP) sets). To address this issue, we...
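The backprojection idea is easiest to see in one dimension (a toy sketch under simple assumptions, not the paper's NN-relaxation algorithm): if relaxing the policy only tells us the control lies in an interval, the BP over-approximation of an obstacle must contain every state from which some admissible control reaches it.

```python
def backproject_interval(obstacle, u_bounds):
    """Toy 1-D backprojection for dynamics x' = x + u, where the
    (hypothetical) relaxed policy only guarantees u in [u_lo, u_hi].
    Returns every x for which SOME admissible u lands in the obstacle,
    i.e. x_lo - u_hi <= x <= x_hi - u_lo."""
    (x_lo, x_hi), (u_lo, u_hi) = obstacle, u_bounds
    return (x_lo - u_hi, x_hi - u_lo)

# Obstacle [4, 5] and relaxed control range [-1, 1]: the one-step BP
# over-approximation is [3, 6] -- the wider the relaxation, the looser
# (more conservative) the set, which is the issue the paper targets.
bp = backproject_interval((4.0, 5.0), (-1.0, 1.0))
```

Shrinking the assumed control interval (i.e., a tighter NN relaxation) directly shrinks the BP set, which is why the choice of relaxation domain matters.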
This paper presents a data-driven approach for multi-robot coordination in partially-observable domains based on Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and macro-actions (MAs). Dec-POMDPs provide a general framework for cooperative sequential decision making under uncertainty, and MAs allow temporally extended and asynchronous action execution. To date, most methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time....
This paper introduces a probabilistic algorithm for multi-robot decision-making under uncertainty, which can be posed as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Dec-POMDPs are inherently synchronous decision-making frameworks that require significant computational resources to be solved, making them infeasible for many real-world robotics applications. The Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) was recently introduced as an extension of the Dec-POMDP that uses high-level macro-actions to allow for large-scale,...
In multiagent worlds, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' dynamic behaviors, make such systems complex and interesting to study from a dynamical perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. In many settings, only...
Many robotic missions require online estimation of unknown state transition models associated with uncertainty that stems from the mission dynamics. The learning problem is usually distributed among agents in multiagent scenarios, either due to the absence of a centralized processing unit or because of the large size of the joint learning problem. This paper addresses the likely scenario in which agents estimate different models from their measured data, but can share information by communicating their model parameters. Previous approaches consider...
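As one common instance of sharing model parameters without a centralized unit (an illustrative assumption, not necessarily the approach taken here), agents can run distributed consensus averaging: each repeatedly nudges its local parameter estimate toward its neighbors', and all estimates converge to the network-wide average.

```python
import numpy as np

def consensus_average(estimates, neighbors, rounds=100, step=0.3):
    """Distributed consensus averaging over a fixed communication
    graph. estimates[i] is agent i's local parameter; neighbors[i]
    lists the agents it can exchange parameters with."""
    theta = np.array(estimates, dtype=float)
    for _ in range(rounds):
        new = theta.copy()
        for i, nbrs in enumerate(neighbors):
            # Move toward neighboring agents' parameters; no agent
            # ever sees the raw data, only communicated parameters.
            new[i] += step * sum(theta[j] - theta[i] for j in nbrs)
        theta = new
    return theta

# Ring of 4 agents, each starting from a different local estimate;
# all converge to the global average (2.5) by local exchanges only.
ring = [[1, 3], [0, 2], [1, 3], [0, 2]]
theta = consensus_average([1.0, 2.0, 3.0, 4.0], ring)
```

The step size must keep the update matrix stable (here each agent has two neighbors, so `step=0.3` keeps every mixing weight positive); larger networks trade communication rounds for accuracy.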
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstationarity. By contrast, it is known that replicator dynamics, a well-studied model from evolutionary game theory, eliminates dominated strategies...