Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning

Keywords: Benchmark (surveying) · Implementation · Unintended consequences
DOI: 10.1017/s0269888918000292 Publication Date: 2018-12-04T06:42:17Z
ABSTRACT
The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of a domain if misused, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, both of which have repeatedly been shown to improve learning speed and the quality of joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these approaches to cooperative multi-objective problems, and evaluates their efficacy using two benchmark domains. Our results constitute the first empirical evidence that agents using these methodologies can sample true Pareto optimal solutions in multi-objective stochastic games.
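The two shaping methods named in the abstract have well-known general forms: potential-based reward shaping adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward, and difference rewards give agent i the signal D_i = G(z) − G(z_{−i}), where G is the global reward and z_{−i} is the joint action with agent i's contribution replaced by a default. The sketch below illustrates these general definitions only; the potential function `phi`, the global reward `global_reward`, and the use of `None` as the default action are illustrative assumptions, not the paper's benchmark domains.

```python
def potential_based_shaping(phi, s, s_next, gamma=0.99):
    """Shaping term F(s, s') = gamma * Phi(s') - Phi(s).

    phi is a user-supplied potential function over states; adding F
    to the environment reward leaves optimal policies unchanged.
    """
    return gamma * phi(s_next) - phi(s)


def difference_reward(global_reward, joint_action, agent_idx, default_action=None):
    """Difference reward D_i = G(z) - G(z_{-i}).

    global_reward maps a joint action to a scalar; the counterfactual
    replaces agent i's action with a default (here None, an assumption).
    """
    counterfactual = list(joint_action)
    counterfactual[agent_idx] = default_action
    return global_reward(joint_action) - global_reward(counterfactual)


# Toy usage with assumed stand-ins for phi and G:
phi = lambda s: float(s)                      # potential = state value
G = lambda z: sum(a for a in z if a is not None)  # global reward = sum of actions

f = potential_based_shaping(phi, 1.0, 2.0, gamma=0.9)   # 0.9*2.0 - 1.0 = 0.8
d = difference_reward(G, (1, 2, 3), agent_idx=1)        # 6 - 4 = 2
```

In the multi-objective setting the paper studies, these scalar signals generalise to vector-valued rewards, one component per objective, which is what makes sampling Pareto optimal joint policies possible.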