NFDI4DS | UHH-SEMS - Publication Details

Safe Reinforcement Learning via Shielding

FOS: Computer and information sciences; Computer Science - Logic in Computer Science; Computer Science - Machine Learning; 0209 industrial biotechnology; Artificial Intelligence (cs.AI); Computer Science - Artificial Intelligence; 02 engineering and technology; Logic in Computer Science (cs.LO); Machine Learning (cs.LG)

DOI: 10.1609/aaai.v32i1.11797
Publication Date: 2022-06-21T20:48:40Z

AUTHORS (6)

Mohammed Alshiekh

Roderick Bloem

Rüdiger Ehlers

Bettina Könighofer

Scott Niekum

Ufuk Topcu

ABSTRACT

Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
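
The monitor-and-correct behaviour described in the abstract can be illustrated with a minimal sketch. It assumes a hand-written safety predicate `is_safe` standing in for the temporal logic specification; the abstract describes synthesizing the shield from that specification, which this sketch does not attempt. The names `make_shield`, `is_safe`, and the toy corridor environment are hypothetical and are not taken from the paper.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def make_shield(
    is_safe: Callable[[State, Action], bool],
    actions: Sequence[Action],
) -> Callable[[State, Action], Action]:
    """Build a shield from a safety predicate (an assumed stand-in for a
    specification-derived monitor, not the paper's synthesis procedure)."""

    def shield(state: State, proposed: Action) -> Action:
        # Leave the learner's choice untouched whenever it is safe.
        if is_safe(state, proposed):
            return proposed
        # Otherwise substitute the first safe alternative, in the given order.
        for candidate in actions:
            if is_safe(state, candidate):
                return candidate
        raise ValueError(f"no safe action available in state {state!r}")

    return shield


# Toy corridor with cells 0..4, where cell 4 is treated as unsafe.
# Actions move the agent by -1, 0, or +1 cells.
def is_safe(state: int, action: int) -> bool:
    return 0 <= state + action <= 3


shield = make_shield(is_safe, actions=(-1, 0, 1))

print(shield(2, 1))  # safe proposal, passed through unchanged
print(shield(3, 1))  # would enter cell 4, so it is replaced
```

The shield interferes only when the proposed action would violate the predicate, mirroring the "corrects them only if" wording of the abstract. How the replacement action is chosen here (first safe action in a fixed order) is an arbitrary choice for illustration.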

CITATIONS (308)
