Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
DOI: 10.48550/arxiv.2003.12613 Publication Date: 2020
ABSTRACT
In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy. We categorize such attacks by the infinity-norm constraint on $\delta_t$: we provide a lower threshold below which the attack is infeasible and the RL agent is certified to be safe; we provide a corresponding upper threshold above which the attack is feasible. Feasible attacks can be further categorized as non-adaptive, where $\delta_t$ depends only on $(s_t, a_t, s_{t+1})$, or adaptive, where $\delta_t$ depends further on the RL agent's learning process at time $t$. Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require a number of steps exponential in $|S|$. We provide a constructive proof that a Fast Adaptive Attack strategy achieves the polynomial rate. Finally, we show empirically that the attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
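To make the threat model concrete, the sketch below runs a tabular ε-greedy Q-learning agent on a toy chain MDP while an attacker replaces each reward $r_t$ with $r_t+\delta_t$ under the budget $|\delta_t|\le\epsilon$. This is not the paper's Fast Adaptive Attack: the chain environment, the constants (EPS, GAMMA, ALPHA), and the functions delta_nonadaptive / delta_adaptive are all illustrative assumptions, meant only to contrast a perturbation that depends on $(s_t, a_t, s_{t+1})$ alone with one that also reads the agent's learning state.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2       # toy chain MDP: action 0 = left, 1 = right
EPS = 2.0                        # infinity-norm budget: |delta_t| <= EPS
GAMMA, ALPHA, EXPLORE = 0.9, 0.1, 0.1
TARGET = 0                       # nefarious target policy: always move left

def env_step(s, a):
    """True environment: reward 1 only for reaching the rightmost state."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s_next, float(s_next == N_STATES - 1)

def delta_nonadaptive(s, a, s_next, Q):
    # Non-adaptive: a fixed function of (s_t, a_t, s_{t+1}); ignores Q.
    return EPS if a == TARGET else -EPS

def delta_adaptive(s, a, s_next, Q):
    # Adaptive: also reads the agent's learning process (its current Q-table),
    # pushing just hard enough to keep the target action's Q-value on top.
    gap = Q[s].max() - Q[s, TARGET] + 0.1
    push = min(gap, EPS)
    return push if a == TARGET else -push

def poisoned_q_learning(attack, steps=20_000, seed=0):
    """Standard Q-learning, except the agent updates on r_t + delta_t."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    s = 0
    for _ in range(steps):
        a = int(rng.integers(N_ACTIONS)) if rng.random() < EXPLORE else int(Q[s].argmax())
        s_next, r = env_step(s, a)
        r_poisoned = r + attack(s, a, s_next, Q)   # agent observes r_t + delta_t
        Q[s, a] += ALPHA * (r_poisoned + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q.argmax(axis=1)   # greedy policy learned under the attack

print("policy under non-adaptive attack:", poisoned_q_learning(delta_nonadaptive))
print("policy under adaptive attack:    ", poisoned_q_learning(delta_adaptive))
```

Note how the adaptive perturbation scales with the current Q-value gap rather than always spending the full budget: per-step leverage over the agent's learning state is the kind of advantage the abstract credits for the polynomial-versus-exponential separation, though the actual Fast Adaptive Attack construction is in the paper itself.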