Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
DOI: 10.48550/arxiv.2003.12613 Publication Date: 2020
ABSTRACT
In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy. We categorize such attacks by the infinity-norm constraint on $\delta_t$: we provide a lower threshold below which the attack is infeasible and the RL agent is certified to be safe; we provide a corresponding upper threshold above which the attack is feasible. Feasible attacks can be further categorized as non-adaptive, where $\delta_t$ depends only on $(s_t, a_t, s_{t+1})$, or adaptive, where $\delta_t$ depends further on the RL agent's learning process at time $t$. Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require a number of steps exponential in $|S|$. We provide a constructive proof that a Fast Adaptive Attack strategy achieves the polynomial rate. Finally, we show empirically that the attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
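To make the threat model concrete, the sketch below runs a tabular ε-greedy Q-learning agent on a toy chain MDP while an attacker replaces each reward $r_t$ with $r_t+\delta_t$ under the budget $|\delta_t|\le\epsilon$. This is not the paper's Fast Adaptive Attack: the chain environment, the constants (EPS, GAMMA, ALPHA), and the functions delta_nonadaptive / delta_adaptive are all illustrative assumptions, meant only to contrast a perturbation that depends on $(s_t, a_t, s_{t+1})$ alone with one that also reads the agent's learning state.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2       # toy chain MDP: action 0 = left, 1 = right
EPS = 2.0                        # infinity-norm budget: |delta_t| <= EPS
GAMMA, ALPHA, EXPLORE = 0.9, 0.1, 0.1
TARGET = 0                       # nefarious target policy: always move left

def env_step(s, a):
    """True environment: reward 1 only for reaching the rightmost state."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s_next, float(s_next == N_STATES - 1)

def delta_nonadaptive(s, a, s_next, Q):
    # Non-adaptive: a fixed function of (s_t, a_t, s_{t+1}); ignores Q.
    return EPS if a == TARGET else -EPS

def delta_adaptive(s, a, s_next, Q):
    # Adaptive: also reads the agent's learning process (its current Q-table),
    # pushing just hard enough to keep the target action's Q-value on top.
    gap = Q[s].max() - Q[s, TARGET] + 0.1
    push = min(gap, EPS)
    return push if a == TARGET else -push

def poisoned_q_learning(attack, steps=20_000, seed=0):
    """Standard Q-learning, except the agent updates on r_t + delta_t."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    s = 0
    for _ in range(steps):
        a = int(rng.integers(N_ACTIONS)) if rng.random() < EXPLORE else int(Q[s].argmax())
        s_next, r = env_step(s, a)
        r_poisoned = r + attack(s, a, s_next, Q)   # agent observes r_t + delta_t
        Q[s, a] += ALPHA * (r_poisoned + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q.argmax(axis=1)   # greedy policy learned under the attack

print("policy under non-adaptive attack:", poisoned_q_learning(delta_nonadaptive))
print("policy under adaptive attack:    ", poisoned_q_learning(delta_adaptive))
```

Note how the adaptive perturbation scales with the current Q-value gap rather than always spending the full budget: per-step leverage over the agent's learning state is the kind of advantage the abstract credits for the polynomial-versus-exponential separation, though the actual Fast Adaptive Attack construction is in the paper itself.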