Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Keywords: Sample complexity, Speedup
DOI: 10.48550/arXiv.2404.08003
Publication Date: 2024-04-09
ABSTRACT
To improve the efficiency of reinforcement learning, we propose a novel asynchronous federated learning framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To handle the challenge of lagged policies in asynchronous settings, we design delay-adaptive lookahead and normalized update techniques that can effectively handle heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound and characterize the advantage of the proposed algorithm in terms of both sample complexity and time complexity. Specifically, our AFedPG method achieves a sample complexity of $\mathcal{O}(\frac{\epsilon^{-2.5}}{N})$ at each agent on average. Compared to the single-agent setting with $\mathcal{O}(\epsilon^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $\mathcal{O}(\frac{t_{\max}}{N})$ to $\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$, where $t_{i}$ denotes the per-iteration time consumption at agent $i$ and $t_{\max}$ is the largest one. The latter complexity $\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$ is always smaller than the former, and the improvement becomes significant in large-scale settings with heterogeneous computing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the improved performance in three MuJoCo environments with varying numbers of agents, and we also demonstrate the improvements under different levels of computing heterogeneity.
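As a rough illustration of the time-complexity claim above (not part of the paper itself), the following Python sketch compares the amortized per-update wall-clock cost $\frac{t_{\max}}{N}$ of a synchronous scheme, which waits for the slowest agent each round, with the $\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}}$ cost of an asynchronous scheme, where an update is applied whenever any agent finishes. The per-iteration times $t_i$ used here are hypothetical values chosen only to show the gap widening once $t_{\max}\gg t_{\min}$.

```python
# Illustrative check of the stated time-complexity comparison.
# All agent timings below are hypothetical, for demonstration only.

def sync_time_per_update(times):
    """Synchronous rounds: each round costs max_i t_i, amortized over N agents."""
    return max(times) / len(times)

def async_time_per_update(times):
    """Asynchronous updates: effective cost 1 / sum_i (1 / t_i)."""
    return 1.0 / sum(1.0 / t for t in times)

if __name__ == "__main__":
    homogeneous = [1.0, 1.0, 1.0, 1.0]      # t_max ~ t_min: little difference
    heterogeneous = [1.0, 2.0, 4.0, 16.0]   # t_max >> t_min: async is far faster
    for name, times in [("homogeneous", homogeneous),
                        ("heterogeneous", heterogeneous)]:
        print(f"{name}: sync = {sync_time_per_update(times):.3f}, "
              f"async = {async_time_per_update(times):.3f}")
```

With the homogeneous timings both schemes cost 0.25 per update, while with the heterogeneous timings the synchronous cost grows to 4.0 but the asynchronous cost stays near 0.55, matching the qualitative claim that the speedup is largest when computing powers differ widely.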