Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
DOI:
10.48550/arxiv.2405.19909
Publication Date:
2024-05-30
AUTHORS (6)
ABSTRACT
In offline reinforcement learning, the challenge of out-of-distribution (OOD) actions is pronounced. To address this, existing methods often constrain the learned policy through policy regularization. However, these methods often suffer from the issue of unnecessary conservativeness, which hampers policy improvement. This occurs due to the indiscriminate use of all actions from the behavior policy that generates the offline dataset as constraints. The problem becomes particularly noticeable when the quality of the dataset is suboptimal. Thus, we propose Adaptive Advantage-guided Policy Regularization (A2PR), which obtains high-advantage actions from an augmented behavior policy combined with a VAE to guide the learned policy. A2PR can select high-advantage actions that differ from those present in the dataset, while still effectively maintaining conservatism toward OOD actions. This is achieved by harnessing the VAE's capacity to generate samples that match the distribution of the data points. We theoretically prove that policy improvement is guaranteed. Besides, A2PR effectively mitigates value overestimation with a bounded performance gap. Empirically, we conduct a series of experiments on the D4RL benchmark, where A2PR demonstrates state-of-the-art performance. Furthermore, experimental results on additional suboptimal mixed datasets reveal that A2PR exhibits superior performance. Code is available at https://github.com/ltlhuuu/A2PR.
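The sketch below illustrates the core idea described in the abstract: regularize the actor toward a high-advantage guidance action chosen between the dataset action and a VAE-generated candidate, rather than toward every dataset action indiscriminately. This is a minimal, hedged illustration, not the authors' implementation; the names `actor`, `q_net`, `v_net`, `vae.decode`, and the weight `alpha` are assumptions, and the actor update follows a TD3+BC-style objective for concreteness.

```python
# Minimal sketch of advantage-guided policy regularization (assumed interfaces, not the authors' code).
# Assumes: q_net(s, a) -> Q-value, v_net(s) -> state value, vae.decode(s) -> in-distribution
# candidate action, actor(s) -> deterministic action. All return torch tensors.
import torch

def advantage_guided_actor_loss(actor, q_net, v_net, vae, states, dataset_actions, alpha=2.5):
    """One actor update: maximize Q while regularizing toward the higher-advantage action."""
    with torch.no_grad():
        # Candidate actions generated by the VAE, intended to match the data distribution.
        vae_actions = vae.decode(states)
        # Advantage A(s, a) = Q(s, a) - V(s) for dataset actions and VAE candidates.
        adv_data = q_net(states, dataset_actions) - v_net(states)
        adv_vae = q_net(states, vae_actions) - v_net(states)
        # Guidance action: whichever candidate has the higher advantage.
        use_vae = (adv_vae > adv_data).float().reshape(-1, 1)
        guide_actions = use_vae * vae_actions + (1.0 - use_vae) * dataset_actions

    pi_actions = actor(states)
    q_pi = q_net(states, pi_actions)
    # Scale the Q term (TD3+BC-style) so the regularization weight is scale-invariant.
    lmbda = alpha / q_pi.abs().mean().detach()
    reg_term = ((pi_actions - guide_actions) ** 2).mean()
    return -lmbda * q_pi.mean() + reg_term
```

In this reading, conservatism is preserved because the guidance action always comes from either the dataset or a VAE trained on it, while unnecessary conservativeness is reduced because low-advantage dataset actions no longer constrain the policy.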