Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML); Methodology (stat.ME); Systems and Control (eess.SY)
DOI: 10.48550/arXiv.1511.03722
Publication Date: 2015-11
ABSTRACT
We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL to real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer from high variance. In this work, we extend the doubly robust estimator for bandits to sequential decision-making problems, obtaining the best of both worlds: the estimator is guaranteed to be unbiased and can have much lower variance than the popular importance sampling estimators. We demonstrate the estimator's accuracy on several benchmark problems and illustrate its use as a subroutine in safe policy improvement. We also provide theoretical results on the hardness of the problem and show that our estimator can match the lower bound in certain scenarios.

Comments: 14 pages; 4 figures; ICML 2016
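As a rough sketch only (the abstract does not state the formula, and this is not the authors' code), the sequential doubly robust estimator is commonly written in a recursive per-step form: an approximate value model (v_hat, q_hat) supplies a baseline, and per-step importance weights between the evaluation policy pi_e and the behavior policy pi_b correct the model's bias. All function and variable names below are illustrative assumptions.

def dr_estimate(trajectory, pi_e, pi_b, q_hat, v_hat, gamma=1.0):
    """Doubly robust value estimate for a single trajectory.

    trajectory: list of (s, a, r) tuples collected under the behavior policy.
    pi_e(a, s), pi_b(a, s): action probabilities under the evaluation / behavior policy.
    q_hat(s, a), v_hat(s): an approximate action-value / state-value model (hypothetical names).
    """
    v_dr = 0.0
    # Work backwards through the trajectory: the DR estimate from step t+1
    # stands in for the unknown future return at step t.
    for (s, a, r) in reversed(trajectory):
        rho = pi_e(a, s) / pi_b(a, s)  # per-step importance weight
        v_dr = v_hat(s) + rho * (r + gamma * v_dr - q_hat(s, a))
    return v_dr

def dr_value(trajectories, pi_e, pi_b, q_hat, v_hat, gamma=1.0):
    """Average the per-trajectory DR estimates over a logged dataset."""
    estimates = [dr_estimate(tau, pi_e, pi_b, q_hat, v_hat, gamma) for tau in trajectories]
    return sum(estimates) / len(estimates)

If the value model is accurate, the correction terms in parentheses are small and the variance falls below that of plain importance sampling; if the model is wrong, the importance weighting still removes its bias (given correct behavior-policy probabilities), which is the "best of both worlds" property described in the abstract.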