Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

FOS: Computer and information sciences; Machine Learning (cs.LG); Artificial Intelligence (cs.AI); 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology
DOI: 10.1609/aaai.v37i10.26431 Publication Date: 2023-06-27T18:01:54Z
ABSTRACT
This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.
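As a rough illustration of the Whittle index policy the abstract refers to, the sketch below computes the index of one state of a single arm from given (for example, predicted) transitions. It is not the paper's method: it uses a plain binary search over the passive subsidy, which is not the differentiable computation the paper establishes. The function names `q_values` and `whittle_index`, the array layout `P[s, a, s']` (with `a = 0` passive and `a = 1` active), the discount factor 0.9, and the search bounds are all assumptions made for this example.

```python
import numpy as np

def q_values(P, R, subsidy, gamma=0.9, iters=500):
    """Q-values of one arm when the passive action earns an extra `subsidy`.

    P[s, a, s'] is a transition probability (a=0 passive, a=1 active);
    R[s] is the reward for being in state s. Both are assumed inputs.
    """
    V = np.zeros(P.shape[0])
    bonus = np.array([subsidy, 0.0])  # subsidy is paid only when passive
    for _ in range(iters):
        # Bellman backup: reward + subsidy + discounted expected value.
        Q = R[:, None] + bonus[None, :] + gamma * (P @ V)
        V = Q.max(axis=1)
    return Q

def whittle_index(P, R, state, gamma=0.9, lo=-10.0, hi=10.0, tol=1e-6):
    """Binary-search the subsidy at which both actions are equally good."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        Q = q_values(P, R, mid, gamma)
        if Q[state, 0] < Q[state, 1]:
            lo = mid  # acting is still preferred: the subsidy is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical two-state arm: acting makes the good state (1) more likely.
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.8, 0.2], [0.2, 0.8]]])
R = np.array([0.0, 1.0])
index = whittle_index(P, R, state=0)
```

Under a per-round budget, the Whittle index policy would then act on the arms whose current states have the largest indices. Decision-focused learning, as described in the abstract, trains the transition predictor through the quality of that resulting policy rather than through prediction error alone.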