
Bandit Online Learning in Merely Coherent Games with Multi-Point Pseudo-Gradient Estimate

Subjects: Optimization and Control (math.OC); FOS: Mathematics; Mathematics - Optimization and Control; 0301 basic medicine; 03 medical and health sciences

DOI: 10.48550/arxiv.2303.16430
Publication Date: 2023-12-13


AUTHORS (2)

Huang, Yuanhanqing

Hu, Jianghai

ABSTRACT

Non-cooperative games are a powerful framework for capturing interactions among self-interested players, with applications ranging from power management to drug delivery. Most existing solution algorithms assume access to first-order information or full knowledge of the objectives and of the other players' action profiles, yet in some settings the only information available to the players is the realized values of their objective functions. In this paper, we devise a bandit online learning algorithm for merely coherent games that integrates the optimistic mirror descent scheme with multi-point pseudo-gradient estimates. We further show that the actual sequence of play converges almost surely to a critical point if the sequences of query radii and sample sizes are chosen properly, without resorting to extra Tikhonov regularization terms or additional norm conditions. Finally, we illustrate the validity of the proposed algorithm on a Rock-Paper-Scissors game and a least-squares estimation game.
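
The combination described in the abstract can be illustrated with a rough sketch. The Python below is not the authors' algorithm: it assumes a Euclidean setup (projection onto a box in place of a general mirror map), a sphere-sampling finite-difference estimator, and costs that can be evaluated slightly outside the box. All function names, the step size, and the radius and sample-size schedules are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_unit_sphere(n_samples, dim, rng):
    """Draw n_samples directions uniformly from the unit sphere in R^dim."""
    u = rng.standard_normal((n_samples, dim))
    return u / np.linalg.norm(u, axis=1, keepdims=True)


def multi_point_estimate(cost, x, radius, n_samples, rng):
    """Estimate the gradient of `cost` at `x` from function values only.

    Averages (dim / radius) * (cost(x + radius * u) - cost(x)) * u over
    random unit directions u (an assumed estimator, not taken from the paper).
    """
    x = np.asarray(x, dtype=float)
    dim = x.size
    directions = sample_unit_sphere(n_samples, dim, rng)
    base_value = cost(x)
    diffs = np.array([cost(x + radius * u) - base_value for u in directions])
    return (dim / radius) * (diffs[:, None] * directions).mean(axis=0)


def optimistic_bandit_play(costs, x0, n_rounds, step_size, radius, n_samples,
                           lo=-1.0, hi=1.0, seed=0):
    """Optimistic (past extra-gradient) updates driven by value-only feedback.

    costs[i] maps the list of all players' actions to player i's cost.
    radius and n_samples are callables of the round index t (t >= 1).
    Returns the last played (leading) actions and the last base states.
    """
    rng = np.random.default_rng(seed)
    base = [np.asarray(x, dtype=float) for x in x0]
    prev_grads = [np.zeros_like(x) for x in base]
    lead = base
    for t in range(1, n_rounds + 1):
        # Leading step: reuse the previous round's estimate.
        lead = [np.clip(x - step_size * g, lo, hi)
                for x, g in zip(base, prev_grads)]
        grads = []
        for i, cost in enumerate(costs):
            # Player i varies only its own action; the others stay at `lead`.
            def own_cost(a, i=i, cost=cost):
                joint = list(lead)
                joint[i] = a
                return cost(joint)
            grads.append(multi_point_estimate(
                own_cost, lead[i], radius(t), n_samples(t), rng))
        # Base step: move the base state with the fresh estimate.
        base = [np.clip(x - step_size * g, lo, hi)
                for x, g in zip(base, grads)]
        prev_grads = grads
    return lead, base


# Toy zero-sum game on [-1, 1]^2: player 0 minimizes x*y, player 1 minimizes -x*y.
toy_costs = [lambda a: float(a[0][0] * a[1][0]),
             lambda a: float(-a[0][0] * a[1][0])]
lead, base = optimistic_bandit_play(
    toy_costs, x0=[[0.5], [0.5]], n_rounds=1500, step_size=0.1,
    radius=lambda t: t ** -0.25, n_samples=lambda t: 1 + int(np.sqrt(t)))
```

In this toy game each cost is linear in the player's own scalar action, so the estimate is exact up to rounding and the play approaches the equilibrium at the origin. In richer games the estimate is noisy and biased, and the radius and sample-size schedules are what control that bias and variance.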
