A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
Conference Proceedings Contribution
Publication Date:
2023
Citation:
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs / D. van der Hoeven, L. Zierahn, T. Lancewicki, A. Rosenberg, N. Cesa Bianchi (PROCEEDINGS OF MACHINE LEARNING RESEARCH). - In: Proceedings of Thirty Sixth Conference on Learning Theory / [edited by] G. Neu, L. Rosasco. - [s.l.]: PMLR, 2023. - pp. 1285-1321. (Paper presented at the 36th Annual Conference on Learning Theory, held in Bangalore in 2023.)
Abstract:
We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in three important settings. On the one hand, we derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov decision processes with delay (and known transition functions). On the other hand, we use our analysis to derive an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. Our novel regret decomposition shows that FTRL remains stable across multiple rounds under mild assumptions on the Hessian of the regularizer.
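The abstract's core object, Follow The Regularized Leader under delayed feedback, can be illustrated with a minimal toy sketch. This is a hypothetical full-information variant with a negative-entropy regularizer (i.e., exponential weights), not the paper's bandit estimator or its actual algorithm; the function name, interface, and delay model are all illustrative assumptions.

```python
import numpy as np

def ftrl_delayed(losses, delays, eta=0.1, seed=0):
    """Toy FTRL with delayed feedback (illustrative sketch only).

    losses: (T, K) array of per-round, per-action losses in [0, 1].
    delays: length-T array; the loss of round s arrives at round s + delays[s].
    """
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    cum = np.zeros(K)      # cumulative observed losses (delayed)
    pending = {}           # round -> loss vector not yet delivered
    total_loss = 0.0
    for t in range(T):
        # FTRL with negative-entropy regularizer reduces to
        # exponential weights on the observed cumulative losses.
        w = np.exp(-eta * cum)
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        total_loss += losses[t, arm]
        pending[t] = losses[t]
        # Deliver all feedback whose delay has now elapsed.
        for s in list(pending):
            if s + delays[s] <= t:
                cum += pending.pop(s)
    return total_loss
```

Under delay, the learner's play at round t ignores the most recent (still in-flight) losses; the paper's contribution is precisely to bound the extra regret this staleness causes, separately from the cost of bandit feedback.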
IRIS Type:
03 - Contribution in a volume
Keywords:
Online learning; bandit feedback; delayed feedback
Author List:
D. van der Hoeven, L. Zierahn, T. Lancewicki, A. Rosenberg, N. Cesa Bianchi
Link to Full Record:
Link to Full Text:
Book Title:
Proceedings of Thirty Sixth Conference on Learning Theory