Publication Date:
2024
Citation:
Linear Bandits with Memory / G. Clerici, P. Laforgue, N. Cesa-Bianchi. - In: TRANSACTIONS ON MACHINE LEARNING RESEARCH. - ISSN 2835-8856. - 5(2024), pp. 1-26.
Abstract:
Nonstationary phenomena, such as satiation effects in recommendations, have mostly been
modeled using bandits with finitely many arms. However, the richer action space provided by
linear bandits is often preferred in practice. In this work, we introduce a novel nonstationary
linear bandit model, where current rewards are influenced by the learner’s past actions in
a fixed-size window. Our model, which recovers stationary linear bandits as a special case,
leverages two parameters: the window size m ≥ 0, and an exponent γ that captures the
rotting (γ < 0) or rising (γ > 0) nature of the phenomenon. When both m and γ are known,
we propose and analyze a variant of OFUL which minimizes regret against cyclic policies.
By choosing the cycle length so as to trade-off approximation and estimation errors, we then
prove a bound of order √d (m + 1)^{1/2 + max{γ, 0}} T^{3/4} (ignoring log factors) on the regret against
the optimal sequence of actions, where T is the horizon and d is the dimension of the linear
action space. Through a bandit model selection approach, our results are then extended to
the case where both m and γ are unknown. Finally, we complement our theoretical results
with experiments comparing our approach to natural baselines.
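The memory-based reward model described in the abstract can be sketched as follows. The specific functional form below (a Gram matrix of the last m actions, raised to the power γ and applied to an unknown parameter θ) is an illustrative assumption based on the abstract, not necessarily the paper's exact definition; what the sketch does preserve is that rewards depend on the last m actions, that γ < 0 gives a rotting effect and γ > 0 a rising one, and that m = 0 or γ = 0 recovers a stationary linear bandit.

```python
import numpy as np

def memory_reward(theta, actions, t, m, gamma):
    """Illustrative reward at time t, influenced by the last m actions.

    A window Gram matrix A = I + sum of outer products of the past m
    actions is raised to the power gamma and applied to theta; this
    instantiation is a hedged sketch, not the paper's exact model.
    """
    d = theta.shape[0]
    past = actions[max(0, t - m):t]                 # fixed-size memory window
    A = np.eye(d) + sum(np.outer(a, a) for a in past)
    # Matrix power A**gamma via eigendecomposition (A is symmetric PSD).
    w, V = np.linalg.eigh(A)
    A_gamma = (V * w**gamma) @ V.T
    return actions[t] @ A_gamma @ theta

# Repeatedly playing the same action: with gamma < 0 the reward rots
# (shrinks), with gamma = 0 it stays at the stationary value <a, theta>.
theta = np.array([1.0, 0.0])
actions = [np.array([1.0, 0.0])] * 3
stationary = memory_reward(theta, actions, 2, m=2, gamma=0.0)   # = 1.0
rotting = memory_reward(theta, actions, 2, m=2, gamma=-1.0)     # < 1.0
```

With γ = 0 the matrix power is the identity, so the reward collapses to the usual inner product ⟨a_t, θ⟩ of a stationary linear bandit, matching the abstract's claim that the model recovers that case.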
IRIS Type:
01 - Journal article
Author list:
G. Clerici, P. Laforgue, N. Cesa-Bianchi
Link to the complete record:
Link to the Full Text:
Project: