Expertise & Skills

Delayed Bandits: When Do Intermediate Observations Help?

Contribution in conference proceedings
Publication date:
2023
Citation:
Delayed Bandits: When Do Intermediate Observations Help? / E. Esposito, S. Masoudian, H. Qiu, D. van der Hoeven, N. Cesa-Bianchi, Y. Seldin (Proceedings of Machine Learning Research). - In: ICML / [edited by] A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, J. Scarlett. - [s.l.] : PMLR, 2023. - pp. 9374-9395 (Paper presented at the 40th International Conference on Machine Learning, held in Honolulu, 23 through 29 July 2023).
Abstract:
We study a $K$-armed bandit with delayed feedback and intermediate observations. We consider a model, where intermediate observations have a form of a finite state, which is observed immediately after taking an action, whereas the loss is observed after an adversarially chosen delay. We show that the regime of the mapping of states to losses determines the complexity of the problem, irrespective of whether the mapping of actions to states is stochastic or adversarial. If the mapping of states to losses is adversarial, then the regret rate is of order $\sqrt{(K+d)T}$ (within log factors), where $T$ is the time horizon and $d$ is a fixed delay. This matches the regret rate of a $K$-armed bandit with delayed feedback and without intermediate observations, implying that intermediate observations are not helpful. However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\bigl(K+\min\{|\mathcal{S}|,d\}\bigr)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help. We also provide refined high-probability regret upper bounds for non-uniform delays, together with experimental validation of our algorithms.
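The two regret rates quoted in the abstract can be compared numerically. The following is a minimal sketch, not code from the paper: it evaluates only the leading-order expressions, ignoring constants and log factors, and the function names and sample values are hypothetical choices made for illustration.

```python
import math


def rate_adversarial_losses(K: int, d: int, T: int) -> float:
    """Leading-order regret when the state-to-loss mapping is adversarial:
    sqrt((K + d) * T), with constants and log factors dropped."""
    return math.sqrt((K + d) * T)


def rate_stochastic_losses(K: int, d: int, T: int, num_states: int) -> float:
    """Leading-order regret when the state-to-loss mapping is stochastic:
    sqrt((K + min(|S|, d)) * T), with constants and log factors dropped."""
    return math.sqrt((K + min(num_states, d)) * T)


def observations_help(num_states: int, d: int) -> bool:
    """Per the abstract, intermediate observations help when |S| < d."""
    return num_states < d


# Illustrative values (assumptions, not taken from the paper's experiments).
K, d, T, num_states = 10, 100, 10_000, 5
ratio = rate_stochastic_losses(K, d, T, num_states) / rate_adversarial_losses(K, d, T)
print(f"observations help: {observations_help(num_states, d)}, rate ratio: {ratio:.3f}")
```

When the number of states is at least the delay, `min(num_states, d)` equals `d` and the two expressions coincide, which matches the abstract's statement that intermediate observations help only if $|\mathcal{S}|$ is smaller than $d$.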
IRIS type:
03 - Contribution in volume
Author list:
E. Esposito, S. Masoudian, H. Qiu, D. van der Hoeven, N. Cesa-Bianchi, Y. Seldin
University authors:
CESA BIANCHI NICOLO' ANTONIO (author)
ESPOSITO EMMANUEL (author)
QIU HAO (author)
Link to the full record:
https://air.unimi.it/handle/2434/1024138
Link to the full text:
https://air.unimi.it/retrieve/handle/2434/1024138/2345779/esposito23a.pdf
Book title:
ICML
Project:
Algorithms, Games, and Digital Markets (ALGADIMAR)

Research areas

Sectors

Sector INF/01 - Informatica (Computer Science)