Importance-Weighted Offline Learning Done Right

Contribution in conference proceedings
Publication date:
2024
Citation:
Importance-Weighted Offline Learning Done Right / G. Gabbianelli, G. Neu, M. Papini (Proceedings of Machine Learning Research). - In: Algorithmic Learning Theory. - [s.l.] : PMLR, 2024. - pp. 614-634 (International Conference on Algorithmic Learning Theory, San Diego, California, USA, February 25-28, 2024).
Abstract:
We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a given policy class and aim to compete with the best comparator policy within this class. In this setting, a standard approach is to compute importance-weighted estimators of the value of each policy, and select a policy that minimizes the estimated value up to a “pessimistic” adjustment subtracted from the estimates to reduce their random fluctuations. In this paper, we show that a simple alternative approach based on the “implicit exploration” estimator of Neu (2015) yields performance guarantees that are superior in nearly all possible terms to all previous results. Most notably, we remove an extremely restrictive “uniform coverage” assumption made in all previous works. These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other, and their careful control can massively improve on previous results that were all based on symmetric two-sided concentration inequalities. We also extend our results to infinite policy classes in a PAC-Bayesian fashion, and showcase the robustness of our algorithm to the choice of hyper-parameters by means of numerical simulations.
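As a rough illustration of the approach the abstract describes, the sketch below computes an implicit-exploration-style importance-weighted value estimate and picks the best policy from a finite class. It is not taken from the paper: the function names, the data layout `(context, action, reward, behavior_prob)`, the toy dataset, and the form of the estimator (adding a constant `gamma` to the behavior probability in the denominator) are assumptions made for illustration. It also assumes rewards are nonnegative with larger being better, so the selection step maximizes the estimate.

```python
def ix_value_estimate(policy, data, gamma):
    """Importance-weighted value estimate with an IX-style denominator.

    Assumed (hypothetical) layout: data is a list of
    (context, action, reward, behavior_prob) tuples, and
    policy(context, action) returns the probability the evaluated
    policy assigns to that action. gamma >= 0 is added to the
    behavior probability, which shrinks the weights and biases the
    estimate downward when rewards are nonnegative.
    """
    total = 0.0
    for context, action, reward, behavior_prob in data:
        weight = policy(context, action) / (behavior_prob + gamma)
        total += weight * reward
    return total / len(data)


def select_policy(policies, data, gamma):
    """Return the policy in a finite class with the largest estimate."""
    return max(policies, key=lambda p: ix_value_estimate(p, data, gamma))


# Toy logged data (made up): one context, two actions, uniform behavior policy.
toy_data = [
    ("x", 0, 1.0, 0.5),
    ("x", 1, 0.0, 0.5),
    ("x", 0, 1.0, 0.5),
    ("x", 1, 0.0, 0.5),
]


def always_0(context, action):
    return 1.0 if action == 0 else 0.0


def always_1(context, action):
    return 1.0 if action == 1 else 0.0


best = select_policy([always_0, always_1], toy_data, gamma=0.5)
```

With `gamma=0` the estimate reduces to the plain importance-weighted average; a positive `gamma` trades a downward bias for smaller weights on actions the behavior policy rarely took.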
IRIS type:
03 - Contribution in volume
Author list:
G. Gabbianelli, G. Neu, M. Papini
University authors:
PAPINI MATTEO (author)
Link to the full record:
https://air.unimi.it/handle/2434/1226197
Link to the full text:
https://air.unimi.it/retrieve/handle/2434/1226197/3278152/gabbianelli24a%20(1).pdf
Book title:
Algorithmic Learning Theory

Research Areas

Sectors (2)

Sector IINF-05/A - Information Processing Systems

Sector INFO-01/A - Computer Science