Learning on the Edge: Online Learning with Stochastic Feedback Graphs
Contributo in Atti di convegno
Data di Pubblicazione:
2022
Citazione:
Learning on the Edge: Online Learning with Stochastic Feedback Graphs / E. Esposito, F. Fusco, D. van der Hoeven, N. Cesa Bianchi (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: NeurIPS / [a cura di] S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh. - [s.l] : Curran Associates, 2022. - pp. 34776-34788 (( Intervento presentato al 36. convegno Conference on Neural Information Processing Systems : Monday November 28th through Friday December 9th tenutosi a New Orleans nel 2022.
Abstract:
The framework of feedback graphs is a generalization of sequential decisionmaking
with bandit or full information feedback. In this work, we study
an extension where the directed feedback graph is stochastic, following
a distribution similar to the classical Erdős-Rényi model. Specifically, in
each round every edge in the graph is either realized or not with a distinct
probability for each edge. We prove nearly optimal regret bounds of
order min
minε
p
(αε/ε)T, minε(δε/ε)1/3T2/3
(ignoring logarithmic factors),
where αε and δε are graph-theoretic quantities measured on the
support of the stochastic feedback graph G with edge probabilities thresholded
at ε. Our result, which holds without any preliminary knowledge
about G, requires the learner to observe only the realized out-neighborhood
of the chosen action. When the learner is allowed to observe the realization
of the entire graph (but only the losses in the out-neighborhood of the
chosen action), we derive a more efficient algorithm featuring a dependence
on weighted versions of the independence and weak domination numbers
that exhibits improved bounds for some special cases.
with bandit or full information feedback. In this work, we study
an extension where the directed feedback graph is stochastic, following
a distribution similar to the classical Erdős-Rényi model. Specifically, in
each round every edge in the graph is either realized or not with a distinct
probability for each edge. We prove nearly optimal regret bounds of
order min
minε
p
(αε/ε)T, minε(δε/ε)1/3T2/3
(ignoring logarithmic factors),
where αε and δε are graph-theoretic quantities measured on the
support of the stochastic feedback graph G with edge probabilities thresholded
at ε. Our result, which holds without any preliminary knowledge
about G, requires the learner to observe only the realized out-neighborhood
of the chosen action. When the learner is allowed to observe the realization
of the entire graph (but only the losses in the out-neighborhood of the
chosen action), we derive a more efficient algorithm featuring a dependence
on weighted versions of the independence and weak domination numbers
that exhibits improved bounds for some special cases.
Tipologia IRIS:
03 - Contributo in volume
Elenco autori:
E. Esposito, F. Fusco, D. van der Hoeven, N. Cesa Bianchi
Link alla scheda completa:
Titolo del libro:
NeurIPS