UQSay #50

Date : 13/10/2022

Lieu : Online (Teams)

The fiftieth UQSay seminar on UQ, DACE and related topics will take place online on Thursday afternoon, October 13, 2022.

2–3 PM — Gilles Stoltz (LMO, Université Paris-Saclay – CNRS) — [slides]

Multi-armed bandit problems: a statistical view, focused on lower bounds

Multi-armed bandit problems correspond to facing K unknown probability distributions, having to sequentially pull one of them, and observing a realization thereof at each pull. Two goals will be considered.

(1) The realizations are payoffs, and the sum of these payoffs is to be maximized. This goal is achieving by minimizing regret, which is defined as the expected performance of the best arm minus the expected sum of payoffs achieved by a strategy. Two types of bounds may be defined, depending on whether they may depend on the specific bandit problem or only on the model (the class of possible distributions). We will recall classical strategies like UCB and MOSS, as well as a new strategy combining both, called KL-UCB-Switch. We will review upper bounds on the regret and detail which lower bounds may be achieved, and how. We will deal with one interesting extension, the adaptation to the unknown range of the distributions, i.e., when the distribution are supported on a compact interval that is unknown as well.

The case of regret minimization is very well understood in the literature, contrary to:

(2) A second goal can be to identify the best arm, i.e., control the probability that after T observations (sampled adaptively) the strategy does not identify the arm with the highest expectation. This is called best arm identification with a fixed budget. Limited results are available. We will describe a typical strategy, called successive rejects, that drops one distribution after the other after horse racing them. We will also indicate how we are currently laying the foundations of a non-parametric approach to this problem, based on KL divergences, as opposed to typical approaches based on differences between expectations.

Joint work with Antoine Barrier, Aurélien Garivier, Hédi Hadiji & Pierre Ménard.

Refs:

KL-UCB-Switch: JMLR, 23(179):1−66, 2022,
Lower bounds for regret minimization: MOOR, 4(2):377-766, 2019 [HAL],
Adaptation to the unknown range: HAL-02794382,
Best-arm identification: HAL-03792668.

Organizing committee: Pierre Barbillon (MIA-Paris), Julien Bect (L2S), Nicolas Bousquet (EDF R&D), Amélie Fau (LMPS), Filippo Gatti (LMPS), Bertrand Iooss (EDF R&D), Alexandre Janon (LMO), Sidonie Lefebvre (ONERA), Didier Lucor (LISN), Emmanuel Vazquez (L2S).

Coordinators: Julien Bect (L2S) & Sidonie Lefebvre (ONERA)

Practical details: the seminar will be held online using Microsoft Teams.

If you want to attend this seminar (or any of the forthcoming online UQSay seminars), and if you do not already have access to the UQSay group on Teams, simply send an email and you will be invited. Please specify which email address the invitation must be sent to (this has to be the address associated with your Teams account).

You will find the link to the seminar on the “General” UQSay channel on Teams, approximately 15 minutes before the beginning.

The technical side of things: you can use Teams either directly from your web browser or using the “fat client”, which is available for most platforms (Windows, Linux, Mac, Android & iOS). We strongly recommend the latter option whenever possible. Please give it a try before the seminar to anticipate potential problems.