Avis de soutenance HDR de Richard Combes

Date : 17/09/2025

Catégorie(s) : Generals

Avis de Soutenance

Monsieur Richard COMBES

Soutiendra publiquement ses travaux d’habilitation à diriger les recherches

“Contributions aux bandits structurés”

Soutenance prévue le mercredi 17 septembre 2025 à 14h00

Lieu : Paris Saclay Plateau de Moulon 3 rue Joliot-Curie F-91192 Gif-sur-Yvette Cedex Bâtiment Eiffel
Salle : Amphi V

Jury:
Christophe Giraud, Professeur, Université Paris-Saclay
Aurélien Garivier, Professeur, ENS de Lyon
Olivier Cappé, Directeur de Recherche CNRS, ENS
Odalric-Abrym Maillard, Chargé de Recherche INRIA, SCOOL
Gilles Stoltz, Directeur de Recherche CNRS, Université Paris-Saclay

Résumé: On commence par une brève présentation des problèmes de bandit stochastiques avec et sans structure. On propose ensuite un modèle général pour les bandits stochastiques structurés qui couvre toutes les structures connues, et une borne inférieure de regret ainsi qu’une stratégie asymptotiquement optimale appelée OSSB (Optimal Sampling for Structured Bandits) qui atteint cette borne. Dans le chapitre suivant on considère les bandits stochastiques avec structure unimodale, on spécialise la borne inférieure générale du chapitre précédent, et l’on propose OSUB (Optimal Sampling for Unimodal Bandits), une stratégie asymptotiquement optimale. On s’intéresse ensuite aux bandits stochastiques avec structure Lipschitzienne, on spécialise la borne inférieure générale et l’on dérive OSLB (Optimal Sampling for Lipschitz Bandits) une stratégie asymptotiquement optimale ainsi que CKL-UCB (Combined Kullback-Leibler Upper Confidence Bound), une variante de KL-UCB (Kullback-Leibler Upper Confidence Bound). Le chapitre suivant traite des bandits stochastiques avec structure combinatoire, et observations dites semi-bandit. On propose dans un premier temps une stratégie optimiste appelée ESCB (Efficient Sampling for Combinatorial Bandits), spécifiquement construite pour exploiter la structure combinatoire et la corrélation des récompenses. On propose ensuite AESCB (Approximate Efficient Sampling for Combinatorial Bandits) une version approximée de ESCB avec un regret comparable, et l’on montre qu’il peut être implémenté en temps polynomial pour beaucoup de structures combintaoires. On propose en outre GLPG (Graves-Lai Projected Gradient), un algorithme pour calculer la borne inférieure de regret et ainsi implémenter OSSB en temps polynomial. On conclut ce chapitre en montrant que la version combinatoire de Thompson Sampling est en générale sous-optimale, avec un regret exponentiellement grand en fonction de la dimension. Enfin, on considère les bandits stochastiques avec observations dites en cascade, pour apprendre à classer des listes d’éléments. On spécialise la borne de regret générale et on propose une stratégie asymptotiquement optimale appelée PIE (Parsimonious Item Exploration).

Abstract: We first provide a quick introduction to stochastic multi-armed bandit problems with and without structure. We then propose a general model for structured stochastic bandits which includes all known structures, provide an information theoretic regret lower bound as well as an asymptotically optimal strategy that attains this bound called OSSB (Optimal Sampling for Structured Bandits). In the next chapter we consider stochastic bandits with a unimodal structure, specialize the information theoretic regret lower bound and propose an asymptotically optimal strategy called OSUB (Optimal Sampling for Unimodal Bandits) matching this bound. We then turn our attention to structured stochastic bandits with a Lipschitz structure, specialize the information theoretic regret lower bound to the Lipschitz structure and derive an asymptotically optimal strategy called OSLB (Optimal Sampling for Lipschitz Bandits), as well as another strategy called CKL-UCB (Combined Kullback-Leibler Upper Confidence Bound), a variant of KL-UCB (Kullback-Leibler Upper Confidence Bound). The next chapter focuses on structured stochastic bandits with a combinatorial structure and semi-bandit feedback. We propose and analyze an optimistic strategy called ESCB (Efficient Sampling for Combinatorial Bandits), which is tailored to exploit the combinatorial structure as well as the correlation structure of the rewards. We propose AESCB (Approximate Efficient Sampling for Combinatorial Bandits) an approximate version of ESCB with comparable regret upper bounds, and we prove that AESCB can be implemented in polynomial time for many combinatorial structures. We then propose GLPG (Graves-Lai Projected Gradient), an algorithm in order to compute the information theoretic lower bound and implement OSSB in polynomial time. We also show that for combinatorial bandits, Thompson Sampling is in general sub-optimal and its regret can scale at least exponentially with the dimension. Last but not least we consider structured stochastic bandits with cascade-like feedback, and address the problem of learning how to rank items: we specialize the information theoretic regret lower bound to this structure, and propose an asymptotically optimal strategy matching the lower bound called PIE (Parsimonious Item Exploration).