Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Jourdan, Marc and Mutny, Mojmir and Kirschner, Johannes and Krause, Andreas, Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback.”, 2021. Proceedings of the 32nd International Conference on Algorithmic Learning Theory (ALT)

Abstract: Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.

Full text here