Minimax Normal Two-Armed Bandit with Indefinite Control Horizon
Yaroslav-the-Wise Novgorod State University, B.St-Petersburgskaya Str, 41, Velikiy Novgorod, Russia, 173003
* e-mail: Alexander.Kolnogorov@novsu.ru
We consider the two-armed bandit problem as applied to data processing if there are two alternative processing methods available with different a priori unknown efficiencies. On should determine the most effective method and provide its dominating application. The total number of data, which is interpreted as a control horizon, is assumed to have a priori known probability distribution.
The problem is considered in minimax (robust) setting. According to the main theorem of the theory of games minimax risk and minimax strategy are sought for as Bayesian ones corresponding to the worst-case prior distribution. We describe the properties of the worst-case prior and present a recursive Bellman-type equation for determination of both minimax strategy and minimax risk. Numerical results illustrating the proposed algorithm are given. The algorithm can be applied to optimization of parallel data processing if the number of processed data is not definitely known in advance.
© The Authors, published by EDP Sciences, 2017
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.