| Issue | ITM Web Conf., Volume 78, 2025. International Conference on Computer Science and Electronic Information Technology (CSEIT 2025) | |
|---|---|---|
| Article Number | 01029 | |
| Number of page(s) | 11 | |
| Section | Deep Learning and Reinforcement Learning – Theories and Applications | |
| DOI | https://doi.org/10.1051/itmconf/20257801029 | |
| Published online | 08 September 2025 | |
Application of Dynamic Multi-Armed Bandit in Real-Time System: Comparative Study of Thompson Sampling and UCB Algorithm
School of Data Science, Capital University of Economics and Business, Beijing, 100070, China
This study compares the performance of the Thompson Sampling (TS) and Upper Confidence Bound (UCB) algorithms within a dynamic multi-armed bandit framework. Experiments cover three environments: static, gradually changing, and abruptly changing. In the abruptly changing environment, adaptive Thompson Sampling achieved a cumulative regret of 286.6, significantly lower than standard UCB (376.5) and standard Thompson Sampling (346.8), a reduction of about 24% and 17.4% respectively, with an average reward of 0.71. Across the three environments, the cumulative regret of the hybrid algorithm (264.2) is close to that of adaptive Thompson Sampling (264.0), indicating strong robustness. The dynamics of the environment significantly affect the differences between algorithms: the gap between the best and worst algorithms is 25.7 in the static environment and widens to 89.9 in the abruptly changing environment. The adaptive mechanism dynamically adjusts the posterior parameters in response to fluctuations, and the hybrid algorithm balances exploration and exploitation, which gives both approaches significant application value in video streaming and recommendation systems. This research provides a quantitative basis for selecting dynamic decision-making algorithms and clarifies the conditions under which adaptive and hybrid strategies are applicable.
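The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's adaptive mechanism, so the version below assumes one common choice: discounting the Beta posterior toward its Beta(1, 1) prior with a hypothetical factor `gamma`. The function names, arm means, horizon, and change point are all illustrative assumptions, and the sketch is not expected to reproduce the reported figures.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm with the highest UCB1 index; untried arms are played first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def ts_select(alpha, beta, rng):
    """Sample each arm's Beta posterior and return the arm with the largest draw."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)


def ts_update(alpha, beta, arm, reward, gamma=1.0):
    """Update Beta posteriors in place for a Bernoulli reward (0 or 1).

    gamma == 1.0 gives standard Thompson Sampling. gamma < 1.0 is an ASSUMED
    adaptive variant: accumulated evidence decays toward the Beta(1, 1) prior,
    so old observations are gradually forgotten.
    """
    for a in range(len(alpha)):
        alpha[a] = 1.0 + gamma * (alpha[a] - 1.0)
        beta[a] = 1.0 + gamma * (beta[a] - 1.0)
    alpha[arm] += reward
    beta[arm] += 1 - reward


def run(policy, means_before, means_after, horizon=2000, change_at=1000,
        gamma=0.98, seed=0):
    """Simulate one policy on a bandit whose arm means switch once (abrupt change).

    policy is "ucb", "ts", or "adaptive_ts". Returns cumulative pseudo-regret,
    i.e. the summed gap between the best arm's mean and the chosen arm's mean.
    """
    rng = random.Random(seed)
    k = len(means_before)
    counts, sums = [0] * k, [0.0] * k
    alpha, beta = [1.0] * k, [1.0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        means = means_before if t <= change_at else means_after
        if policy == "ucb":
            arm = ucb1_select(counts, sums, t)
        else:
            arm = ts_select(alpha, beta, rng)
        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        ts_update(alpha, beta, arm, reward,
                  gamma if policy == "adaptive_ts" else 1.0)
        regret += max(means) - means[arm]
    return regret


if __name__ == "__main__":
    before, after = [0.8, 0.5, 0.3], [0.3, 0.5, 0.8]  # illustrative arm means
    for name in ("ucb", "ts", "adaptive_ts"):
        print(name, round(run(name, before, after), 1))
```

Passing the same list for `means_before` and `means_after` gives a static environment. Results from a single seed are noisy, so any comparison should average over many seeds before drawing conclusions.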
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

