Neural Network-Based Parameter Tuning for Multi-Armed Bandit Algorithms

Yuhan Shi

doi:10.1051/itmconf/20257801036

Open Access

Issue		ITM Web Conf. Volume 78, 2025 International Conference on Computer Science and Electronic Information Technology (CSEIT 2025)


Article Number		01036
Number of page(s)		12
Section		Deep Learning and Reinforcement Learning – Theories and Applications
DOI		https://doi.org/10.1051/itmconf/20257801036
Published online		08 September 2025

ITM Web of Conferences 78, 01036 (2025)

Yuhan Shi

Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom


Abstract

This paper presents a novel approach for dynamically tuning the exploration parameter in Multi-Armed Bandit (MAB) algorithms using Deep Q-Networks (DQN), with the aim of improving performance in both static and dynamic environments. Traditional MAB algorithms such as Upper Confidence Bound (UCB) and Thompson Sampling (TS) rely on fixed exploration parameters and assume stationary reward distributions, which limits their effectiveness in real-world applications where reward distributions change over time. To address this problem, the paper proposes a learning-based method in which a DQN agent observes the state of the MAB environment and selects an exploration parameter from a predefined set. Experimental results show that the DQN-enhanced UCB algorithm consistently achieves lower cumulative regret than its traditional counterpart in both static and dynamic environments. In contrast, DQN-tuned TS yields moderate improvements in dynamic settings but exhibits instability in static environments. These findings highlight the potential of integrating neural network-based learning with classical decision-making strategies to enable adaptive exploration in non-stationary environments, offering insights for recommender systems and other sequential decision-making tasks.
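
The tuning loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the candidate set, the state features, the reward signal, or the network architecture, so `C_VALUES`, `features`, and the use of the bandit reward as the learning signal are all assumptions made here. To keep the example dependency-free, a linear Q-function (`LinearQSelector`, a hypothetical name) stands in for the Deep Q-Network, and the environment is static Bernoulli arms only.

```python
import math
import random

# Hypothetical candidate set; the abstract only says "a predefined set".
C_VALUES = [0.1, 0.5, 1.0, 2.0]


def ucb_select(counts, means, t, c):
    """Return the arm with the largest UCB index for exploration parameter c."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # play each arm once before using the index
    return max(
        range(len(counts)),
        key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def features(means, t, horizon):
    """Assumed state summary: bias term, best empirical mean, elapsed fraction."""
    return [1.0, max(means), t / horizon]


class LinearQSelector:
    """Linear Q-function over C_VALUES; a simplified stand-in for a DQN."""

    def __init__(self, n_features, n_actions, rng, lr=0.05, gamma=0.9, eps=0.1):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.rng, self.lr, self.gamma, self.eps = rng, lr, gamma, eps

    def q(self, state, action):
        return sum(w * x for w, x in zip(self.w[action], state))

    def act(self, state):
        # Epsilon-greedy choice of an index into C_VALUES.
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.w))
        return max(range(len(self.w)), key=lambda a: self.q(state, a))

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update on the weights of the chosen action.
        target = reward + self.gamma * max(
            self.q(next_state, a) for a in range(len(self.w))
        )
        td_error = target - self.q(state, action)
        self.w[action] = [
            w + self.lr * td_error * x for w, x in zip(self.w[action], state)
        ]


def run(probs, horizon=2000, seed=0):
    """Run tuned UCB on Bernoulli arms and return cumulative pseudo-regret."""
    rng = random.Random(seed)
    counts, means = [0] * len(probs), [0.0] * len(probs)
    selector = LinearQSelector(3, len(C_VALUES), rng)
    state, regret = features(means, 0, horizon), 0.0
    for t in range(1, horizon + 1):
        action = selector.act(state)
        arm = ucb_select(counts, means, t, C_VALUES[action])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        regret += max(probs) - probs[arm]
        next_state = features(means, t, horizon)
        selector.update(state, action, reward, next_state)
        state = next_state
    return regret


if __name__ == "__main__":
    print(run([0.2, 0.5, 0.8]))
```

Calling `run` with a fixed `seed` gives a repeatable cumulative regret for one set of arm probabilities. A dynamic environment, as studied in the paper, would additionally require the arm probabilities to change during the run, which this sketch does not attempt.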

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
