Open Access
ITM Web Conf., Volume 78, 2025
International Conference on Computer Science and Electronic Information Technology (CSEIT 2025)
Article Number: 01008
Number of pages: 10
Section: Deep Learning and Reinforcement Learning – Theories and Applications
DOI: https://doi.org/10.1051/itmconf/20257801008
Published online: 08 September 2025
1. Bai Tian, Lu Yao, Li Chu, and He Jialiang, "Game intelligent guidance algorithm based on deep reinforcement learning," Journal of Jilin University (Science Edition), vol. 63, no. 1, pp. 91–98, 2025, doi: 10.13413/j.cnki.jdxblxb.2023555.
2. D. Silver, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, 2016.
3. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
4. H. Yong, J. Seo, J. Kim, M. Kim, and J. Choi, "Suspension control strategies using switched soft actor-critic models for real roads," IEEE Trans. Ind. Electron., vol. 70, no. 1, pp. 824–832, 2022.
5. V. Mnih, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
6. R. Lowe, et al., "Multi-agent actor-critic for mixed cooperative-competitive environments," Adv. Neural Inf. Process. Syst., vol. 30, 2017.
7. J. Achiam, D. Held, A. Tamar, and P. Abbeel, "Constrained policy optimization," in Proc. Int. Conf. Mach. Learn., 2017, pp. 22–31.
8. V. Mnih, et al., "Playing Atari with deep reinforcement learning," arXiv preprint arXiv:1312.5602, 2013.
9. D. Silver, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, 2016.
10. C. Berner, et al., "Dota 2 with large-scale deep reinforcement learning," arXiv preprint arXiv:1912.06680, 2019.
11. O. Vinyals, et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, vol. 575, no. 7782, pp. 350–354, 2019.
12. D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, "Mastering diverse domains through world models," arXiv preprint arXiv:2301.04104, 2023.
13. A. Dedieu, et al., "Improving transformer world models for data-efficient RL," arXiv preprint arXiv:2502.01591, 2025.
14. H. Lai, et al., "World model-based perception for visual legged locomotion," arXiv preprint arXiv:2409.16784, 2024.
15. V. L. Heuthe, E. Panizon, H. Gu, and C. Bechinger, "Counterfactual rewards promote collective transport using individually controlled swarm microrobots," Sci. Robot., vol. 9, no. 97, eado5888, 2024.
16. Y. Chen and J. Xiao, "Target search and navigation in heterogeneous robot systems with deep reinforcement learning," Mach. Intell. Res., vol. 22, no. 1, pp. 79–90, 2025.
17. T. Miki, et al., "Learning robust perceptive locomotion for quadrupedal robots in the wild," Sci. Robot., vol. 7, no. 62, eabk2822, 2022.
18. Y. Hu, et al., "Planning-oriented autonomous driving," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 17853–17862.
19. D. Zhang, et al., "CarPlanner: Consistent auto-regressive trajectory planning for large-scale reinforcement learning in autonomous driving," arXiv preprint arXiv:2502.19908, 2025.
20. P. Wang, C. Y. Chan, and A. de La Fortelle, "A reinforcement learning based approach for automated lane change maneuvers," in Proc. IEEE Intell. Vehicles Symp., 2018, pp. 1379–1384.
21. C. Wu, et al., "Flow: Architecture and benchmarking for reinforcement learning in traffic control," arXiv preprint arXiv:1710.05465, 2017.
22. S. M. McKinney, et al., "International evaluation of an AI system for breast cancer screening," Nature, vol. 577, no. 7788, pp. 89–94, 2020.
23. X. Zhao, S. Liu, S. Y. Yang, and C. Miao, "MedRAG: Enhancing retrieval-augmented generation with knowledge graph-elicited reasoning for healthcare copilot," arXiv preprint arXiv:2502.04413, 2025.
24. F. Ren, et al., "A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models," Nat. Biotechnol., vol. 43, no. 1, pp. 63–75, 2025.
25. W. Shin, S. J. Bu, and S. B. Cho, "Automatic financial trading agent for low-risk portfolio management using deep reinforcement learning," arXiv preprint arXiv:1909.03278, 2019.
26. A. B. Altuner and Z. H. Kilimci, "A novel deep reinforcement learning based stock direction prediction using knowledge graph and community aware sentiments," arXiv preprint arXiv:2107.00931, 2021.
27. S. Bajpai, "Application of deep reinforcement learning for Indian stock trading automation," arXiv preprint arXiv:2106.16088, 2021.
28. J. Lee, H. Koh, and H. J. Choe, "Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning," Appl. Intell., vol. 51, no. 8, pp. 6202–6223, 2021.
29. H. Wang and S. Yu, "Robo-advising: Enhancing investment with inverse optimization and deep reinforcement learning," in Proc. IEEE Int. Conf. Mach. Learn. Appl., 2021, pp. 365–372.
30. A. Tsantekidis, N. Passalis, and A. Tefas, "Diversity-driven knowledge distillation for financial trading using deep reinforcement learning," Neural Netw., vol. 140, pp. 193–202, 2021.
31. D. Silver, et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354–359, 2017.
