Open Access
ITM Web Conf.
Volume 78, 2025
International Conference on Computer Science and Electronic Information Technology (CSEIT 2025)
Article Number 01009
Number of page(s) 8
Section Deep Learning and Reinforcement Learning – Theories and Applications
DOI https://doi.org/10.1051/itmconf/20257801009
Published online 08 September 2025
  1. M.A. Wiering and M. Van Otterlo, 'Reinforcement learning', Adapt. Learn. Optim., vol. 12, no. 3, p. 729, 2012.
  2. J. Garcia and F. Fernández, 'Safe exploration of state and action spaces in reinforcement learning', J. Artif. Intell. Res., vol. 45, pp. 515–564, 2012.
  3. V. François-Lavet, P. Henderson, R. Islam, et al., 'An introduction to deep reinforcement learning', Found. Trends Mach. Learn., vol. 11, no. 3–4, pp. 219–354, 2018.
  4. T. Zhang and H. Mo, 'Reinforcement learning for robot research: A comprehensive review and open issues', Int. J. Adv. Robot. Syst., vol. 18, no. 3, pp. 1–20, 2021.
  5. V. Mnih, K. Kavukcuoglu, D. Silver, et al., 'Human-level control through deep reinforcement learning', Nature, vol. 518, no. 7540, pp. 529–533, 2015.
  6. T.P. Lillicrap, J.J. Hunt, A. Pritzel, et al., 'Continuous control with deep reinforcement learning', arXiv preprint arXiv:1509.02971, 2015.
  7. J. Schulman, F. Wolski, P. Dhariwal, et al., 'Proximal policy optimization algorithms', arXiv preprint arXiv:1707.06347, 2017.
  8. A. Awasthi, 'Evaluating reinforcement learning algorithms for LunarLander-v2: A comparative analysis of DQN, DDQN, DDPG, and PPO', 2025.
  9. S. Zhu, S. Liu, S. Feng, et al., 'An optimization method for the inverted pendulum problem based on deep reinforcement learning', J. Phys.: Conf. Ser., vol. 2296, no. 1, p. 012008, 2022.
  10. Y. Peng, G. Chen, M. Zhang, et al., 'Proximal evolutionary strategy: Improving deep reinforcement learning through evolutionary policy optimization', Memetic Comput., vol. 16, no. 3, pp. 445–466, 2024.
  11. Y. Wang, H. He, and X. Tan, 'Truly proximal policy optimization', in Proc. Uncertainty Artif. Intell., 2020, pp. 113–122.
  12. Z. Fan, R. Su, W. Zhang, et al., 'Hybrid actor-critic reinforcement learning in parameterized action space', arXiv preprint arXiv:1903.01344, 2019.
  13. M. Sewak, 'Deterministic policy gradient and the DDPG: Deterministic-policy-gradient-based approaches', in Deep Reinforcement Learning: Frontiers of Artificial Intelligence, Springer, 2019, pp. 173–184.
  14. E.H.H. Sumiea, S.J. Abdulkadir, M.G. Ragab, et al., 'Enhanced deep deterministic policy gradient algorithm using grey wolf optimizer for continuous control tasks', IEEE Access, vol. 11, pp. 139771–139784, 2023.
  15. J. Zhu, F. Wu, and J. Zhao, 'An overview of the action space for deep reinforcement learning', in Proc. Int. Conf. Algorithms, Comput. Artif. Intell., 2021, pp. 1–10.
  16. U. Demircioğlu, H. Bakir, and R. Bakir, 'An investigation of pendulum control using reinforcement learning: Comparison of different agents', 2024.
