ITM Web Conf., Volume 78, 2025
International Conference on Computer Science and Electronic Information Technology (CSEIT 2025)
Article Number: 01015
Number of pages: 10
Section: Deep Learning and Reinforcement Learning – Theories and Applications
DOI: https://doi.org/10.1051/itmconf/20257801015
Published online: 08 September 2025
Open Access
1. cloud-based multi-hop multi-robot wireless networks'. IETE Tech. Rev., 2020, 37(1): 98–107.
2. Liu, Z. F., Cao, L., Lai, J., et al.: 'Overview of multi-agent path finding'. Computer Engineering and Applications, 2022, 58(20): 43–62.
3. Stern, R., Sturtevant, N. R., Felner, A., et al.: 'Multi-agent pathfinding: Definitions, variants, and benchmarks'. In: Twelfth Annual Symposium on Combinatorial Search, 2019.
4. Wu, W. J., Wang, T. D., Sun, Y., et al.: 'Survey of multi-agent path finding technology'. Journal of Beijing University of Technology, 2024, 50(10): 1263–1272.
5. Oliveira, I. R. L., Brandão, A. S.: 'Deep reinforcement learning for mapless robot navigation systems'. In: 2023 Latin American Robotics Symposium (LARS), 2023 Brazilian Symposium on Robotics (SBR), and 2023 Workshop on Robotics in Education (WRE), Salvador, Brazil, 2023, pp. 331–336.
6. Shi, D. X., Peng, Y. X., Yang, H. H., et al.: 'DQN-based Multi-agent Motion Planning Method with Deep Reinforcement Learning'. Computer Science, 2024, 51(2): 268–277.
7. Sartoretti, G., Kerr, J., Shi, Y., et al.: 'PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning'. IEEE Robotics and Automation Letters, 2019, 4(3): 2378–2385.
8. Zhang, Y. L., Li, Z. W., Xu, J. H., Jiang, Y. C., Cui, Y.: 'Multi-Agent Path Planning Based on Congestion Awareness and Caching Communication'. Computer Science, 2024.
9. Riviere, B., Hönig, W., Yue, Y., et al.: 'GLAS: Global-to-local safe autonomy synthesis for multi-robot motion planning with end-to-end learning'. IEEE Robotics and Automation Letters, 2020, 5(3): 4249–4256.
10. Ye, Z. H.: 'Warehouse Robot Path Planning Based on Multi-Agent Reinforcement Learning'. Harbin Institute of Technology, 2023.
11. Chung, J., Fayyad, J., Younes, Y. A., et al.: 'Learning team-based navigation: a review of deep reinforcement learning techniques for multi-agent pathfinding'. Artificial Intelligence Review, 2024, 57(2): 41.
12. Li, Q., Gama, F., Ribeiro, A., et al.: 'Graph neural networks for decentralized multirobot path planning'. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, USA, 2020, pp. 11785–11792.
13. Laurent, F., Schneider, M., Scheller, C., et al.: 'Flatland Competition 2020: MAPF and MARL for efficient train coordination on a grid world'. In: NeurIPS 2020 Competition and Demonstration Track, PMLR, 2021, pp. 275–301.
14. Liu, Z., Cao, L., Lai, J., Chen, X., Chen, Y.: 'A review of multi-agent path planning'. Comput. Eng. Appl., 2022, 58: 43–62.
15. Yang, L., Li, P., Qian, S., Quan, H., Miao, J., Liu, M., Hu, Y., Memetimin, E.: 'Path Planning Technique for Mobile Robots: A Review'. Machines, 2023, 11(10): 980. https://doi.org/10.3390/machines11100980
16. Reijnen, Y., Zhang, W., Nuijten, C., Goldak-Altgassen, M.: 'Combining Deep Reinforcement Learning with Search Heuristics for Solving Multi-Agent Path Finding in Segment-based Layouts'. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 2020, pp. 2647–2654.
17. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.: 'Mastering the game of Go with deep neural networks and tree search'. Nature, 2016, 529: 484–489.
18. Berner, C., et al.: 'Dota 2 with large scale deep reinforcement learning'. arXiv preprint arXiv:1912.06680, 2019. Available: https://arxiv.org/abs/1912.06680
19. Manchella, K., Umrawal, A. K., Aggarwal, V.: 'FlexPool: A distributed model-free deep reinforcement learning algorithm for joint passengers and goods transportation'. IEEE Trans. Intell. Transp. Syst., 2021, 22(4): 2035–2047.
20. Honari, H., Khodaygan, S.: 'Deep reinforcement learning-based framework for constrained any-objective optimization'. J. Ambient Intell. Humaniz. Comput., 2023, pp. 1–17.
21. Sartoretti, G., et al.: 'PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning'. IEEE Robotics and Automation Letters, 2019, 4(3): 2378–2385.
22. Ma, Z., Luo, Y., Ma, H.: 'Distributed Heuristic Multi-Agent Path Finding with Communication'. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 2021, p. 8699.
23. Seoung, K. L., Nakju, D.: 'Local path planning scheme for car-like robots: Shortest turning motion using geometric analysis'. Intelligent Service Robotics, 2025 (prepublish): 1–39.
24. Das, S., Biswas, A., Saxena, A.: 'DCC: A Cascade-Based Approach to Detect Communities in Social Networks'. Springer, Singapore, 2024. https://doi.org/10.1007/978-981-99-6690-5_28
25. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Wierstra, D.: 'Continuous control with deep reinforcement learning'. arXiv preprint arXiv:1509.02971, 2015.
26. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: 'Deterministic Policy Gradient Algorithms'. In: Proceedings of the 31st International Conference on Machine Learning, PMLR, 2014, 32(1): 387–395.
27. Lowe, R., et al.: 'Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments'. Advances in Neural Information Processing Systems, 2017, 30: 6379–6390.
28. Fujimoto, S., van Hoof, H., Meger, D.: 'Addressing Function Approximation Error in Actor-Critic Methods'. In: Proceedings of the 35th International Conference on Machine Learning, PMLR, 2018, 80: 1587–1596.
29. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: 'Trust Region Policy Optimization'. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR, 2015, 37: 1889–1897.
30. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: 'Proximal Policy Optimization Algorithms'. arXiv preprint arXiv:1707.06347, 2017.
31. Andrychowicz, M., Baker, B., Chociej, M., Józefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., Schneider, J., Sidor, S., Tobin, J., Welinder, P., Weng, L., Zaremba, W.: 'Learning dexterous in-hand manipulation'. Journal of Robotics, 2020, 39(1): 1–13.
32. Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., Mądry, A.: 'Implementation matters in deep policy gradients: A case study on PPO and TRPO'. arXiv preprint arXiv:2005.12729, 2020.
33. Ziebart, B. D., Maas, A., Bagnell, J. A., Dey, A. K.: 'Maximum Entropy Inverse Reinforcement Learning'. In: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008, pp. 1433–1438.
34. Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: 'Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor'. In: Proceedings of the 35th International Conference on Machine Learning, PMLR, 2018, 80: 1861–1870.
35. Yang, C., Bi, S., Xu, Y., Zhang, X.: 'CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration'. arXiv preprint arXiv:2503.14254, 2025.
