Research on Optimization of Cryptocurrency Trading Strategies Based on Reinforcement Learning – combining traditional machine learning and deep reinforcement learning methods

Ruijie Huang

doi:10.1051/itmconf/20257801001

Open Access

Issue: ITM Web Conf. Volume 78, 2025, International Conference on Computer Science and Electronic Information Technology (CSEIT 2025)
Article Number: 01001
Number of page(s): 13
Section: Deep Learning and Reinforcement Learning – Theories and Applications
DOI: https://doi.org/10.1051/itmconf/20257801001
Published online: 08 September 2025

ITM Web of Conferences 78, 01001 (2025)

Ruijie Huang

Faculty of Science and Technology, Beijing Normal University-Hong Kong Baptist University United International College, Zhuhai, Guangdong, China

Abstract

The cryptocurrency market trades 24/7 and is highly volatile, which challenges traditional quantitative strategies in three areas: path dependency, high-frequency optimization, and risk control. This paper proposes a "prediction-decision" framework that integrates Gradient Boosting Regression Trees (GBRT) for short-term forecasting with deep reinforcement learning algorithms, including Rainbow Deep Q-Network (Rainbow DQN) and Soft Actor-Critic (SAC), for dynamic optimization. The framework combines the complementary strengths of GBRT's pattern recognition and deep reinforcement learning's adaptive decision-making. A spatiotemporal experience replay mechanism tailored to the fat-tailed return distributions of cryptocurrencies boosts BTC/USDT annual returns by 37.2% and improves the drawdown control of Twin Delayed Deep Deterministic Policy Gradient (TD3) by 63% during the LUNA crisis. Empirical results show that SAC achieves 152% excess returns (Sharpe ratio 2.81) in ETH/USDT trading, while Rainbow DQN yields 287% returns in trending markets. A dynamic reward function reduces maximum drawdown from 42.7% to 19.3%, and the hybrid architecture curtails losses by 23.8% during the 2020 "March 12" crash. Curriculum learning accelerates TD3 convergence by 59% and reduces GPU memory use by 37%. This study establishes a verifiable algorithmic framework and advances high-frequency trading through the combined use of machine learning and reinforcement learning, offering a dynamic decision-making paradigm for evolving financial markets.
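
The abstract reports results in terms of Sharpe ratio and maximum drawdown. As a reading aid, the following is a minimal Python sketch of how these two metrics are commonly computed from an equity curve. It is not taken from the paper: the function names, the sample equity values, the zero risk-free rate, and the choice of 365 periods per year (assumed here because the market trades every day) are all illustrative assumptions, and the paper may define or annualize these metrics differently.

```python
import math
import statistics


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # decline from that mark
    return worst


def sharpe_ratio(returns, periods_per_year=365, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period simple returns.

    periods_per_year=365 and a zero risk-free rate are assumptions
    made for this sketch, not values stated in the paper.
    """
    excess = [r - risk_free_per_period for r in returns]
    mean = statistics.fmean(excess)
    stdev = statistics.stdev(excess)  # sample standard deviation
    return mean / stdev * math.sqrt(periods_per_year)


# Hypothetical daily equity values, for illustration only.
equity = [100.0, 120.0, 90.0, 110.0, 130.0]
daily_returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]

print("max drawdown:", max_drawdown(equity))
print("annualized Sharpe:", sharpe_ratio(daily_returns))
```

In this toy series the equity peaks at 120 and then falls to 90, so the maximum drawdown is 30/120, or 25%. A reduction such as the reported 42.7% to 19.3% would correspond to a smaller value returned by a function like `max_drawdown`.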

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
