Focus on the Optimization of the RLHF Algorithm to Enhance the Training Effect After LLM
Shuyang He, Xi Yu and Xiubin Zhang
ITM Web Conf., 84 (2026) 03006
DOI: https://doi.org/10.1051/itmconf/20268403006