Exploring the Optimization of RLHF and its Variants in Aligning Large Models with Human Preferences
Zhengxiang Zhai
ITM Web Conf., 78 (2025) 01038
DOI: https://doi.org/10.1051/itmconf/20257801038