xw1234gan/cnk12_GRPO_KL_Qwen2.5-7B-Instruct_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 8B • Updated 29 days ago • 24
xw1234gan/cnk12_GRPO_KL_Qwen2.5-3B-Instruct_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 3B • Updated 30 days ago • 18
xw1234gan/cnk12_GRPO_KL_Qwen2.5-1.5B-Instruct_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 2B • Updated about 1 month ago • 3
xw1234gan/GRPO_KL_Qwen2.5-7B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN_NoKL • Text Generation • 8B • Updated May 29 • 2
xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 3B • Updated May 28 • 1
xw1234gan/Extended_GRPO_KL_Qwen2.5-1.5B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 2B • Updated May 27 • 2
xw1234gan/Merging_Qwen2.5-7B-Instruct_MMLU_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 8B • Updated May 9 • 2
xw1234gan/Fixed_Merging_Qwen2.5-7B-Instruct_MMLU_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 8B • Updated May 8 • 2
xw1234gan/GRPO_KL_Qwen2.5-7B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 8B • Updated May 7 • 2