AIPlans/Qwen3-0.6B-IPO


sorakritt committed 7 days ago

Commit 72bda53 (verified) · 1 parent: d51430c

Update README.md

Files changed (1)

README.md +0 -1

@@ -53,7 +53,6 @@ Below is a comparison between the base model and this IPO-trained version.
 - **Base Model:** Qwen/Qwen3-0.6B-Base
 - **SFT Model Used:** [AIPlans/Qwen3-0.6b-SFT-hs2](https://huggingface.co/AIPlans/Qwen3-0.6b-SFT-hs2)
 - **Precision:** bfloat16 (Training), bfloat16 (Final Weights)
-- **Optimizer:** AdamW
 - **Learning Rate:** 5e-7
 - **Beta:** 0.01
 - **Epochs:** 3
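
For readers unfamiliar with what **Beta** controls here, the following is a minimal sketch of the per-pair IPO objective in its general published form. It is not taken from this repository's training code, which the README does not show. The function name `ipo_loss` and its argument names are hypothetical, and whether sequence log-probabilities are summed or length-averaged is an implementation detail this sketch leaves open.

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Squared-error IPO loss for a single preference pair (illustrative sketch).

    All inputs are sequence log-probabilities as plain floats.
    beta defaults to 0.01, the value listed in the README.
    """
    # Margin: how much more the policy prefers the chosen response over the
    # rejected one, measured relative to the frozen reference (SFT) model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # IPO regresses this margin toward a fixed target of 1 / (2 * beta).
    target = 1.0 / (2.0 * beta)
    return (margin - target) ** 2


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is target^2.
    print(ipo_loss(-12.0, -15.0, -12.0, -15.0))
```

Under these assumptions, a smaller Beta sets a larger target margin (1 / (2 * 0.01) = 50 for the value listed), so the loss keeps pushing the policy away from the reference for longer before the target is reached.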
