> While preserving the original reasoning order of **Gemma 4** as much as possible, we conducted targeted refinements for answer quality, structure, clarity, and consistency.
>
> This model was trained in a post-fix **Unsloth** environment, after Unsloth's official gradient-accumulation and loss-accounting fixes for Gemma-family training. In practice, I used a bug-fixed stack aligned with `unsloth_zoo>=2026.4.6` and `transformers==5.5.0`, in order to avoid misleading loss inflation under gradient accumulation and to obtain more reliable optimization behavior for **Gemma 4 31B** fine-tuning.
>
> **🍎 Therefore, my fine-tuning strategy deliberately did not follow other teams into aggressive direct distillation from Claude. Instead, we opted for a more conservative and controllable path.**
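The loss-accounting issue mentioned above can be illustrated with a small, self-contained numeric sketch (illustrative only; the function names are mine, not Unsloth's). Naively averaging each micro-batch's mean loss over-weights micro-batches with few valid tokens, whereas the fixed behavior normalizes once over all valid tokens across the accumulation window:

```python
# Illustrative sketch of the gradient-accumulation loss-accounting bug:
# averaging per-micro-batch mean losses is NOT the same as one
# token-weighted mean when micro-batches have unequal token counts.

def naive_accumulated_loss(micro_batches):
    """Mean of per-micro-batch means (the buggy normalization)."""
    per_batch_means = [sum(toks) / len(toks) for toks in micro_batches]
    return sum(per_batch_means) / len(per_batch_means)

def token_weighted_loss(micro_batches):
    """Single mean over all valid tokens (the fixed normalization)."""
    all_tokens = [t for toks in micro_batches for t in toks]
    return sum(all_tokens) / len(all_tokens)

# Two micro-batches with very different numbers of non-padding tokens:
# 8 tokens at loss 2.0 vs. only 2 tokens at loss 8.0.
mb = [[2.0] * 8, [8.0] * 2]

print(naive_accumulated_loss(mb))   # 5.0 -> the short batch is over-weighted
print(token_weighted_loss(mb))      # 3.2 -> true per-token average
```

The gap between the two numbers is the "loss inflation" referred to above: with variable-length instruction data, the naive scheme reports (and optimizes) a distorted objective, and the distortion grows with the number of accumulation steps.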
## 🎯 Development Motivation & Industry Insights
**Gemopus-4-31B-it** is a supervised fine-tuned version of the instruction-tuned **Gemma 4 31B** model.
* Although this model has "Opus" in its name, it is more of a continuation of the naming convention.
* The goal here is not to deny that reasoning SFT can generalize under the right conditions, but to avoid naive or superstitious replication of **"Claude-style chain of thought (CoT)"** from public distillation corpora. Recent evidence suggests that whether reasoning supervision transfers depends on optimization, data quality, and model capability. In practice, many publicly available reasoning traces do not necessarily reflect the teacher model's true, faithful, and transferable internal process; they are often closer to polished summaries than to genuinely connected reasoning. A series of recent studies has also shown that models can exhibit post-hoc rationalization in natural settings, and that CoT faithfulness varies substantially across model families and training regimes. In other words, text that merely **looks** like reasoning is not automatically a high-quality, transferable supervision signal for reasoning.
---