# LFM2-v5-RL-10k Adapter

An English→Korean translation LoRA adapter trained with reinforcement learning (GRPO + COMET).

⚠️ This adapter must be used together with the base model!

Base Model: gyung/lfm2-1.2b-koen-mt-v4-100k
## Performance (Flores-200 Benchmark, 1,012 samples)
| Rank | Model | chrF++ | BLEU | Params |
|---|---|---|---|---|
| 1 | Google Translate | 39.27 | 18.18 | - (API) |
| 2 | Yanolja-4B-GGUF | 38.61 | 16.03 | 4B |
| 3 | NLLB-200 (3.3B) | 35.09 | 11.68 | 3.3B |
| 4 | **LFM2-v5-RL (Adapter)** | 32.96 | 12.05 | 1.2B |
| 5 | Gemma-3-4B-it-GGUF | 32.83 | 11.36 | 4B |
| 6 | NLLB-200-Distilled-600M | 31.97 | 10.32 | 600M |
| 7 | LFM2-v4-100k (Base) | 31.53 | 11.13 | 1.2B |
✅ A 1.2B model that outperforms a 4B model (Gemma-3)!
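For context, chrF++ is a character n-gram F-score (extended with word 1- and 2-grams); sacrebleu is the standard implementation used for Flores-200 reporting. As a rough, self-contained illustration only, and not the evaluation script behind the table above, the character n-gram core of the metric can be sketched as:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    # Character n-grams with whitespace removed, as in chrF
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    # Average character n-gram F-beta over n = 1..max_n, on a 0-100 scale.
    # Real chrF++ additionally mixes in word 1- and 2-gram F-scores.
    f_scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # sentence shorter than n characters
        overlap = sum((hyp & ref).values())
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            f_scores.append(0.0)
            continue
        f_scores.append((1 + beta**2) * precision * recall
                        / (beta**2 * precision + recall))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0
```

The recall-weighted F-score (β = 2) is why chrF correlates well with human judgments of translation adequacy; for reproducing the table, use sacrebleu's `chrF2++` signature instead.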
## Usage

### 1. Load the Adapter (Recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "gyung/lfm2-1.2b-koen-mt-v4-100k",
    device_map="auto",
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("gyung/lfm2-1.2b-koen-mt-v4-100k")

# Load the adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, "gyung/lfm2-1.2b-koen-mt-v5-rl-10k-adapter")
model = model.merge_and_unload()  # improves inference speed
```
### 2. Run a Translation

```python
messages = [
    {"role": "system", "content": "Translate the following text to Korean."},
    {"role": "user", "content": "The quick brown fox jumps over the lazy dog."}
]
input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05
)
translation = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(translation)
# Output: 빠른 갈색 여우가 게으른 개를 뛰어넘는다.
```
## Training Details

| Item | Value |
|---|---|
| Base Model | gyung/lfm2-1.2b-koen-mt-v4-100k |
| Method | GRPO (Group Relative Policy Optimization) |
| Reward Model | Unbabel/wmt22-comet-da |
| Dataset Size | 10,000 samples |
| Training Steps | 150 |
| Effective Batch Size | 128 (32 × 4 grad accum) |
| Samples Processed | ~19,200 (1.9 epochs) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | all-linear |
| Adapter Size | 89MB |
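The table above can be wired together with TRL's `GRPOTrainer`, using COMET as the scalar reward. What follows is a hedged reconstruction from the listed hyperparameters, not the published training script; the dataset column names (`src`, `ref`), prompt formatting, and COMET batch size are assumptions:

```python
# Hedged sketch of the GRPO + COMET setup; requires trl, peft, datasets,
# and unbabel-comet. Column names "src"/"ref" are assumed, not confirmed.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
from comet import download_model, load_from_checkpoint

# COMET quality model used as the reward signal (higher score = better translation)
comet = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

def comet_reward(completions, src, ref, **kwargs):
    # TRL passes extra dataset columns (here: src, ref) to reward
    # functions as keyword arguments; return one float per completion.
    data = [{"src": s, "mt": c, "ref": r}
            for s, c, r in zip(src, completions, ref)]
    return comet.predict(data, batch_size=32).scores

# 10k-sample slice of the training dataset (assumed preprocessing)
train_dataset = load_dataset("gyung/KoEn-Translation-Alpaca-100k", split="train[:10000]")

peft_config = LoraConfig(r=32, lora_alpha=64, target_modules="all-linear")
args = GRPOConfig(
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,   # effective batch size 128
    max_steps=150,
)

trainer = GRPOTrainer(
    model="gyung/lfm2-1.2b-koen-mt-v4-100k",
    reward_funcs=comet_reward,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```

GRPO scores a group of sampled translations per prompt and uses the within-group COMET ranking as the advantage signal, which is why no separate value model is needed.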
## Performance by Checkpoint

| Step | Epoch | chrF++ | BLEU | Δ vs v4 (chrF++) |
|---|---|---|---|---|
| 50 | 0.64 | 31.64 | 11.26 | +0.11 |
| 100 | 1.28 | 32.26 | 11.92 | +0.73 |
| 150 | 1.92 | 32.96 | 12.05 | +1.43 |
✅ Performance improves steadily as steps increase → effective RL training without overfitting
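The Epoch and Δ columns follow directly from the run configuration (effective batch 128 over the 10,000-sample dataset; v4 base chrF++ of 31.53), which a quick check confirms:

```python
effective_batch = 32 * 4        # per-device batch × gradient accumulation
dataset_size = 10_000
v4_chrf = 31.53                 # chrF++ of the v4 base model

for step, score in [(50, 31.64), (100, 32.26), (150, 32.96)]:
    epoch = step * effective_batch / dataset_size
    delta = round(score - v4_chrf, 2)
    print(f"step {step}: epoch {epoch:.2f}, chrF++ {score} ({delta:+} vs v4)")
```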
## Related Links
- Base Model: gyung/lfm2-1.2b-koen-mt-v4-100k
- Training Dataset: gyung/KoEn-Translation-Alpaca-100k
- GitHub Repository: LFM2-KoEn-Tuning
## Citation

```bibtex
@misc{lfm2-koen-v5-rl,
  author = {gyung},
  title = {LFM2-1.2B-KoEn-MT-v5-RL: GRPO-Enhanced English-Korean Translation Adapter},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v5-rl-10k-adapter}
}
```
## License
This model inherits the LFM Open License v1.0 from the base LFM2 model.