πŸš€ LFM2-v5-RL-10k Adapter

An English-to-Korean translation LoRA adapter trained with GRPO + COMET reinforcement learning

⚠️ This adapter must be used together with the base model!

Base Model: gyung/lfm2-1.2b-koen-mt-v4-100k

πŸ“Š Performance (Flores-200 benchmark, 1012 samples)

| Rank | Model | chrF++ | BLEU | Params |
|------|-------|--------|------|--------|
| 1 | Google Translate | 39.27 | 18.18 | - (API) |
| 2 | Yanolja-4B-GGUF | 38.61 | 16.03 | 4B |
| 3 | NLLB-200 (3.3B) | 35.09 | 11.68 | 3.3B |
| 4 | πŸ†• LFM2-v5-RL (Adapter) | 32.96 | 12.05 | 1.2B |
| 5 | Gemma-3-4B-it-GGUF | 32.83 | 11.36 | 4B |
| 6 | NLLB-200-Distilled-600M | 31.97 | 10.32 | 600M |
| 7 | LFM2-v4-100k (Base) | 31.53 | 11.13 | 1.2B |

βœ… A 1.2B model surpasses a 4B model (Gemma-3)!
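
For context, chrF++ and BLEU scores like the ones above can be computed with sacrebleu. A minimal sketch; the hypotheses and references below are placeholders, and the tokenizer setting is an assumption rather than the benchmark's exact configuration:

```python
import sacrebleu

# Placeholder system outputs and Flores-200 references (single reference set).
hyps = ["λΉ λ₯Έ κ°ˆμƒ‰ μ—¬μš°κ°€ 게으λ₯Έ 개λ₯Ό λ›°μ–΄λ„˜λŠ”λ‹€."]
refs = [["λΉ λ₯Έ κ°ˆμƒ‰ μ—¬μš°κ°€ 게으λ₯Έ 개λ₯Ό λ›°μ–΄λ„˜μŠ΅λ‹ˆλ‹€."]]

chrf = sacrebleu.corpus_chrf(hyps, refs, word_order=2)  # word_order=2 gives chrF++
bleu = sacrebleu.corpus_bleu(hyps, refs)                # default 13a tokenizer
print(f"chrF++: {chrf.score:.2f}  BLEU: {bleu.score:.2f}")
```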

πŸ”§ Usage

1. Load the adapter (recommended)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "gyung/lfm2-1.2b-koen-mt-v4-100k",
    device_map="auto",
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("gyung/lfm2-1.2b-koen-mt-v4-100k")

# Load the adapter and merge it into the base model
model = PeftModel.from_pretrained(base_model, "gyung/lfm2-1.2b-koen-mt-v5-rl-10k-adapter")
model = model.merge_and_unload()  # speeds up inference
```
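
After `merge_and_unload()` the adapter weights are folded into the base model, so you can optionally save a standalone checkpoint and skip the merge step on later loads. A sketch; the output path is arbitrary:

```python
# Persist the merged model; reload it later with AutoModelForCausalLM alone.
model.save_pretrained("lfm2-v5-rl-merged")
tokenizer.save_pretrained("lfm2-v5-rl-merged")
```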

2. Run translation

```python
messages = [
    {"role": "system", "content": "Translate the following text to Korean."},
    {"role": "user", "content": "The quick brown fox jumps over the lazy dog."}
]

input_ids = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05
)

# Decode only the newly generated tokens, skipping the echoed prompt
translation = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(translation)
# Output: λΉ λ₯Έ κ°ˆμƒ‰ μ—¬μš°κ°€ 게으λ₯Έ 개λ₯Ό λ›°μ–΄λ„˜λŠ”λ‹€.
```
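
For repeated use, the snippet above can be wrapped in a small helper. A minimal sketch; the `translate` function is illustrative, not part of the released code:

```python
def translate(text: str, max_new_tokens: int = 256) -> str:
    """Translate one English string to Korean with the merged model above."""
    messages = [
        {"role": "system", "content": "Translate the following text to Korean."},
        {"role": "user", "content": text},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, return_tensors="pt", add_generation_prompt=True
    ).to(model.device)
    outputs = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.3,
        min_p=0.15,
        repetition_penalty=1.05,
    )
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)

print(translate("Machine translation keeps getting better."))
```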

πŸ“ˆ Training Details

| Item | Value |
|------|-------|
| Base Model | gyung/lfm2-1.2b-koen-mt-v4-100k |
| Method | GRPO (Group Relative Policy Optimization) |
| Reward Model | Unbabel/wmt22-comet-da |
| Dataset Size | 10,000 samples |
| Training Steps | 150 |
| Effective Batch Size | 128 (32 Γ— 4 grad accum) |
| Samples Processed | ~19,200 (1.9 epochs) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Target Modules | all-linear |
| Adapter Size | 89 MB |
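
The table above describes GRPO training with a COMET-based reward. Below is a minimal sketch of how such a setup can be wired together with trl's `GRPOTrainer` and the unbabel-comet package; it is an illustration under stated assumptions, not the released training script. Hyperparameters are taken from the table where available; the dataset construction, prompt format, and column names `src`/`ref` are placeholders:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
from comet import download_model, load_from_checkpoint

# Reward model: Unbabel/wmt22-comet-da, as listed in the table above.
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

def comet_reward(completions, src, ref, **kwargs):
    """Score each sampled translation with COMET; higher is better."""
    data = [{"src": s, "mt": c, "ref": r} for s, c, r in zip(src, completions, ref)]
    return comet_model.predict(data, batch_size=8, gpus=1).scores

# Placeholder dataset: GRPOTrainer reads the "prompt" column and forwards the
# remaining columns ("src", "ref") to the reward function as keyword arguments.
train_dataset = Dataset.from_dict({
    "prompt": ["Translate the following text to Korean.\nThe quick brown fox jumps over the lazy dog."],
    "src": ["The quick brown fox jumps over the lazy dog."],
    "ref": ["λΉ λ₯Έ κ°ˆμƒ‰ μ—¬μš°κ°€ 게으λ₯Έ 개λ₯Ό λ›°μ–΄λ„˜λŠ”λ‹€."],
})

trainer = GRPOTrainer(
    model="gyung/lfm2-1.2b-koen-mt-v4-100k",
    reward_funcs=comet_reward,
    args=GRPOConfig(
        output_dir="lfm2-v5-rl-10k",
        per_device_train_batch_size=32,   # 32 Γ— 4 grad accum = effective 128
        gradient_accumulation_steps=4,
        max_steps=150,
    ),
    train_dataset=train_dataset,
    # LoRA settings from the table: rank 32, alpha 64, all-linear targets
    peft_config=LoraConfig(r=32, lora_alpha=64, target_modules="all-linear"),
)
trainer.train()
```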

πŸ“‰ μ²΄ν¬ν¬μΈνŠΈλ³„ μ„±λŠ₯ 곑선

| Step | Epoch | chrF++ | BLEU | Ξ” chrF++ vs. v4 |
|------|-------|--------|------|-----------------|
| 50 | 0.64 | 31.64 | 11.26 | +0.11 |
| 100 | 1.28 | 32.26 | 11.92 | +0.73 |
| 150 | 1.92 | 32.96 | 12.05 | +1.43 |

βœ… Performance improves steadily as training steps increase β†’ effective RL training without overfitting

πŸ”— Related Links

- Base model: https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v4-100k
- Reward model: https://huggingface.co/Unbabel/wmt22-comet-da

πŸ“ Citation

```bibtex
@misc{lfm2-koen-v5-rl,
  author = {gyung},
  title = {LFM2-1.2B-KoEn-MT-v5-RL: GRPO-Enhanced English-Korean Translation Adapter},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v5-rl-10k-adapter}
}
```

πŸ“„ License

This model inherits the LFM Open License v1.0 from the base LFM2 model.
