GPT-SW3 356M — Icelandic Grammar-Aligned (SAGA Δ-DPO)

Fine-tuned with SAGA (Syntax-Aware Grammar Alignment), a two-stage pipeline that trains language models to generate grammatically correct Icelandic text using reinforcement learning from a symbolic parser oracle (Greynir, an Icelandic constituency parser).

This is a fully merged model — no PEFT setup needed.

Stage 1: SFT on parser-verified Icelandic Wikipedia text, triggered automatically because the base model's parse success (72.5%) fell below the 80% threshold. Stage 2: Δ-DPO starting from the SFT checkpoint.
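The automatic gating between the two stages can be sketched in a few lines of Python. This is a minimal illustration, not the training code; `parses_ok` is a hypothetical stand-in for a call to the Greynir parser oracle:

```python
def parse_success(sentences, parses_ok):
    """Fraction of generated sentences the parser oracle accepts."""
    return sum(parses_ok(s) for s in sentences) / len(sentences)

SFT_THRESHOLD = 0.80  # from the model card: SFT runs first only below 80% parse success

def needs_sft(sentences, parses_ok):
    """Auto-SFT rule: apply Stage-1 SFT only when the base model parses too poorly."""
    return parse_success(sentences, parses_ok) < SFT_THRESHOLD
```

With the base model here, parse success is 72.5%, so the gate fires and Stage-1 SFT runs before Δ-DPO.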

Results (independent Stanza evaluation)

Metric          Base    + Δ-DPO
Parse success   72.5%   79.0%
Parse score     0.341   0.500
PPL-Wiki        22.9    14.6

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Hodfa71/gpt-sw3-356m-is-saga-delta-dpo"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Íslenska er"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60, temperature=0.8, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Training details

  • Data: 10 000 Icelandic Wikipedia sentences (filtered for quality)
  • Method: Δ-DPO — generate N=8 candidates per prompt, keep pairs with parser score gap Δ ≥ 0.25, train with standard DPO loss (β=0.1)
  • Parser oracle: Greynir (Icelandic constituency parser)
  • LoRA: rank 16, α=32, all linear layers, bfloat16
  • Auto-SFT rule: Stage-1 SFT is applied only when the base model's parse success is below 80% (here, 72.5%)
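The Δ-DPO pair-selection step described above can be sketched as follows. This is an illustrative sketch, not the released training code: `score` is a hypothetical callable returning the parser-oracle score in [0, 1] for a candidate generation:

```python
from itertools import combinations

def delta_dpo_pairs(candidates, score, delta_min=0.25):
    """Build (chosen, rejected) preference pairs from N sampled candidates.

    A pair is kept only if the parser-score gap meets the threshold
    (Δ ≥ 0.25 in the model card); the higher-scoring candidate is 'chosen'.
    """
    scored = [(c, score(c)) for c in candidates]
    pairs = []
    for (a, sa), (b, sb) in combinations(scored, 2):
        if abs(sa - sb) >= delta_min:
            chosen, rejected = (a, b) if sa > sb else (b, a)
            pairs.append((chosen, rejected))
    return pairs
```

The surviving pairs are then fed to the standard DPO loss with β = 0.1; the Δ filter discards near-ties so the preference signal reflects a genuine grammaticality gap rather than parser noise.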

Citation

@article{fakhar2025saga,
  title={SAGA: Syntax-Aware Grammar Alignment for Low-Resource Nordic Languages},
  author={Fakhar, Hoda and others},
  year={2025},
  note={Under review}
}

License

Inherits the base model license (AI Sweden LLM License / LumiOpen).
