AIPlans/Qwen3-0.6B-IPO

Model Description

This model is an IPO (Identity Preference Optimization) fine-tune of Qwen/Qwen3-0.6B-Base.

It was aligned using the HelpSteer2 dataset to improve helpfulness and instruction-following capabilities.

Unlike PPO or ReMax, IPO formulates alignment as a regression problem: it minimizes a regularized squared-error loss directly on preference pairs, providing a stable and theoretically grounded approach that needs neither a separate reward model nor on-policy sampling loops during training.
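
Concretely, IPO regresses the policy-versus-reference log-likelihood-ratio margin between the chosen and rejected completions toward a fixed target of 1 / (2 * beta). The sketch below shows that per-batch objective in PyTorch; the function and argument names are illustrative and not taken from the training code.

import torch

def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             reference_chosen_logps, reference_rejected_logps, beta=0.01):
    # Log-ratio of policy vs. reference for the chosen and rejected completions
    chosen_logratio = policy_chosen_logps - reference_chosen_logps
    rejected_logratio = policy_rejected_logps - reference_rejected_logps
    # IPO regresses this margin toward 1 / (2 * beta) with a squared-error loss
    margin = chosen_logratio - rejected_logratio
    return torch.mean((margin - 1.0 / (2.0 * beta)) ** 2)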

The goal of the fine-tuning was to improve helpfulness and harmlessness as captured by HelpSteer2 preferences, while also enabling controlled model-diffing experiments as part of the AIPlans research workflow.

Developed by: AIPlans
Funded by: AIPlans
Shared by: AIPlans
Model type: Causal decoder-only Transformer (LLM)
Languages: English
Intended Use: Research on model diffing, preference fine-tuning, evaluation of lightweight LLM behavior changes

📊 Evaluation

Below is a comparison between the base model and this IPO-trained version.

| Task           | Metric   | Base Model | IPO Model | Change  |
|----------------|----------|------------|-----------|---------|
| arc_challenge  | acc_norm | 0.3848     | 0.3968    | +0.0119 |
| arc_easy       | acc_norm | 0.5783     | 0.6700    | +0.0918 |
| hellaswag      | acc_norm | 0.5379     | 0.5540    | +0.0160 |
| truthfulqa_mc2 | acc      | 0.4586     | 0.4576    | -0.0010 |
| winogrande     | acc      | 0.5896     | 0.5935    | +0.0039 |
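
The task and metric names above follow the EleutherAI lm-evaluation-harness conventions. Assuming that harness produced these numbers, a run along the following lines should reproduce the IPO column (the batch size here is illustrative):

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AIPlans/Qwen3-0.6B-IPO,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "truthfulqa_mc2", "winogrande"],
    batch_size=8,
)
print(results["results"])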

βš™οΈ Training Details

  • Method: IPO (Identity Preference Optimization)
  • Base Model: Qwen/Qwen3-0.6B-Base
  • SFT Model Used: AIPlans/Qwen3-0.6b-SFT-hs2
  • Precision: bfloat16 (Training), bfloat16 (Final Weights)
  • Learning Rate: 5e-7
  • Beta: 0.01
  • Epochs: 3
  • Batch Size: 8 per device (effective global batch size: 16)
  • Hardware: NVIDIA A100 (80 GB); training took 1 hr 25 min and peaked at 78.4 GB of VRAM (a configuration sketch follows this list)
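
Given the TRL citation below, the run most likely used TRL's DPOTrainer with loss_type="ipo"; the sketch that follows maps the hyperparameters above onto that API. The output path, the gradient-accumulation value (picked to give the effective batch size of 16), and the omitted step that turns HelpSteer2 ratings into prompt/chosen/rejected pairs are assumptions, not the actual training script.

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Start from the SFT checkpoint listed above
model_name = "AIPlans/Qwen3-0.6b-SFT-hs2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: a dataset with "prompt", "chosen", "rejected" columns built
# from HelpSteer2 preference pairs (preprocessing not shown here)
train_dataset = ...

config = DPOConfig(
    output_dir="qwen3-0.6b-ipo",        # assumed output path
    loss_type="ipo",                    # squared-error IPO objective
    beta=0.01,
    learning_rate=5e-7,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,      # 8 x 2 = effective batch size of 16
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,         # recent TRL versions; older ones take tokenizer=
)
trainer.train()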

Algorithm Highlights

  • Direct Optimization: Optimizes preferences directly without a reward model loop.
  • Stability: Uses a squared-error loss on the preference margin, which avoids DPO's tendency to push that margin toward infinity on near-deterministic preferences and is often more stable in practice.
  • Regularization: Uses beta = 0.01 to balance preference satisfaction against divergence from the reference model (see the DPO comparison sketch below).
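
For contrast, DPO applies a log-sigmoid to the scaled margin instead of regressing it toward a fixed target; with beta = 0.01, the IPO target margin is 1 / (2 * 0.01) = 50 nats. The short sketch below (illustrative names, not from the training code) shows the DPO counterpart of the ipo_loss function above.

import torch.nn.functional as F

def dpo_loss(margin, beta=0.01):
    # DPO: logistic loss on the scaled margin; unlike IPO's squared error
    # around 1 / (2 * beta), this keeps rewarding ever-larger margins
    return -F.logsigmoid(beta * margin)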

💻 Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIPlans/Qwen3-0.6B-IPO"

# Load the IPO-tuned model and tokenizer in bfloat16
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

prompt = "User: How do I make a cake?\n\nAssistant:"
# Place inputs on the model's device (works with device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Card Author

Premanand Jena - AIPlans Research Intern. Contact: [email protected]

Citation

IPO paper: Azar et al., "A General Theoretical Paradigm to Understand Learning from Human Preferences" (2023). https://arxiv.org/abs/2310.12036

TRL:

@misc{vonwerra2022trl,
    title        = {TRL: Transformer Reinforcement Learning},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}