Instructions for using AyoubChLin/Qwen3.5-0.8B-saudi-dialect with libraries and local apps.

Libraries

How to use AyoubChLin/Qwen3.5-0.8B-saudi-dialect with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="AyoubChLin/Qwen3.5-0.8B-saudi-dialect")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so it is visible when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("AyoubChLin/Qwen3.5-0.8B-saudi-dialect")
model = AutoModelForImageTextToText.from_pretrained("AyoubChLin/Qwen3.5-0.8B-saudi-dialect")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use AyoubChLin/Qwen3.5-0.8B-saudi-dialect with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AyoubChLin/Qwen3.5-0.8B-saudi-dialect"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AyoubChLin/Qwen3.5-0.8B-saudi-dialect",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
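
Since this fine-tune was trained on text-only conversations with a fixed system prompt, a text-only request may be closer to its training data than the image example above. The following is a minimal sketch using only the Python standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_payload` and `chat` are illustrative. Passing `chat_template_kwargs` to turn off thinking is an assumption about how the server applies the Qwen3.5 chat template.

```python
import json
import urllib.request

MODEL_ID = "AyoubChLin/Qwen3.5-0.8B-saudi-dialect"
SYSTEM_PROMPT = "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."


def build_payload(user_text, max_tokens=200):
    """Build a text-only chat-completions request body (illustrative helper)."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": max_tokens,
        # Assumption: the server forwards these kwargs to the chat template,
        # mirroring enable_thinking=False used during fine-tuning.
        "chat_template_kwargs": {"enable_thinking": False},
    }


def chat(user_text, base_url="http://localhost:8000"):
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("كيف حالك اليوم؟"))` should print the model's reply. `build_payload` can be inspected on its own without a server.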

Use Docker

docker model run hf.co/AyoubChLin/Qwen3.5-0.8B-saudi-dialect

SGLang

How to use AyoubChLin/Qwen3.5-0.8B-saudi-dialect with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AyoubChLin/Qwen3.5-0.8B-saudi-dialect" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AyoubChLin/Qwen3.5-0.8B-saudi-dialect",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AyoubChLin/Qwen3.5-0.8B-saudi-dialect" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AyoubChLin/Qwen3.5-0.8B-saudi-dialect",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use AyoubChLin/Qwen3.5-0.8B-saudi-dialect with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AyoubChLin/Qwen3.5-0.8B-saudi-dialect to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AyoubChLin/Qwen3.5-0.8B-saudi-dialect to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AyoubChLin/Qwen3.5-0.8B-saudi-dialect to start chatting

Load model with FastModel

# Install first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="AyoubChLin/Qwen3.5-0.8B-saudi-dialect",
    max_seq_length=2048,
)

Docker Model Runner

How to use AyoubChLin/Qwen3.5-0.8B-saudi-dialect with Docker Model Runner:

```
docker model run hf.co/AyoubChLin/Qwen3.5-0.8B-saudi-dialect
```

Qwen3.5-0.8B Saudi Dialect

This model is a Saudi dialect instruction fine-tune of unsloth/Qwen3.5-0.8B, trained on HeshamHaroon/saudi-dialect-conversations with Unsloth LoRA and merged back into a full 16-bit checkpoint.

The training data is primarily Saudi Najdi Arabic dialogue. During preprocessing, the following fixed system prompt is prepended to each conversation:

أنت مساعد مفيد يتحدث باللهجة السعودية العامية.

In English: "You are a helpful assistant who speaks the colloquial Saudi dialect."

Although the underlying Qwen3.5 architecture is multimodal, this fine-tune was trained on text-only conversations and is intended for Saudi dialect chat generation.

Model Details

Base model: unsloth/Qwen3.5-0.8B
Model type: merged 16-bit SFT checkpoint
Adapter repo: AyoubChLin/Qwen3.5-0.8B-saudi-dialect-lora
Primary language: Arabic (ar), focused on Saudi/Najdi dialect
License: Apache-2.0

Dataset

Training used HeshamHaroon/saudi-dialect-conversations:

3,545 multi-turn conversations
22,536 total turns
Average 6.4 turns per conversation
Topics span 18 categories
Complexity mix from the dataset card: 31% simple, 38% intermediate, 31% advanced

Original samples contain user and assistant turns plus metadata fields such as scenario, topic, complexity, and english_summary. For SFT, only the conversation messages were used, with the Saudi dialect system prompt inserted before applying the Qwen3.5 chat template with enable_thinking=False.
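
The preprocessing described above can be sketched as follows. This is an illustration, not the training notebook's code: the function name `to_sft_messages`, the `messages` field name, and the sample record are assumptions, so check the dataset schema before reusing it.

```python
SYSTEM_PROMPT = "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."


def to_sft_messages(sample, messages_key="messages"):
    """Keep only the chat turns and prepend the fixed system prompt.

    `messages_key` is a hypothetical field name for the conversation turns.
    """
    turns = [
        {"role": turn["role"], "content": turn["content"]}
        for turn in sample[messages_key]
        if turn["role"] in ("user", "assistant")
    ]
    return [{"role": "system", "content": SYSTEM_PROMPT}] + turns


# Made-up record with the metadata fields named above; values are placeholders.
sample = {
    "scenario": "<scenario>",
    "topic": "<topic>",
    "complexity": "simple",
    "english_summary": "<summary>",
    "messages": [
        {"role": "user", "content": "كيف حالك اليوم؟"},
        {"role": "assistant", "content": "<assistant reply>"},
    ],
}

messages = to_sft_messages(sample)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant']
```

The resulting list is what would then be passed to the chat template with enable_thinking=False. The metadata fields are dropped.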

Training Setup

Framework: Unsloth + TRL SFTTrainer
GPU: single NVIDIA L4 (22.03 GB)
Precision: bf16 auto-detected
Max sequence length: 4096
LoRA: r=32, alpha=32, dropout=0
Train / eval split: 95% / 5% with seed 3407
Train examples: 3,366
Eval examples: 179
Per-device batch size: 24
Gradient accumulation: 8
Effective batch size: 192
Epochs: 4
Learning rate: 4e-4
Warmup steps: 5
Optimizer: adamw_8bit
QLoRA / 4-bit loading: disabled
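
As a quick sanity check, the effective batch size and the number of optimizer steps follow from the values listed above. The sketch assumes that a trailing partial batch still counts as one optimizer step and that LoRA uses the standard alpha / r scaling (not rank-stabilized). Variable names are illustrative.

```python
import math

train_examples = 3366
per_device_batch_size = 24
gradient_accumulation_steps = 8
num_gpus = 1
epochs = 4
lora_r, lora_alpha = 32, 32

# Examples consumed per optimizer step
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Assumption: the last, partially filled batch of an epoch still triggers a step
steps_per_epoch = math.ceil(train_examples / effective_batch_size)
total_steps = steps_per_epoch * epochs

# Standard LoRA scaling factor applied to the adapter update
lora_scaling = lora_alpha / lora_r

print(effective_batch_size)  # 192
print(steps_per_epoch)       # 18
print(total_steps)           # 72
print(lora_scaling)          # 1.0
```

The total of 72 steps agrees with the step count reported under Training Results.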

Training Results

From the training notebook and the exported report plots:

Total training steps: 72
Final logged train/loss: 1.85
Logged eval/loss at step 50: 1.94
Training runtime: 2466.3s (41.11 min)
Train samples / second: 5.459
Peak reserved GPU memory: 20.59 GB
Peak memory usage: 93.45% of available GPU memory

train_loss comes directly from the notebook's trainer.train() output. The train/loss and eval/loss values are read from the PDF-exported tracking plots, so they are approximate.
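
The throughput and memory figures above can be cross-checked with a few lines of arithmetic. This sketch assumes that throughput is counted over all four epochs and that the memory percentage is peak reserved memory divided by the 22.03 GB reported for the L4. Small differences are expected because the inputs are already rounded.

```python
train_examples = 3366
epochs = 4
runtime_seconds = 2466.3
peak_reserved_gb = 20.59
total_gpu_gb = 22.03

# Samples processed across all epochs, divided by wall-clock training time
samples_per_second = train_examples * epochs / runtime_seconds
runtime_minutes = runtime_seconds / 60
memory_percent = 100 * peak_reserved_gb / total_gpu_gb

print(f"{samples_per_second:.3f} samples/s")  # 5.459 samples/s
print(f"{runtime_minutes:.1f} min")
print(f"{memory_percent:.1f}% of GPU memory")
```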

Intended Use

This checkpoint is intended for:

Saudi dialect chat assistants
Arabic conversational prototyping
Continued instruction tuning or adapter-based specialization
Dialect-focused evaluation and research

Limitations

The fine-tuning dataset is relatively small, so performance may drop outside Saudi/Najdi conversational settings.
This is a text-only SFT on top of a multimodal base model; image and video capabilities were not the focus of training.
The model was tuned for direct replies with enable_thinking=False, so it is not optimized for long chain-of-thought style outputs.
The model has not been formally benchmarked for safety, factuality, or domain-specific compliance.

Usage

Transformers

This checkpoint is exposed by Transformers as a Qwen3.5 conditional-generation model, so the safest Hugging Face loading path is AutoProcessor + AutoModelForImageTextToText.

from transformers import AutoModelForImageTextToText, AutoProcessor

repo_id = "AyoubChLin/Qwen3.5-0.8B-saudi-dialect"

processor = AutoProcessor.from_pretrained(repo_id)
model = AutoModelForImageTextToText.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

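messages = [
    # System prompt used during fine-tuning:
    # "You are a helpful assistant who speaks the colloquial Saudi dialect."
    {"role": "system", "content": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."},
    # User message: "How are you today?"
    {"role": "user", "content": "كيف حالك اليوم؟"},
]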

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = processor(
    text=[text],
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

response = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[-1]:],
    skip_special_tokens=True,
)[0]

print(response)
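
Because the training data consists of multi-turn conversations, a chat assistant will usually keep a running message history and re-apply the chat template on every turn. The following is a minimal sketch of such a history in plain Python. The helper names `start_conversation` and `add_turn` are illustrative and not part of Transformers.

```python
SYSTEM_PROMPT = "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."


def start_conversation():
    """Return a new history that begins with the fine-tuning system prompt."""
    return [{"role": "system", "content": SYSTEM_PROMPT}]


def add_turn(messages, role, content):
    """Return a new history with one user or assistant turn appended."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unexpected role: {role!r}")
    return messages + [{"role": role, "content": content}]


history = start_conversation()
history = add_turn(history, "user", "كيف حالك اليوم؟")
# After generation, store the decoded reply before the next user turn, e.g.:
# history = add_turn(history, "assistant", response)
print(len(history))  # 2
```

On each turn, the full `history` would be passed as `messages` to `processor.apply_chat_template` in the same way as above.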

Unsloth

Install (notebook cell)

%%capture
import re, torch

v = re.match(r"[\d]{1,}\.[\d]{1,}", str(torch.__version__)).group(0)
xformers = "xformers==" + {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}.get(v, "0.0.34")

!pip install sentencepiece protobuf "datasets>=2.18.0" "huggingface_hub>=0.34.0" hf_transfer
!pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install -q "transformers>=5.0.0"
!pip install -q --no-deps "trl>=0.15.0"

Run

from unsloth import FastLanguageModel

repo_id = "AyoubChLin/Qwen3.5-0.8B-saudi-dialect"
max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,  # this repo was pushed as merged_16bit
)

FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "كيف حالك اليوم؟"}
        ],
    },
]

# Request a dict explicitly so the code does not depend on the library's
# default return type for apply_chat_template.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=200,
    use_cache=True,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)