# Qwen3.5-27B — BitsAndBytes NF4
A 4-bit quantized version of Qwen/Qwen3.5-27B using BitsAndBytes NF4 with double quantization. This is the full vision-language model, including the vision encoder.
## Details

|                 |                                   |
|-----------------|-----------------------------------|
| Base model      | Qwen/Qwen3.5-27B                  |
| Quantization    | BitsAndBytes NF4 (double quantization) |
| Compute dtype   | bfloat16                          |
| Checkpoint size | ~16.7 GB                          |
| Model class     | `Qwen3_5ForConditionalGeneration` |
| VRAM required   | ~17 GB (fits on a 24 GB RTX 4090) |
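The checkpoint size can be sanity-checked with some NF4 bit accounting. The sketch below uses assumed bitsandbytes defaults (64-weight quantization blocks, 8-bit double-quantized scales in second-level groups of 256 with an fp32 constant each); these values are not read from this checkpoint.

```python
# Back-of-the-envelope size estimate for NF4 with double quantization.
# Assumed defaults: one absmax scale per 64 weights; double quant stores
# those scales in 8 bits, with an fp32 constant per group of 256 scales.
params = 27e9  # nominal parameter count

weight_bits = 4.0                    # NF4 payload per weight
absmax_bits = 8 / 64                 # 8-bit quantized scale per 64 weights
second_level_bits = 32 / (64 * 256)  # fp32 constant per 256 scales

bits_per_param = weight_bits + absmax_bits + second_level_bits
est_gb = params * bits_per_param / 8 / 1e9

print(f"~{bits_per_param:.3f} bits/param -> ~{est_gb:.1f} GB")
```

This lands around 13.9 GB for the quantized weights alone; the gap to the 16.7 GB checkpoint is consistent with some modules (embeddings, norms, and parts of the vision tower are typical candidates) being kept in bfloat16 rather than quantized.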
## Usage

```python
import torch
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "skkwowee/Qwen3.5-27B-bnb-4bit",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained("skkwowee/Qwen3.5-27B-bnb-4bit")
```
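Continuing from the snippet above, a generation call might look like the following. This is a sketch that assumes the standard transformers chat-template API for vision-language processors; the image URL is a placeholder, and the exact message format should be checked against the base model's documentation.

```python
# Hedged sketch: assumes the common transformers multimodal chat format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.png"},  # placeholder
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
print(processor.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```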
## Quantization

Quantized on an NVIDIA H200 SXM (141 GB) using transformers 5.2 and bitsandbytes 0.49.
```python
import torch
from transformers import BitsAndBytesConfig, Qwen3_5ForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3.5-27B",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```
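To produce a standalone 4-bit checkpoint like this one, the quantized model and its processor can then be serialized. A sketch, assuming the snippet above has run; the output directory name is a placeholder:

```python
from transformers import AutoProcessor

# save_pretrained on a bitsandbytes 4-bit model writes the quantized
# weights plus the quantization_config, so the checkpoint can later be
# loaded without re-specifying BitsAndBytesConfig.
save_dir = "Qwen3.5-27B-bnb-4bit"  # placeholder output path
model.save_pretrained(save_dir)

processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-27B")
processor.save_pretrained(save_dir)
```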