# gemma-4-26B-A4B-it-W4A16

This repository contains an llm-compressor GPTQ export of `google/gemma-4-26B-A4B-it`, using 4-bit weight-only quantization with bf16 activations.
## What This Is

- Base model: `google/gemma-4-26B-A4B-it`
- Export format: `compressed-tensors` / `pack-quantized`
- Quantization style: W4A16 GPTQ
- Weight quantization: 4-bit signed integer, symmetric, grouped
- Weight group size: 64
- Activation dtype at runtime: `bfloat16`
- Modalities: text and image
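As a minimal pure-Python sketch of what the bullets above describe: in symmetric grouped W4A16, each group of 64 weights shares one scale, weights are stored as signed 4-bit integers, and at runtime they are dequantized back to a 16-bit float before the bf16 matmul. (The actual checkpoint stores bit-packed int4 tensors via `compressed-tensors`; this only illustrates the quantization math, not the packed storage format.)

```python
# Symmetric, grouped 4-bit weight quantization, illustrated per group.
GROUP_SIZE = 64
QMAX = 7  # signed 4-bit range is [-8, 7]; symmetric scaling targets +/-QMAX

def quantize_group(weights):
    """Quantize one group of weights to signed int4 with a shared scale."""
    scale = max(abs(w) for w in weights) / QMAX or 1.0  # avoid scale=0 for all-zero groups
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Runtime dequantization: int4 values times the shared group scale."""
    return [v * scale for v in q]
```

Round-trip error per weight is bounded by half the group scale, which is why narrower groups (64 vs 128) generally track outliers more tightly.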
This checkpoint uses `group_size=64` intentionally. Gemma 4 26B A4B contains MoE `down_proj` widths such as 704 and 2112; both are divisible by 64 but not by 128, so a default W4A16 G128 export is not valid for those layers.
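The constraint is simple arithmetic: grouped quantization requires each weight row to split evenly into groups, so the group size must divide the layer width. A quick check on the widths cited above:

```python
# Why group_size=64 is valid for this model's MoE down_proj layers and 128 is not.
# The widths 704 and 2112 come from the model card; the check itself is generic.
widths = [704, 2112]
for g in (128, 64):
    ok = all(w % g == 0 for w in widths)
    detail = ", ".join(f"{w} % {g} = {w % g}" for w in widths)
    print(f"group_size={g}: {'valid' if ok else 'invalid'} ({detail})")
```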
## Calibration Setup

- Text calibration baseline: `HuggingFaceH4/ultrachat_200k`
- Image calibration baseline: `lmms-lab/flickr30k`
- Calibration mode: mixed text/image
## vLLM Serving

The following serve command works after the "[Gemma4] Support quantized MoE" commit (`3aecdf08b4a896a92e2cbd11c3d5a83d3c09abc1`):

```shell
vllm serve dhruvil237/gemma-4-26B-A4B-it-W4A16 \
  --gpu-memory-utilization 0.8 \
  --reasoning-parser gemma4 \
  --dtype float16
```
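Once the server is up, it exposes vLLM's OpenAI-compatible API. A hedged stdlib-only example of querying it (the host and port are vLLM's defaults; adjust them if you changed `--host`/`--port`):

```python
# Query the chat completions endpoint of the vLLM server started above.
import json
from urllib import request

payload = {
    "model": "dhruvil237/gemma-4-26B-A4B-it-W4A16",
    "messages": [{"role": "user", "content": "Summarize GPTQ in one sentence."}],
    "max_tokens": 64,
}
req = request.Request(
    "http://localhost:8000/v1/chat/completions",  # vLLM default endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = json.load(request.urlopen(req))  # uncomment once the server is running
# print(resp["choices"][0]["message"]["content"])
```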