gemma-4-26B-A4B-it-W4A16

This repository contains an llm-compressor GPTQ export of google/gemma-4-26B-A4B-it using 4-bit weight-only quantization with bf16 activations.

What This Is

  • Base model: google/gemma-4-26B-A4B-it
  • Export format: compressed-tensors / pack-quantized
  • Quantization style: W4A16 GPTQ
  • Weight quantization: 4-bit signed integer, symmetric, grouped
  • Weight group size: 64
  • Activation dtype at runtime: bfloat16
  • Modalities: text and image

This checkpoint uses group_size=64 intentionally. Gemma 4 26B A4B contains MoE down_proj widths such as 704 and 2112, which are not divisible by 128, so a default W4A16 G128 export is not valid for these layers.
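The divisibility constraint above can be checked directly. This is a minimal sketch using the two down_proj widths named in this card (704 and 2112); group-wise weight quantization requires each quantized row width to be a multiple of the group size.

```python
# MoE down_proj widths cited above for Gemma 4 26B A4B.
widths = [704, 2112]

def valid_group_size(width: int, group_size: int) -> bool:
    # A group size is usable only if it evenly divides the weight width.
    return width % group_size == 0

print([valid_group_size(w, 128) for w in widths])  # [False, False] -> G128 invalid
print([valid_group_size(w, 64) for w in widths])   # [True, True]   -> G64 works
```

This is why the export uses group_size=64 rather than the common G128 default.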

Calibration Setup

  • Text calibration baseline: HuggingFaceH4/ultrachat_200k
  • Image calibration baseline: lmms-lab/flickr30k
  • Calibration mode: mixed text/image
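For reference, a W4A16 G64 scheme like the one described above is typically expressed as an llm-compressor recipe. The fragment below is a hedged sketch, not the recipe actually used for this export: the field names follow compressed-tensors quantization-config conventions (num_bits, symmetric, strategy, group_size), but the exact stage/modifier layout here is an assumption.

```yaml
# Sketch of a W4A16 G64 GPTQ recipe (assumed layout, not the original).
quant_stage:
  quant_modifiers:
    GPTQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "group"
            group_size: 64
```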

vLLM Serving

Serve Command (requires a vLLM build that includes the "[Gemma4] Support quantized MoE" commit, 3aecdf08b4a896a92e2cbd11c3d5a83d3c09abc1)

vllm serve dhruvil237/gemma-4-26B-A4B-it-W4A16 \
  --gpu-memory-utilization 0.8 \
  --reasoning-parser gemma4 \
  --dtype float16
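Once serving, the model is reachable through vLLM's OpenAI-compatible API. The sketch below assumes the server's default address (http://localhost:8000) and the standard /v1/chat/completions endpoint; only the payload construction is specific to this model.

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(prompt: str,
                       model: str = "dhruvil237/gemma-4-26B-A4B-it-W4A16") -> dict:
    """Build an OpenAI-style chat completion payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def query(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # Assumes `vllm serve` is running at base_url with the command above.
    payload = build_chat_request(prompt)
    req = Request(f"{base_url}/v1/chat/completions",
                  data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```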