gemma-4-A4B-109e-it-GGUF

GGUF quantizations of ManniX-ITA/gemma-4-A4B-109e-it — an expert-pruned Gemma 4 26B-A4B (experts pruned from 128 to 109, parameters reduced from 26B to 22.4B).

All standard quants were made using an importance matrix (imatrix) with calibration data v5.

ContribDynamic (CD) Quants

CD quants use per-layer dynamic quantization driven by an expert-contribution analysis of this specific model. Important layers (early layers that contribute more to the residual stream) get higher precision, while less important layers get lower precision.

For a CD-Q4_K_M target:

  • Layer 0 (highest importance): Q5_K precision
  • Layers 1-6, 10 (medium importance): Q4_K precision
  • Layers 7-29 (lower importance): Q3_K precision
  • Output/embeddings: Q8_0 precision
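
The layer-to-precision mapping above can be expressed as a small lookup function. This is an illustrative sketch only — the function name and string representation are hypothetical, not part of the actual quantization tooling; the layer groupings are taken from this card:

```python
# Hypothetical sketch of the CD-Q4_K_M per-layer precision map described above.
def cd_q4_k_m_precision(layer: int) -> str:
    """Return the quant type used for a given transformer layer (0-29)."""
    if layer == 0:                           # highest contribution to the residual stream
        return "Q5_K"
    if layer in range(1, 7) or layer == 10:  # medium importance (checked before the 7-29 band)
        return "Q4_K"
    if 7 <= layer <= 29:                     # lower importance
        return "Q3_K"
    raise ValueError(f"layer {layer} out of range")

# Output/embedding tensors are kept at Q8_0 regardless of layer.
```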

This approach is inspired by Unsloth's UD quantization, but uses our own expert-contribution profiling data, derived by measuring per-layer contribution norms across 40 calibration prompts.
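
The profiling idea can be sketched in a few lines: snapshot the residual stream before and after each layer, take the L2 norm of each layer's update, and average over the calibration prompts. The real profiling pipeline is not published here; the code below is a minimal sketch with random data standing in for actual hidden states, and all names are illustrative:

```python
import numpy as np

def layer_contribution_scores(hidden_states: np.ndarray) -> np.ndarray:
    """hidden_states: (n_prompts, n_layers + 1, d_model) residual-stream
    snapshots before/after each layer. Returns one score per layer: the mean
    L2 norm of the update each layer writes into the residual stream."""
    deltas = hidden_states[:, 1:, :] - hidden_states[:, :-1, :]  # per-layer update
    norms = np.linalg.norm(deltas, axis=-1)                      # (n_prompts, n_layers)
    return norms.mean(axis=0)                                    # average over prompts

rng = np.random.default_rng(0)
# 40 calibration prompts, 30 layers, toy hidden size 64 (random stand-in data)
states = rng.normal(size=(40, 31, 64))
scores = layer_contribution_scores(states)
ranking = np.argsort(scores)[::-1]  # most important layers first
```

Layers at the top of such a ranking would be assigned the higher-precision quant types in a CD quant.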

Available Quantizations

Quantization   Size
------------   --------
Q8_0           21.65 GB
Q6_K_L         18.40 GB
Q6_K           18.23 GB
Q5_K_L         15.59 GB
Q5_K_M         15.42 GB
Q5_K_S         14.51 GB
Q4_K_L         13.71 GB
Q4_K_M         13.54 GB
Q4_1           12.89 GB
Q4_K_S         12.48 GB
Q4_0           11.67 GB
IQ4_NL         11.67 GB
IQ4_XS         11.25 GB
Q3_K_L         11.18 GB
Q3_K_XL        10.90 GB
Q3_K_M         10.74 GB
IQ3_M          10.03 GB
Q3_K_S         9.88 GB
IQ3_XS         9.41 GB
IQ3_XXS        9.14 GB
Q2_K           8.57 GB
IQ2_M          8.39 GB
IQ2_S          7.99 GB
IQ2_XS         7.94 GB
IQ2_XXS        7.52 GB
CD-Q6_K        15.80 GB
CD-Q5_K_M      13.51 GB
CD-Q4_K_M      11.08 GB
CD-Q3_K_M      10.37 GB

All quants passed a 3-question sanity check (capital cities in JSON format) via llama.cpp before upload.
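
The kind of check described above is easy to automate: prompt the model for a capital city in JSON and verify the reply both parses and contains the right answer. The sketch below is illustrative only — the question set, field name, and function are assumptions, not the actual test harness:

```python
import json

# Hypothetical sanity-check helper: the reply must be valid JSON with a
# "capital" field matching the expected answer.
def passes_sanity_check(expected: str, reply: str) -> bool:
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return str(data.get("capital", "")).strip().lower() == expected.lower()

# A well-formed model reply passes; free-form prose does not.
assert passes_sanity_check("Paris", '{"capital": "Paris"}')
assert not passes_sanity_check("Tokyo", "The capital is Tokyo.")
```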

How to Use

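A minimal way to run these quants is llama.cpp's `llama-cli`, which can fetch a GGUF directly from the Hub (assuming a recent llama.cpp build with `-hf` support; the Q4_K_M tag is just an example — pick any quant from the table above):

```shell
# Download and chat with the Q4_K_M quant directly from Hugging Face
llama-cli -hf ManniX-ITA/gemma-4-A4B-109e-it-GGUF:Q4_K_M

# Or serve it over an OpenAI-compatible API with an 8K context
llama-server -hf ManniX-ITA/gemma-4-A4B-109e-it-GGUF:Q4_K_M -c 8192
```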
Original Model

See ManniX-ITA/gemma-4-A4B-109e-it for the full model card, pruning methodology, and benchmark results (71.7% GPQA Diamond).

License

Gemma license
