# gemma-4-A4B-109e-it-GGUF
GGUF quantizations of ManniX-ITA/gemma-4-A4B-109e-it, an expert-pruned Gemma 4 26B-A4B (pruned from 128 to 109 experts, reducing parameters from 26B to 22.4B).

All standard quants were made using an imatrix with calibration data v5.
## ContribDynamic (CD) Quants

CD quants apply per-layer dynamic quantization driven by an expert-contribution analysis of the model: important layers (early layers that contribute more to the residual stream) receive higher precision, while less important layers receive lower precision.
For a CD-Q4_K_M target:
- Layer 0 (highest importance): Q5_K precision
- Layers 1-6, 10 (medium importance): Q4_K precision
- Layers 7-9 and 11-29 (lower importance): Q3_K precision
- Output/embeddings: Q8_0 precision
This approach is inspired by Unsloth's UD quantization but uses our own expert contribution profiling data derived from measuring actual norms across 40 calibration prompts.
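The per-layer assignment above can be sketched as a simple lookup. This is an illustrative reconstruction of the CD-Q4_K_M scheme, not the actual profiling code; the function name and the layer ranges outside those listed are assumptions.

```python
def cd_q4_k_m_quant_type(layer: int) -> str:
    """Map a transformer layer index (0-29) to its quant type
    under the CD-Q4_K_M scheme described above."""
    if layer == 0:
        # Highest-importance layer gets the most precision.
        return "Q5_K"
    if layer in range(1, 7) or layer == 10:
        # Medium-importance layers keep the nominal Q4_K precision.
        return "Q4_K"
    # Remaining (lower-importance) layers drop to Q3_K.
    return "Q3_K"
```

Output and embedding tensors sit outside this per-layer table and stay at Q8_0.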
## Available Quantizations
| Quantization | Size |
|---|---|
| Q8_0 | 21.65 GB |
| Q6_K_L | 18.40 GB |
| Q6_K | 18.23 GB |
| Q5_K_L | 15.59 GB |
| Q5_K_M | 15.42 GB |
| Q5_K_S | 14.51 GB |
| Q4_K_L | 13.71 GB |
| Q4_K_M | 13.54 GB |
| Q4_1 | 12.89 GB |
| Q4_K_S | 12.48 GB |
| Q4_0 | 11.67 GB |
| IQ4_NL | 11.67 GB |
| IQ4_XS | 11.25 GB |
| Q3_K_L | 11.18 GB |
| Q3_K_XL | 10.90 GB |
| Q3_K_M | 10.74 GB |
| IQ3_M | 10.03 GB |
| Q3_K_S | 9.88 GB |
| IQ3_XS | 9.41 GB |
| IQ3_XXS | 9.14 GB |
| Q2_K | 8.57 GB |
| IQ2_M | 8.39 GB |
| IQ2_S | 7.99 GB |
| IQ2_XS | 7.94 GB |
| IQ2_XXS | 7.52 GB |
| CD-Q6_K | 15.80 GB |
| CD-Q5_K_M | 13.51 GB |
| CD-Q4_K_M | 11.08 GB |
| CD-Q3_K_M | 10.37 GB |
All quants passed a 3-question sanity check (capital cities in JSON format) via llama.cpp before upload.
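The sanity check can be sketched as follows. This is an illustrative reconstruction: the actual prompts and expected answers used before upload are not published, so the countries and the `{"capital": ...}` response shape below are assumptions.

```python
import json

# Hypothetical question set; the real 3-question set is not published.
EXPECTED = {"France": "Paris", "Japan": "Tokyo", "Canada": "Ottawa"}

def passes_sanity_check(outputs: dict) -> bool:
    """outputs maps country -> raw model output, expected to be
    valid JSON like {"capital": "Paris"}."""
    for country, capital in EXPECTED.items():
        try:
            answer = json.loads(outputs[country])
        except (KeyError, json.JSONDecodeError):
            return False  # missing answer or malformed JSON
        if answer.get("capital") != capital:
            return False  # wrong answer
    return True
```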
## How to Use
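A minimal sketch of running one of these quants with llama.cpp's `llama-cli`, assuming a recent llama.cpp build with `-hf` (Hugging Face repo) support; pick any quant tag from the table above:

```shell
# Download the Q4_K_M quant from this repo and run a quick prompt.
# Substitute any other quant tag from the table (e.g. :CD-Q4_K_M).
llama-cli -hf ManniX-ITA/gemma-4-A4B-109e-it-GGUF:Q4_K_M -p "Hello"
```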
## Original Model
See ManniX-ITA/gemma-4-A4B-109e-it for the full model card, pruning methodology, and benchmark results (71.7% GPQA Diamond).
## Model Tree

Base model: google/gemma-4-26B-A4B-it