Qwen3.5-397B-A17B – Gutenberg Quants

Quantizations of Qwen3.5-397B-A17B using the Gutenberg (Q_K_G) quantization strategy.

Available Quants

| Quant    | Size      | BPW  | Mean KLD | Same Top P |
|----------|-----------|------|----------|------------|
| K_G_4.00 | 184.5 GiB | 4.00 | 0.021106 | 92.838%    |
| K_G_2.93 | 135.7 GiB | 2.93 | 0.030123 | 91.177%    |
| K_G_2.50 | 115.6 GiB | 2.50 | 0.035966 | 90.618%    |
| K_G_2.25 | 103.9 GiB | 2.25 | 0.047857 | 89.360%    |
| K_G_1.95 | 89.8 GiB  | 1.95 | 0.071636 | 86.940%    |
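As a rough sanity check, the listed file sizes follow params × BPW / 8 (small deviations come from file metadata and rounding):

```python
def estimated_gib(n_params, bpw):
    # Size in GiB of n_params weights stored at an average of bpw bits per weight.
    return n_params * bpw / 8 / 2**30

# ~184.9 GiB estimated for the 4.00 BPW quant vs 184.5 GiB listed
size_4bpw = estimated_gib(397e9, 4.00)
```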

KLD and Same Top P measured against Q8_0 reference logits (8192 context, 10 chunks).
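For reference, a minimal sketch of how these two metrics can be computed from reference and quantized logits (the function name and toy data are illustrative, not the actual measurement tooling):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kld_stats(ref_logits, quant_logits):
    """Mean per-token KL divergence D(ref || quant), plus the fraction of
    tokens where both models pick the same top-1 token."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    kld = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    same_top = (ref_logits.argmax(-1) == quant_logits.argmax(-1)).mean()
    return kld.mean(), same_top

# Toy example: 5 token positions, vocabulary of 4, slightly perturbed logits.
rng = np.random.default_rng(0)
ref = rng.normal(size=(5, 4))
quant = ref + rng.normal(scale=0.1, size=(5, 4))
mean_kld, same_top = kld_stats(ref, quant)
```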

Why Gutenberg?

Standard quantization (K_M) applies uniform rules to all tensors. Gutenberg instead uses per-tensor KLD sensitivity data to allocate precision where it matters most: tensors with the highest measured impact on output quality are upgraded, while less important tensors stay at the base level. Non-expert tensors are kept at Q8_0 because of their disproportionately large impact on quality.

On paper, the result is significantly better quality than standard quants at the same file size.
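The allocation idea above can be sketched as a greedy upgrade under a bit budget. All names, sensitivity scores, and sizes below are hypothetical; the real recipe is more involved:

```python
def allocate_precision(tensors, base_bpw, hi_bpw, budget_bits):
    """tensors: {name: (kld_sensitivity, n_params)}. Greedily upgrade the
    most KLD-sensitive tensors from base_bpw to hi_bpw until the extra-bit
    budget is spent; everything else stays at the base level."""
    plan = {name: base_bpw for name in tensors}
    # Most sensitive tensors first.
    for name in sorted(tensors, key=lambda n: tensors[n][0], reverse=True):
        extra = (hi_bpw - base_bpw) * tensors[name][1]
        if extra <= budget_bits:
            plan[name] = hi_bpw
            budget_bits -= extra
    return plan

# Hypothetical (sensitivity, param count) pairs for three tensors.
tensors = {
    "blk.0.attn_q": (0.9, 1e6),
    "blk.0.ffn_up": (0.2, 4e6),
    "blk.0.attn_v": (0.7, 1e6),
}
plan = allocate_precision(tensors, base_bpw=2.5, hi_bpw=4.0, budget_bits=3.2e6)
```

With this budget the two small, sensitive attention tensors get upgraded to 4.0 BPW while the large, insensitive FFN tensor stays at 2.5 BPW.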

Compatibility

Fully compatible with stock llama.cpp, llama-server, LM Studio, and any GGUF-compatible runtime. No custom builds required.
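A typical llama-server invocation might look like the following (the model filename is hypothetical; adjust context size and GPU offload to your hardware):

```shell
# Serve the 2.50 BPW quant on port 8080 with an 8192-token context.
# -ngl offloads layers to the GPU; omit it for CPU-only inference.
llama-server \
  -m Qwen3.5-397B-A17B-K_G_2.50.gguf \
  -c 8192 \
  -ngl 99 \
  --port 8080
```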

Model details

Format: GGUF
Parameters: 396B
Architecture: qwen35moe


Base model: Goldkoron/Qwen3.5-397B-A17B