Updates

3/10/2026

I've uploaded new quants using the new fused Up + Gate conversion, which offers up to a +10% boost in prompt processing speed in my testing.
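
For context, here is a minimal NumPy sketch of the fused up + gate idea (my own illustration of the general technique, not the actual conversion code): instead of running two separate matmuls for the FFN up and gate projections, the two weight matrices are stacked so a single larger matmul produces both halves at once, which tends to be friendlier to matmul kernels during prompt processing.

```python
import numpy as np

def silu(v):
    return v / (1.0 + np.exp(-v))

def swiglu_ffn_unfused(x, w_up, w_gate, w_down):
    """Two separate projections: one matmul each for up and gate."""
    up = x @ w_up.T
    gate = x @ w_gate.T
    return (up * silu(gate)) @ w_down.T

def swiglu_ffn_fused(x, w_up_gate, w_down):
    """Fused projection: one matmul yields both halves, then split."""
    up, gate = np.split(x @ w_up_gate.T, 2, axis=-1)
    return (up * silu(gate)) @ w_down.T

# The fused weight is just the two matrices stacked along the output dim.
d_model, d_ff = 8, 16
rng = np.random.default_rng(0)
w_up = rng.standard_normal((d_ff, d_model))
w_gate = rng.standard_normal((d_ff, d_model))
w_down = rng.standard_normal((d_model, d_ff))
w_up_gate = np.concatenate([w_up, w_gate], axis=0)

x = rng.standard_normal((4, d_model))
assert np.allclose(swiglu_ffn_unfused(x, w_up, w_gate, w_down),
                   swiglu_ffn_fused(x, w_up_gate, w_down))
```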

Description

This repo contains specialized MoE quants of Qwen3.5-122B-A10B. The idea: because the FFN tensors are huge compared to the rest of the model's tensors, quantizing them more aggressively while keeping everything else at a higher-quality type should yield better quality at a smaller overall size than a comparable naive quantization. To that end, the default quantization type is kept high quality while the FFN up and FFN gate tensors are quantized down, along with the FFN down tensors.
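
As a rough sketch of how a mixture like the ones in the table below can be produced, recent llama.cpp builds expose per-tensor overrides on llama-quantize via `--tensor-type` (assumed here; the paths, patterns, and exact mixture are my own placeholders, not necessarily the commands used for this repo):

```python
import subprocess

# Placeholder paths; substitute your own F16/BF16 GGUF and output file.
SRC = "Qwen3.5-122B-A10B-F16.gguf"
DST = "Qwen3.5-122B-A10B-ffn-mix.gguf"

# Approximates the Q5_K_M row below: Q8_0 default,
# Q5_K for ffn_up / ffn_gate, Q6_K for ffn_down.
subprocess.run(
    [
        "./llama-quantize",
        "--tensor-type", "ffn_up=q5_k",
        "--tensor-type", "ffn_gate=q5_k",
        "--tensor-type", "ffn_down=q6_k",
        SRC,
        DST,
        "q8_0",  # default type for all remaining tensors
    ],
    check=True,
)
```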

| Quant | Size | Mixture (default / FFN up / FFN gate / FFN down) | PPL | Mean PPL(Q)/PPL(base) − 1 | KLD |
| --- | --- | --- | --- | --- | --- |
| Q5_K_M | 85.26 GiB (6.00 BPW) | Q8_0 / Q5_K / Q5_K / Q6_K | 4.822883 ± 0.028429 | +0.1056% | 0.005545 ± 0.000044 |
| Q4_K_M | 71.48 GiB (5.03 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 4.830384 ± 0.028459 | +0.2613% | 0.010455 ± 0.000084 |
| IQ4_XS | 56.29 GiB (3.96 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 4.914250 ± 0.028952 | +2.0020% | 0.027787 ± 0.000206 |
| IQ3_S | 43.39 GiB (3.05 BPW) | Q8_0 / IQ2_S / IQ2_S / IQ3_S | 5.126355 ± 0.030507 | +6.4046% | 0.074562 ± 0.000524 |
| IQ2_XXS | 31.58 GiB (2.22 BPW) | Q4_K / IQ2_XXS / IQ2_XXS / IQ2_XXS | 5.727638 ± 0.035038 | +18.8850% | 0.185195 ± 0.001112 |
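
For reference, a small sketch of how I understand the quality columns to be computed from per-token outputs (the arrays below are made-up placeholders; llama.cpp's llama-perplexity with `--kl-divergence` reports these statistics directly against a saved base-model run):

```python
import numpy as np

# Placeholder logits for the base and quantized models over the same
# tokens; real values come from evaluation runs on actual text.
rng = np.random.default_rng(0)
n_tokens, vocab = 1000, 32
logits_base = rng.standard_normal((n_tokens, vocab))
logits_quant = logits_base + 0.05 * rng.standard_normal((n_tokens, vocab))

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

logp_base = log_softmax(logits_base)
logp_quant = log_softmax(logits_quant)

# Perplexity: exp of the mean negative log-likelihood of the target tokens.
targets = rng.integers(0, vocab, size=n_tokens)
ppl_base = np.exp(-logp_base[np.arange(n_tokens), targets].mean())
ppl_quant = np.exp(-logp_quant[np.arange(n_tokens), targets].mean())

# The table's ratio column: relative PPL increase of the quant over base.
ppl_delta = ppl_quant / ppl_base - 1.0

# Mean KL divergence of the quantized distribution from the base one.
p_base = np.exp(logp_base)
kld = (p_base * (logp_base - logp_quant)).sum(axis=-1).mean()

print(f"PPL(base)={ppl_base:.4f} PPL(Q)={ppl_quant:.4f} "
      f"delta={ppl_delta:+.4%} KLD={kld:.6f}")
```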

KLD and PPL comparison graphs: `kld_graph`, `ppl_graph` (see the images in this repo).
