Precision discrepancy between variants
#6
by represiv - opened
I'm a little confused by the main-tensor precision across variants and the final model sizes:
I-Compact uses Q4_K for the main tensor, model size 17.3 GB, as expected.
I-Balanced uses Q6_K for the main tensor, model size 25.3 GB, NOT as expected.
I-Quality uses Q6_K for the main tensor, model size 22.8 GB, as expected.
The recently uploaded gemma-4-26B-A4B variants have the tensor precisions and sizes I would expect: I-Compact Q4_K, I-Balanced Q5_K, I-Quality Q6_K.
So my question is: why does the Qwen3.5 I-Balanced variant not use Q5_K for the main tensor, and why is the I-Balanced model larger than I-Quality?
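For context on why the sizes look off, file size roughly tracks the average bits per weight of the quantization. Here is a minimal back-of-the-envelope sketch; the ~30B parameter count and the bits-per-weight figures are my assumptions (approximate llama.cpp k-quant averages), not numbers from this repo:

```python
def est_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF size estimate: parameters * bits-per-weight / 8 bits-per-byte."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9  # decimal GB

# Approximate average bits per weight for llama.cpp k-quants (assumed values).
for name, bpw in [("Q4_K", 4.5), ("Q5_K", 5.5), ("Q6_K", 6.56)]:
    print(f"{name}: ~{est_size_gb(30, bpw):.1f} GB")
```

By this estimate, a Q6_K main tensor should land well above a Q5_K one at the same parameter count, so I-Balanced coming out both at Q6_K and larger than I-Quality is what made me suspect a mix-up in the variant configs.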
P.S. Thank you for the APEX variants; really good results and speed.