Any possibility to re-quantize GLM-5 quants?
I noticed the GLM-5.1 quants are a bit smaller than the GLM-5 ones at the same quantization levels:
GLM-5
UD-IQ2_XXS - 241GB
UD-IQ2_M - 255GB
GLM-5.1
UD-IQ2_XXS - 221GB (-20GB)
UD-IQ2_M - 236GB (-19GB)
I was wondering if it would be possible to re-quantize the GLM-5 IQ2_XXS and IQ2_M quants more efficiently with the techniques applied to GLM-5.1, so I could get a bit more quality out of GLM-5 for the same RAM usage (I know GLM-5.1 is out, but GLM-5 works better for my personal use cases and I prefer its writing style). I would love to run GLM-5 at IQ2_M with lower RAM usage, like GLM-5.1.
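In case it helps anyone in the meantime, here is a minimal sketch of quantizing locally with stock llama.cpp's llama-quantize (file names are placeholders; this assumes you start from the full-precision GGUF and an importance matrix, since re-quantizing an already 2-bit file would compound the quality loss):

```shell
# Quantize GLM-5 from the full-precision GGUF down to IQ2_M,
# guided by an importance matrix for better low-bit quality.
# File names are placeholders for illustration.
./llama-quantize --imatrix glm-5-imatrix.dat \
    glm-5-bf16.gguf glm-5-IQ2_M.gguf IQ2_M
```

As I understand it, though, this only produces the stock quant types; the smaller dynamic sizes come from per-layer decisions that the stock tool does not reproduce, which is why I am asking here.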
Thank you for the amazing work.
Good suggestion! We reformulated our quantization scheme, hence the size difference. If there is more demand for it, we will do it!
Oh, I would love to see IQ2_XXS, Q2_K_XL, and IQ3_XXS!