Any possibility to re-quantize GLM-5 quants?

#1
by elpirater312 - opened

I noticed that the GLM-5.1 quants are a bit smaller than the GLM-5 quants at the same quantization levels:

GLM-5
UD-IQ2_XXS - 241GB
UD-IQ2_M - 255GB

GLM-5.1
UD-IQ2_XXS - 221GB (-20GB)
UD-IQ2_M - 236GB (-19GB)
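
For reference, here is a quick sketch of the arithmetic behind those deltas. It uses only the sizes listed above. The dictionary and function names are illustrative and do not come from any tool.

```python
# Sizes in GB, copied from the listing above.
GLM_5 = {"UD-IQ2_XXS": 241, "UD-IQ2_M": 255}
GLM_5_1 = {"UD-IQ2_XXS": 221, "UD-IQ2_M": 236}


def size_delta(quant):
    """Return (GB saved, percent saved) going from GLM-5 to GLM-5.1."""
    old, new = GLM_5[quant], GLM_5_1[quant]
    saved = old - new
    return saved, 100.0 * saved / old


for quant in GLM_5:
    saved, pct = size_delta(quant)
    print(f"{quant}: -{saved}GB ({pct:.1f}% smaller)")
```

That works out to a reduction of roughly 7 to 8 percent at both levels.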

I was wondering if it would be possible to re-quantize the GLM-5 IQ2_XXS and IQ2_M quants more efficiently, using the techniques applied to GLM-5.1, so I can get a bit more quality out of GLM-5 for the same RAM usage. (I know GLM-5.1 is out, but GLM-5 works better for my personal use cases and I prefer its writing style.) I would love to run GLM-5 at IQ2_M with the lower RAM usage that GLM-5.1 has.

Thank you for the amazing work.

Unsloth AI org

Good suggestion. We reformulated our quantization scheme, hence the size difference. If there is more demand for it, we will do it!

Oh, I would love to see IQ2_XXS, Q2_K_XL, and IQ3_XXS.
