Any possibility to re-quantize GLM-5 quants?
I noticed the GLM-5.1 quants are a bit smaller than the GLM-5 ones at the same quantization levels:
GLM-5
UD-IQ2_XXS - 241GB
UD-IQ2_M - 255GB
GLM-5.1
UD-IQ2_XXS - 221GB (-20GB)
UD-IQ2_M - 236GB (-19GB)
I was wondering if it would be possible to re-quantize the GLM-5 IQ2_XXS and IQ2_M quants more efficiently with the techniques applied to GLM-5.1, so I could get a bit more quality out of GLM-5 for the same RAM usage (I know GLM-5.1 is out, but GLM-5 works better for my personal use cases and I prefer its writing style). I would love to run GLM-5 at IQ2_M with lower RAM usage, like GLM-5.1.
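In case it helps anyone in the meantime, here is a minimal sketch of quantizing locally with stock llama.cpp's llama-quantize (file names are placeholders; this assumes you start from the full-precision GGUF and an importance matrix, since re-quantizing an already 2-bit file would compound the quality loss):

```shell
# Quantize GLM-5 from the full-precision GGUF down to IQ2_M,
# guided by an importance matrix for better low-bit quality.
# File names are placeholders for illustration.
./llama-quantize --imatrix glm-5-imatrix.dat \
    glm-5-bf16.gguf glm-5-IQ2_M.gguf IQ2_M
```

As I understand it, though, this only produces the stock quant types; the smaller dynamic sizes come from per-layer decisions that the stock tool does not reproduce, which is why I am asking here.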
Thank you for the amazing work.
Good suggestion! We reformulated our quantization scheme, hence the size difference. If there is more demand for it, we will do it!
Oh, I would love to see IQ2_XXS, Q2_K_XL, and IQ3_XXS!