UD-Q4_K_XL of MiniMax-M2.7-GGUF is BROKEN

#5
by dehnhaide - opened

Please, pretty please... Unsloth, get your QA in order and abide by the already accepted "GGUF quanting community" minimal QA baseline on HF: transparently provide PPL and KLD data. At the very least do it internally, as a hygiene measure, to avoid flops like this. Don't rush it!

Screenshot from 2026-04-13 12-09-38

For the people asking what "NaN" in a quant PPL measurement means: it would normally point to numerical issues in the backend kernels or in the quant itself. In this case it's a rushed, never-checked quant error.
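To illustrate why a NaN shows up in the final PPL number at all: perplexity is the exponentiated mean negative log-likelihood over the test tokens, so a single NaN log-probability (e.g. from one broken tensor in the quant) poisons the entire result. A minimal sketch (the log-prob values are made up for illustration):

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood; any NaN poisons the result."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

clean = [-2.1, -0.3, -1.7, -0.9]
print(perplexity(clean))  # a finite, sane value

# One NaN log-prob anywhere in the run propagates through sum() and exp():
poisoned = clean + [float("nan")]
print(perplexity(poisoned))  # nan
```

This is why a NaN PPL is such a reliable tripwire: it cannot be averaged away, no matter how many healthy tokens surround the bad one.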

I have checked similar quants from other HF providers (aessedai/MiniMax-M2.7-Q5_K_M --> 157.226 GiB (5.906 BPW) and ubergarm/MiniMax-M2.7-IQ5_K --> 157.771 GiB (5.926 BPW)) and no such error is present. But this is not about backend kernels, nor about the much-hyped "poisoned CUDA 13.2".

I've noticed off-feeling answers with the UD-IQ4_NL quant: mostly right, but with factual errors on things M2.5 didn't get wrong. I wonder if it has NaN issues as well.

When we ran perplexity and KLD benchmarks on MiniMax-M2.7 for all 4-bit quants, it did in fact show unusually high PPL compared with the other quants. AesSedai and ubergarm reported seeing similar issues as well.
That said, we initially kept it up because Benjamin Marie's benchmarks on M2.5 (which uses the same arch as M2.7) suggested that Q4_K_XL performed the best overall, so we did not remove it at the time. In fact, this time our Q4_K_XL had even more layers upcast than for M2.5.

In our own internal testing, Q4_K_XL also performed very well, which led us to believe the elevated PPL might have been a fluke, since that does happen from time to time.
But, as a precaution, we’ll remove the Q4_K_XL quant for now in case there are any further issues, and we’ll pay closer attention to PPL in future evaluations.

We're still investigating what could be the cause and how we can alleviate the issue.


Thanks for your reply, Daniel.
A perplexity check (without KLD) on a model the size of MiniMax takes roughly 5 minutes. I imagine you could batch such tests, at least for the "pure" and/or UD quants, so that accidents like this won't happen again. Also, even if not published on the first day you push the quants, publishing PPL/KLD in the model card at a later time would still help and build trust with the community. It doesn't have to reference any other fellow quanter's PPL/KLD figures (to avoid useless competition), but it would also serve as a baseline sanity check for the most interesting, meaningful quants for the community.
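The batched-check idea above can be sketched as a pre-upload gate: collect the final PPL of each quant (however you obtain it, e.g. from your PPL tool's output) and refuse to publish any quant whose value is not finite. A minimal sketch; the quant names and PPL values below are hypothetical, not measured:

```python
import math

# Hypothetical final-PPL results per quant, as would be collected from a
# batch of perplexity runs. The numbers here are invented for illustration.
ppl_results = {
    "UD-Q4_K_XL": float("nan"),
    "Q5_K_M": 4.91,
    "IQ5_K": 4.89,
}

def broken_quants(results):
    """Return the quants whose PPL is NaN or inf, i.e. must not be uploaded."""
    return [name for name, ppl in results.items() if not math.isfinite(ppl)]

blocked = broken_quants(ppl_results)
if blocked:
    print("Do not publish:", ", ".join(blocked))
```

A finiteness check alone won't catch a merely elevated (but finite) PPL; for that you'd still want to compare against the other quants of the same model, as the thread describes.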
