nvfp4
Could anyone do NVFP4, or maybe even NVFP3? I can only run 3-bit or maybe 4-bit currently, and standard quants are a bit lossy
I see what you're saying sir. Haven't looked into NVFP4 quants yet.
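For context on what NVFP4 does differently from standard integer quants: per NVIDIA's published description it stores 4-bit E2M1 floats (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one scale per small block. A rough pure-Python sketch of the rounding step, with the simplifying assumption that the block scale stays in full precision (real NVFP4 quantizes the scale itself to FP8 E4M3):

```python
# Sketch of NVFP4-style block quantization. E2M1 lists the magnitudes a
# 4-bit E2M1 float can represent; each block of values shares one scale.
# Assumption: scale kept in full precision (real NVFP4 uses FP8 scales).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block of floats to E2M1 values times a shared scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # map the block's largest magnitude onto E2M1's max (6)
    out = []
    for x in block:
        # snap the scaled magnitude to the nearest representable E2M1 value
        mag = min(E2M1, key=lambda v: abs(abs(x) / scale - v))
        out.append(mag * scale * (1 if x >= 0 else -1))
    return out, scale

vals, scale = quantize_block([0.1, -0.4, 0.25, 0.05])
```

The fine-grained block scales are what make it less lossy than a single per-tensor scale at the same bit width.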
While this model is strong in some tasks, I'd honestly recommend our v2 instead; it worked pretty well for me in Cline at q3_k_m. (It will be out within the next couple of hours if all the testing yields positive results.)
Mine confuses thinking and final output. It doesn't use think tags; instead it just reasons in the final output and then stops its answer midway through thinking. I'll try fixing it with the system prompt.
It still mostly does the task; it's just the formatting. So yeah.
Sometimes it does give a normal final output, but I have yet to see it use think tags once.
It reasons, (sometimes) gives a final output, no tags.
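If you're post-processing output from a model that only sometimes emits tags, a minimal sketch (assuming the cooperative case wraps reasoning in `<think>…</think>`) that separates reasoning from the final answer, and falls back to treating everything as the answer when no tags appear:

```python
import re

def split_reasoning(text: str):
    """Split model output into (reasoning, answer).

    Assumes reasoning, when present, is wrapped in <think>...</think>;
    if no tags are found, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        # everything outside the tags is the final answer
        answer = (text[: match.start()] + text[match.end():]).strip()
        return reasoning, answer
    return "", text.strip()

print(split_reasoning("<think>2 + 2 is 4</think>The answer is 4."))
```

This won't rescue the truncation case (answer cut off mid-thought), but it at least makes the untagged reasoning visible as such.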
If using in lm studio, switch to latest llama.cpp for some fixes. If it’s not that then it’s just one of the many weird artifacts from training this model so early. See v2 for some potential fixes
It seems able to use the think blocks, it just doesn't want to. I tell it in the system prompt to always do it, but it only does so after I correct it. It will say:
I should have used the thinking format as requested. Let me correct that:
…
You're right — I should have …
Ok, I'll probably delete the current one soon then
I got it to do it without correcting, by giving it an example of correct usage 😀
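That trick is basically one-shot prompting. A sketch of what the message list could look like for an OpenAI-compatible endpoint (like LM Studio's local server); the contents here are made-up examples, not the actual prompt used above:

```python
# Hypothetical messages list: a worked example turn demonstrates the
# <think> format so the model imitates it on the real question.
messages = [
    {"role": "system",
     "content": "Always put your reasoning inside <think></think> tags, "
                "then give the final answer after the closing tag."},
    # one-shot example of correct usage:
    {"role": "user", "content": "What is 12 * 3?"},
    {"role": "assistant", "content": "<think>12 * 3 = 36.</think>36"},
    # the real question:
    {"role": "user", "content": "What is 7 * 8?"},
]

print([m["role"] for m in messages])
```

Seeing one concrete demonstration in-context tends to work better than an instruction alone, which matches what happened here.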
I will double-check this model's chat template; at one point I shipped a faulty chat template that disabled think behavior even when enable thinking was on
It seems more capable of this formatting than GPT, though GPT is more willing to do it, so you kinda have to force Gemma to try. Very odd.
Gemma4 is a hybrid thinking model though. Please try giving it a complex task; if it doesn't reason before its answer without extra prompting, this is most likely my mistake
It does reason oftentimes. It just doesn't put it in the think tags
Oof, just saw the GGUFs are outdated with my broken template from earlier. Re-uploading again
Try updating to the latest llama.cpp release. I had this issue with a different model (minimax m2.5) and it turned out to be a bug in llama.cpp that was fixed in a later release.
This should be fixed, but you will most likely hit random truncation errors. This is fixed in v2
Is v2 capable of decent tool calls? GPT is inconsistent with it / normally unable to, but half-decent if I tell it in the sys prompt not to mess up tool syntax lol. From experience with Gemma yesterday, it hated calling tools, but is half good at it
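Since broken tool syntax is the usual failure mode, one option is to validate the model's tool-call JSON before executing it and feed the error back on failure. A minimal sketch (the tool name and schema here are hypothetical, not from any real agent framework):

```python
import json

# Hypothetical tool registry: maps tool name -> allowed argument names.
# Assumes the model emits calls as {"name": ..., "arguments": {...}}.
ALLOWED_TOOLS = {"read_file": {"path"}}

def validate_tool_call(raw: str):
    """Return (ok, parsed_call_or_error_message) for a raw tool-call string."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    name = call.get("name")
    if name not in ALLOWED_TOOLS:
        return False, f"unknown tool: {name!r}"
    extra = set(call.get("arguments", {})) - ALLOWED_TOOLS[name]
    if extra:
        return False, f"unexpected arguments: {sorted(extra)}"
    return True, call

print(validate_tool_call('{"name": "read_file", "arguments": {"path": "a.txt"}}'))
```

Echoing the error message back to the model usually gets it to retry with corrected syntax, which is cheaper than hoping the sys prompt alone prevents mistakes.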