nvfp4
Could anyone do NVFP4, or maybe even NVFP3? I can only run 3-bit or maybe 4-bit currently, and standard quants are a bit lossy
I see what you're saying sir. Haven't looked into NVFP4 quants yet.
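For context on what NVFP4 does differently from standard integer quants: per NVIDIA's published description it stores 4-bit E2M1 floats (representable magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one scale per small block. A rough pure-Python sketch of the rounding step, with the simplifying assumption that the block scale stays in full precision (real NVFP4 quantizes the scale itself to FP8 E4M3):

```python
# Sketch of NVFP4-style block quantization. E2M1 lists the magnitudes a
# 4-bit E2M1 float can represent; each block of values shares one scale.
# Assumption: scale kept in full precision (real NVFP4 uses FP8 scales).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block of floats to E2M1 values times a shared scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # map the block's largest magnitude onto E2M1's max (6)
    out = []
    for x in block:
        # snap the scaled magnitude to the nearest representable E2M1 value
        mag = min(E2M1, key=lambda v: abs(abs(x) / scale - v))
        out.append(mag * scale * (1 if x >= 0 else -1))
    return out, scale

vals, scale = quantize_block([0.1, -0.4, 0.25, 0.05])
```

The fine-grained block scales are what make it less lossy than a single per-tensor scale at the same bit width.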
While this model is strong in some tasks, I'd honestly recommend our v2 instead; it worked pretty well for me in Cline at q3_k_m. (It will be out within the next couple of hours if all the testing yields positive results.)
Mine confuses thinking and final output. It doesn't use think tags; instead it just reasons in the final output and then stops its answer midway through thinking. I'll try fixing it with the system prompt.
It still mostly does the task; it's just the formatting. So yeah.
Sometimes it does give a normal final output, but I have yet to see it use think tags once.
It reasons, (sometimes) gives a final output, no tags.
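If you're post-processing output from a model that only sometimes emits tags, a minimal sketch (assuming the cooperative case wraps reasoning in `<think>…</think>`) that separates reasoning from the final answer, and falls back to treating everything as the answer when no tags appear:

```python
import re

def split_reasoning(text: str):
    """Split model output into (reasoning, answer).

    Assumes reasoning, when present, is wrapped in <think>...</think>;
    if no tags are found, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        # everything outside the tags is the final answer
        answer = (text[: match.start()] + text[match.end():]).strip()
        return reasoning, answer
    return "", text.strip()

print(split_reasoning("<think>2 + 2 is 4</think>The answer is 4."))
```

This won't rescue the truncation case (answer cut off mid-thought), but it at least makes the untagged reasoning visible as such.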
If using in lm studio, switch to latest llama.cpp for some fixes. If it’s not that then it’s just one of the many weird artifacts from training this model so early. See v2 for some potential fixes
It seems able to use the think blocks, it just doesn't want to. I tell it in the system prompt to always do it, but it only does so after I correct it. It will say:
I should have used the thinking format as requested. Let me correct that:
…
You're right — I should have …
Ok, I'll probably delete the current one soon then
I got it to do it without correcting, by giving it an example of correct usage 😀
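That trick is basically one-shot prompting. A sketch of what the message list could look like for an OpenAI-compatible endpoint (like LM Studio's local server); the contents here are made-up examples, not the actual prompt used above:

```python
# Hypothetical messages list: a worked example turn demonstrates the
# <think> format so the model imitates it on the real question.
messages = [
    {"role": "system",
     "content": "Always put your reasoning inside <think></think> tags, "
                "then give the final answer after the closing tag."},
    # one-shot example of correct usage:
    {"role": "user", "content": "What is 12 * 3?"},
    {"role": "assistant", "content": "<think>12 * 3 = 36.</think>36"},
    # the real question:
    {"role": "user", "content": "What is 7 * 8?"},
]

print([m["role"] for m in messages])
```

Seeing one concrete demonstration in-context tends to work better than an instruction alone, which matches what happened here.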
I will double-check this model's chat template; at one point I shipped a faulty chat template that disabled think behavior even when enable thinking was on
It seems more capable of this formatting than GPT, though GPT is more willing to do it, so you kinda have to force Gemma to try. Very odd.
Gemma4 is a hybrid thinking model though. Please try giving it a complex task; if it doesn't reason before its answer without extra prompting, this is most likely my mistake
It does reason oftentimes. It just doesn't put it in the think tags
Oof, just saw the GGUFs are outdated with my broken template from earlier. Re-uploading again
Try updating to the latest llama.cpp release. I had this issue with a different model (minimax m2.5) and it turned out to be a bug in llama.cpp that was fixed in a later release.
This should be fixed, but you will most likely hit random truncation errors. This is fixed in v2
Is v2 capable of decent tool calls? GPT is inconsistent with it / normally unable to, but half-decent if I tell it in the sys prompt not to mess up tool syntax lol. From experience with Gemma yesterday, it hated calling tools, but is half good at it
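Since broken tool syntax is the usual failure mode, one option is to validate the model's tool-call JSON before executing it and feed the error back on failure. A minimal sketch (the tool name and schema here are hypothetical, not from any real agent framework):

```python
import json

# Hypothetical tool registry: maps tool name -> allowed argument names.
# Assumes the model emits calls as {"name": ..., "arguments": {...}}.
ALLOWED_TOOLS = {"read_file": {"path"}}

def validate_tool_call(raw: str):
    """Return (ok, parsed_call_or_error_message) for a raw tool-call string."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    name = call.get("name")
    if name not in ALLOWED_TOOLS:
        return False, f"unknown tool: {name!r}"
    extra = set(call.get("arguments", {})) - ALLOWED_TOOLS[name]
    if extra:
        return False, f"unexpected arguments: {sorted(extra)}"
    return True, call

print(validate_tool_call('{"name": "read_file", "arguments": {"path": "a.txt"}}'))
```

Echoing the error message back to the model usually gets it to retry with corrected syntax, which is cheaper than hoping the sys prompt alone prevents mistakes.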