keep doing it!

by pirola - opened 15 days ago

I am a fan already! These models are perfect for 4-bit quantization, so they can run on more constrained GPUs like mine with 16 GB!

bknyaz

Samsung AI Lab (SAIL) Montreal org 6 days ago

Thanks! We just released the code at https://github.com/SamsungSAILMontreal/ream so that other models can be REAMed.

pirola

5 days ago • edited 5 days ago

I am working on a REAMed version of Nemotron Cascade 2, but boy, it's still far from good. I will try to use your complete process now and see whether I get better results. Thanks for the contribution!
I cannot completely apply your methodology there, though. Currently:

Expert selection uses sigmoid scores (Nemotron's routing), while REAM uses softmax-selected top-k.
Alignment uses [up_row ‖ down_col], not [gate_row ‖ up_row ‖ down_col], because the MoE is non-gated.
e_score_correction_bias is sliced alongside gate.weight (Nemotron-specific).
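
To make those three points concrete, here is a rough pure-Python sketch of what each one means. It is not taken from the REAM repo or from Nemotron's modeling code: the function names, the weight layouts, the use of the bias for selection only, and the renormalization of the top-k weights are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def route_sigmoid(hidden, gate_weight, bias, top_k):
    """Sigmoid routing with a per-expert correction bias (hypothetical helper).

    Assumption: the bias only influences WHICH experts are picked; the
    mixing weights come from the unbiased sigmoid scores, renormalized
    over the selected experts.
    gate_weight: [num_experts][hidden_dim], bias: [num_experts]
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in gate_weight]
    scores = [sigmoid(z) for z in logits]
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda e: biased[e], reverse=True)[:top_k]
    total = sum(scores[e] for e in chosen)
    return chosen, [scores[e] / total for e in chosen]


def neuron_features(up_proj, down_proj):
    """Alignment features for a non-gated expert: [up_row || down_col].

    Assumed layouts: up_proj is [intermediate][hidden] and
    down_proj is [hidden][intermediate], so neuron j owns row j of
    up_proj and column j of down_proj. There is no gate_proj to add.
    """
    return [
        list(up_proj[j]) + [row[j] for row in down_proj]
        for j in range(len(up_proj))
    ]


def slice_router(gate_weight, bias, keep):
    """Keep the same expert indices in the router weight and in the bias."""
    return [gate_weight[e] for e in keep], [bias[e] for e in keep]


# Toy usage with made-up numbers.
gate_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, -10.0]
chosen, weights = route_sigmoid([1.0, 2.0], gate_weight, bias, top_k=2)
new_gate, new_bias = slice_router(gate_weight, bias, keep=[0, 2])
```

The point of slice_router is only that the router rows and the bias entries must be indexed with the same list, otherwise the surviving experts end up with another expert's correction term.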

Would you agree with that, or would you suggest something else?
