Different param dtype between Qwen3.5 and Qwen3-Next

#56

by BestJuly7 - opened Mar 3


Hi, we noticed that some parameter dtypes in this checkpoint changed compared to Qwen3-Next: the GDN A_log and out_norm (output layernorm) parameters are in FP32 in the released checkpoints, while the rest of the model is still BF16. Could you please provide some details on this? Specifically, should these parameters be stored in BF16 with the computation done in FP32, or should they be kept in FP32 for both storage and computation? Thanks.
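To illustrate why the two options are not numerically equivalent, here is a minimal sketch using only the Python standard library. It emulates BF16 rounding on a single made-up value; the value `1.2345678`, the helper names, and the use of `exp` as a stand-in for how A_log might enter the computation are all assumptions for illustration, not taken from the released code.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x: float) -> float:
    """Round a finite value to bfloat16 (round-to-nearest-even).

    bfloat16 keeps the top 16 bits of the float32 bit pattern, so the
    result is returned as an ordinary Python float holding that value.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Hypothetical parameter value, chosen only for illustration.
a_log = 1.2345678

stored_fp32 = to_float32(a_log)    # option: keep FP32 for storage
stored_bf16 = to_bfloat16(a_log)   # option: store in BF16 ...
upcast_bf16 = to_float32(stored_bf16)  # ... and upcast to FP32 for compute

# Upcasting is exact, so it cannot restore the mantissa bits lost at storage.
storage_error = abs(stored_fp32 - upcast_bf16)

# Assumption: the parameter passes through exp() before use, which
# carries the storage error into the result.
exp_error = abs(math.exp(stored_fp32) - math.exp(upcast_bf16))

print("FP32 stored value:", stored_fp32)
print("BF16 stored value:", stored_bf16)
print("storage error:", storage_error)
print("error after exp:", exp_error)
```

In this sketch the upcast value stays identical to the BF16 one, so "store in BF16, compute in FP32" only protects the arithmetic, not the parameter itself.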
