Different param dtype between Qwen3.5 and Qwen3-Next

#56
by BestJuly7 - opened

Hi, we noticed a dtype change in the released checkpoints compared to Qwen3-Next: the GDN A_log and out_norm (output layernorm) parameters are stored in FP32, while the rest of the model is still BF16. Could you please provide some details about this? Specifically, should these parameters be stored in BF16 with the computation conducted in FP32, or should we use them in FP32 for both storage and computation? Thanks.
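For anyone who wants to reproduce the observation, here is a minimal sketch of how the per-parameter dtypes can be listed from a safetensors shard without loading any weights. It relies only on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header) and the Python standard library. The function names `read_safetensors_dtypes` and `group_by_dtype` are ours for illustration, not part of any library, and the exact parameter names in a given checkpoint may differ from what you expect.

```python
import json
import struct
from collections import defaultdict


def read_safetensors_dtypes(path):
    """Return {tensor_name: dtype_string} read from a safetensors header.

    Only the JSON header is parsed; tensor data is never loaded.
    """
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: info["dtype"]
        for name, info in header.items()
        if name != "__metadata__"
    }


def group_by_dtype(dtypes):
    """Group tensor names by dtype string, e.g. {"BF16": [...], "F32": [...]}."""
    groups = defaultdict(list)
    for name, dtype in sorted(dtypes.items()):
        groups[dtype].append(name)
    return dict(groups)
```

Calling `group_by_dtype(read_safetensors_dtypes(path))` on one shard of the checkpoint (the path is whatever your local download uses) and printing the `"F32"` entry should show which parameters are kept in FP32. Note that this only reports the storage dtype; it says nothing about the precision the computation is expected to run in, which is the open question here.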

