🚨 IMPORTANT: RE-DOWNLOAD tokenizer_config.json IF YOU GOT INFINITE LOOPS

Fixed 2026-04-13. Earlier versions of this repo had a tokenizer bug that caused the model to loop forever in stock mlx_lm and other loaders.

Gemma-4 emits <end_of_turn> (id 106) at the end of an assistant turn, but the original tokenizer_config.json listed only <eos> (id 1) as the stop token. Stock loaders never detected the actual end-of-turn marker, so generation looped forever.
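To see why this hangs rather than erroring, here is a minimal decode-loop sketch (hypothetical loader logic, not mlx_lm's actual code): if the stop set contains only id 1, the token the model actually emits at end of turn (106) never triggers the break.

```python
# Hypothetical sketch of a loader's decode loop -- not mlx_lm's actual code.
# With only <eos> (id 1) in the stop set, <end_of_turn> (id 106) never
# matches, so the loop runs until max_tokens (or forever in a real server).

def decode(next_token, stop_token_ids, max_tokens=8):
    out = []
    for _ in range(max_tokens):
        tok = next_token()
        if tok in stop_token_ids:  # only triggers for ids in the stop set
            break
        out.append(tok)
    return out

# The model keeps emitting <end_of_turn> (106) after its answer:
stream = iter([42, 7, 106, 106, 106, 106, 106, 106])
broken = decode(lambda: next(stream), stop_token_ids={1})      # never stops
print(broken)  # [42, 7, 106, 106, 106, 106, 106, 106]

stream = iter([42, 7, 106, 106])
fixed = decode(lambda: next(stream), stop_token_ids={1, 106})  # stops at 106
print(fixed)   # [42, 7]
```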

How to fix:

Option A: Re-download just the tokenizer config (fastest)

huggingface-cli download JANGQ-AI/Gemma-4-31B-it-JANG_4M tokenizer_config.json --local-dir ./your-model-dir

Option B: Re-download the whole repo

huggingface-cli download JANGQ-AI/Gemma-4-31B-it-JANG_4M --local-dir ./your-model-dir

Option C: Pass the stop tokens manually

stop_token_ids = [1, 106, 50]  # <eos>, <end_of_turn>, <end_of_image>
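If your loader can't take stop ids at runtime, you can instead patch the local config yourself. A minimal sketch, assuming the standard Hugging Face tokenizer_config.json layout (the eos_token field name is a convention, not taken from this repo's file):

```python
import json
from pathlib import Path

def set_stop_token(config_path, stop_token="<end_of_turn>"):
    """Point eos_token at the end-of-turn marker so stock loaders stop on it.

    Assumes the standard HF tokenizer_config.json "eos_token" field.
    """
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    cfg["eos_token"] = stop_token  # assumed field name (HF convention)
    path.write_text(json.dumps(cfg, indent=2))
    return cfg

# Example (path is a placeholder for your local copy):
# set_stop_token("./your-model-dir/tokenizer_config.json")
```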

The model weights are unchanged; you only need to update tokenizer_config.json.
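After updating, you can sanity-check which stop token your local config declares. A hedged check: the field name follows the usual tokenizer_config.json convention, and the maintainers' fix may have changed a different field than the one read here.

```python
import json

def declared_eos(config_path):
    # Return the eos_token the config advertises to stock loaders.
    # Assumes the conventional "eos_token" field in tokenizer_config.json.
    with open(config_path) as f:
        return json.load(f).get("eos_token")

# Example (path is a placeholder for your local copy):
# declared_eos("./your-model-dir/tokenizer_config.json")
# A config still reporting plain <eos> likely predates the fix.
```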


Gemma-4-31B-it-JANG_4M

JANG quantized Gemma-4 MoE for Apple Silicon. See JANGQ-AI for the full collection.

Downloads last month: 2,247
Model size: 6B params (Safetensors)
Tensor types: U32 · F16
Format: MLX