From unsloth: https://huggingface.co/unsloth/Qwen3.5-122B-A10B-GGUF
I took the UD-IQ4_XS quant and merged all of its shards (0001.gguf, 0002.gguf, 0003.gguf) into a single .gguf, using llama.cpp's gguf-split tool: https://github.com/ggml-org/llama.cpp/tree/master/tools/gguf-split
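The merge step above can be sketched roughly as follows. The exact shard filenames are an assumption (they depend on how the repo names its splits); the key point is that you pass the *first* shard and the tool locates the rest automatically:

```shell
# Sketch: merge GGUF shards into one file with llama.cpp's gguf-split tool
# (built from https://github.com/ggml-org/llama.cpp).
# Input/output filenames below are placeholders, not the exact repo names.
llama-gguf-split --merge \
  Qwen3.5-122B-A10B-UD-IQ4_XS-00001-of-00003.gguf \
  Qwen3.5-122B-A10B-UD-IQ4_XS-merged.gguf
```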
This is useful, for example, with vLLM, which doesn't support multi-shard GGUF files.
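As a rough sketch of that use case, a merged single-file GGUF can be served with vLLM along these lines (the local filename is an assumption; GGUF models typically need a `--tokenizer` pointing at the original Hugging Face repo):

```shell
# Sketch: serve the merged single-file GGUF with vLLM.
# vLLM loads single .gguf files but not multi-shard splits,
# which is why the merge step above is needed.
vllm serve ./Qwen3.5-122B-A10B-UD-IQ4_XS-merged.gguf \
  --tokenizer Qwen/Qwen3.5-122B-A10B
```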
Feel free to check out my website, https://cheapllm.shop, for unlimited free inference of this model during the beta. After that, pricing will be $0.02/M input tokens and $0.10/M output tokens, making it the cheapest provider by a wide margin.
If you're interested in D&D/RP, you can also check out https://fablia.fr for free D&D/RP experiences
Downloads last month: 609
Hardware compatibility: 4-bit
Model tree for Volko76/Qwen3.5-122B-A10B-UD-IQ4_XS-GGUF-MERGED
- Base model: Qwen/Qwen3.5-122B-A10B