This is Qwen/Qwen3.5-27B quantized to NVFP4 with llm-compressor. The model is compatible with vLLM (tested with v0.16.1rc1 on an H200). Evaluation is still in progress.
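As a back-of-the-envelope sanity check (my own arithmetic, not an official figure), NVFP4 stores 4-bit weight values with one FP8 scale per 16-element block, i.e. roughly 4.5 bits per parameter, so the quantized weights take about a quarter of the BF16 footprint:

```python
# Rough weight-storage estimate for NVFP4 vs. BF16.
# Assumption: NVFP4 ~ 4-bit values + one FP8 scale per 16-element block,
# about 4.5 bits/param; the real checkpoint differs because some layers
# (embeddings, norms, lm_head) are usually kept in higher precision.

def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB."""
    return n_params * bits_per_param / 8 / 2**30

params = 27e9
print(f"BF16:  {weight_gib(params, 16):.1f} GiB")   # ~50 GiB
print(f"NVFP4: {weight_gib(params, 4.5):.1f} GiB")  # ~14 GiB
```

This is why the quantized model fits comfortably on a single H200 even with a long context window.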

Instructions

uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
uv pip install git+https://github.com/huggingface/transformers.git
vllm serve [this model ID] --max-model-len 262144 --reasoning-parser qwen3
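Once serving, vLLM exposes an OpenAI-compatible API. A minimal client sketch using only the standard library, assuming the default host and port (localhost:8000) and that the model name matches the ID passed to vllm serve:

```python
# Minimal chat request against vLLM's OpenAI-compatible endpoint.
# Assumptions: server running locally on the default port (8000);
# the "model" field must match the served model ID.
import json
import urllib.request

URL = "http://localhost:8000/v1/chat/completions"

def build_request(model_id: str, prompt: str) -> dict:
    """Build the chat-completions payload expected by the server."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

if __name__ == "__main__":
    payload = build_request("kaitchup/Qwen3.5-27B-NVFP4", "Hello!")
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (e.g. the openai Python package pointed at http://localhost:8000/v1) works the same way.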

Acknowledgments

Thank you to Verda for providing the compute; I used their H200s. Verda is a European, AI-focused cloud and GPU infrastructure provider built around sovereignty, sustainability, data privacy, and performance. Check them out if interested.
