NVFP4-W4A16 quantized version of LGAI-EXAONE/K-EXAONE-236B-A23B. Only the unshared expert layers are quantized.
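In NVFP4-W4A16, weights are stored as 4-bit E2M1 floats with a shared scale per 16-value block, while activations stay in 16-bit. The rounding step can be sketched as below; this is an illustrative toy only, with an assumed function name, and it keeps the block scale as a plain float rather than the FP8 (E4M3) scale the real format uses:

```python
import numpy as np

# Magnitudes representable in the FP4 E2M1 format used by NVFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(w, block=16):
    """Round weights to block-scaled FP4 (toy sketch, not the
    compressed-tensors kernel).

    Real NVFP4 stores 4-bit codes plus an FP8 (E4M3) scale per
    16-value block; here the scale is kept as a float for clarity.
    """
    w = np.asarray(w, dtype=np.float64)
    out = np.empty_like(w)
    for start in range(0, w.size, block):
        chunk = w[start:start + block]
        # Scale so the largest magnitude in the block maps to 6.0,
        # the top of the E2M1 grid (fall back to 1.0 for all-zero blocks).
        scale = float(np.abs(chunk).max()) / FP4_GRID[-1] or 1.0
        # Snap each magnitude to the nearest grid point, keep the sign.
        idx = np.abs(np.abs(chunk)[:, None] / scale - FP4_GRID).argmin(axis=1)
        out[start:start + block] = np.sign(chunk) * FP4_GRID[idx] * scale
    return out
```

Because only the weights are quantized (W4A16), dequantized values like these multiply full-precision 16-bit activations at inference time.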
## Evaluation
The following package versions can be used to evaluate the quantized model:

- vllm: 0.15.1
- compressed-tensors: 0.13.0
- transformers: 5.1.0
```shell
vllm serve FuriosaAIShareNotaAI/K-EXAONE-236B-A23B-NVFP4A16-GPTQ \
  --reasoning-parser deepseek_v3 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 131072 \
  --max-num-seqs 8
```
```shell
python -m simple-evals.simple_evals --eval gpqa --model FuriosaAIShareNotaAI/K-EXAONE-236B-A23B-NVFP4A16-GPTQ --custom --temperature 1.0 --top_p 0.95 --max_tokens None --extra_body '{"chat_template_kwargs": {"enable_thinking": true}}' --n-threads 8
```
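The `--extra_body` JSON is merged verbatim into each request the harness sends to the vLLM OpenAI-compatible server. For reference, the equivalent raw chat-completions payload looks roughly like the sketch below (the prompt is a placeholder; the field layout is an assumption based on the command above):

```python
import json

# Hypothetical request body for vLLM's OpenAI-compatible
# /v1/chat/completions endpoint; model name and sampling parameters
# mirror the evaluation command above.
payload = {
    "model": "FuriosaAIShareNotaAI/K-EXAONE-236B-A23B-NVFP4A16-GPTQ",
    "messages": [{"role": "user", "content": "..."}],  # prompt elided
    "temperature": 1.0,
    "top_p": 0.95,
    # Fields from --extra_body are passed through unchanged, which is
    # how enable_thinking reaches the model's chat template:
    "chat_template_kwargs": {"enable_thinking": True},
}
body = json.dumps(payload)
```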
### Accuracy Results
The evaluation was run on 8× H100 PCIe GPUs.
| Benchmark | furiosa-ai/K-EXAONE-236B-A23B-NVFP4A16 | LGAI-EXAONE/K-EXAONE-236B-A23B (report) |
|---|---|---|
| GPQA-DIAMOND (Reasoning) | 76.21 ± 1.51% (N=10) | 79.1 |