
NVFP4-W4A16 quantized version of LGAI-EXAONE/K-EXAONE-236B-A23B. Only the unshared expert layers are quantized.
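For intuition, here is a minimal numpy sketch of the block-scaled FP4 (E2M1) rounding that NVFP4 quantization performs. It is an illustration only: the real format stores one FP8 E4M3 scale per 16-element block (plus a global scale), while this sketch keeps scales in full precision.

```python
import numpy as np

# The eight non-negative values representable in FP4 E2M1 (sign is a separate bit).
E2M1_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def nvfp4_fake_quantize(w, block_size=16):
    """Simulate NVFP4 rounding: per block, choose a scale so the block
    maximum lands on the largest FP4 value (6.0), then snap every element
    to the nearest scaled FP4 level."""
    w = np.asarray(w, dtype=np.float64).ravel()
    out = np.empty_like(w)
    for start in range(0, w.size, block_size):
        block = w[start:start + block_size]
        scale = max(np.abs(block).max() / E2M1_LEVELS[-1], 1e-12)
        # Index of the nearest E2M1 level for each (scaled) magnitude.
        idx = np.abs(np.abs(block)[:, None] / scale - E2M1_LEVELS).argmin(axis=1)
        out[start:start + block_size] = np.sign(block) * E2M1_LEVELS[idx] * scale
    return out
```

Because the scale is derived from the per-block maximum, the largest element in every block is reproduced exactly; the rest incur at most one half-step of rounding error at that block's scale.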

Evaluation

The following package versions can be used to evaluate the quantized model.

vllm: 0.15.1

compressed-tensors: 0.13.0

transformers: 5.1.0

```shell
vllm serve FuriosaAIShareNotaAI/K-EXAONE-236B-A23B-NVFP4A16-GPTQ \
    --reasoning-parser deepseek_v3 \
    --tensor-parallel-size 8 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max-model-len 131072 \
    --max-num-seqs 8
```

```shell
python -m simple-evals.simple_evals --eval gpqa \
    --model FuriosaAIShareNotaAI/K-EXAONE-236B-A23B-NVFP4A16-GPTQ \
    --custom --temperature 1.0 --top_p 0.95 --max_tokens None \
    --extra_body '{"chat_template_kwargs": {"enable_thinking": true}}' \
    --n-threads 8
```
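The eval harness above talks to the server through vLLM's OpenAI-compatible API. As a sketch, the request body it sends per question looks roughly like this (the exact field layout is an assumption; `chat_template_kwargs` is passed through the request's extra body):

```python
def build_request(prompt, enable_thinking=True):
    """Sketch of a /v1/chat/completions payload matching the eval settings above."""
    return {
        "model": "FuriosaAIShareNotaAI/K-EXAONE-236B-A23B-NVFP4A16-GPTQ",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        # Forwarded to the chat template to enable reasoning mode.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
```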

Accuracy Results

The evaluation was done on 8× H100 PCIe GPUs.

| Benchmark | furiosa-ai/K-EXAONE-236B-A23B-NVFP4A16 | LGAI-EXAONE/K-EXAONE-236B-A23B (reported) |
|---|---|---|
| GPQA-Diamond (Reasoning) | 76.21 ± 1.51% (N=10) | 79.1 |
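The ± figure reflects the spread over N=10 evaluation runs. Assuming it is the standard error of the mean (an assumption; the card does not say), the aggregation amounts to:

```python
import statistics

def summarize_runs(accuracies):
    """Mean accuracy and standard error of the mean over repeated eval runs."""
    mean = statistics.mean(accuracies)
    stderr = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return mean, stderr
```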