LG-AI-EXAONE committed
Commit 3873da9 · 1 Parent(s): b0bf986

Update technical report, API, and evaluation results

Files changed (3):
  1. .gitattributes +1 -0
  2. README.md +157 -32
  3. assets/main_figure.png +3 -0
.gitattributes CHANGED
@@ -34,4 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 assets/K-EXAONE_Symbol_3d.png filter=lfs diff=lfs merge=lfs -text
+assets/main_figure.png filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
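The added `.gitattributes` line routes the new figure through Git LFS alongside the existing patterns. A rough sketch of how those glob patterns select paths, using Python's `fnmatch` as an approximation of git's wildmatch (the `routed_through_lfs` helper and the extra basename check for directory-less patterns are illustrative assumptions, not part of the repo):

```python
from fnmatch import fnmatch

# LFS patterns from the .gitattributes hunk above.
# Approximation only: git's wildmatch has extra rules, e.g. a pattern
# without "/" is matched against the basename in any directory.
lfs_patterns = ["*.zst", "*tfevents*", "assets/K-EXAONE_Symbol_3d.png",
                "assets/main_figure.png", "tokenizer.json"]

def routed_through_lfs(path: str) -> bool:
    # Match the full path; for directory-less patterns also match the basename.
    name = path.rsplit("/", 1)[-1]
    return any(
        fnmatch(path, pat) or ("/" not in pat and fnmatch(name, pat))
        for pat in lfs_patterns
    )

print(routed_through_lfs("assets/main_figure.png"))  # newly added pattern
print(routed_through_lfs("logs/run1.tfevents.123"))  # matches *tfevents*
print(routed_through_lfs("README.md"))               # stays a normal git blob
```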
README.md CHANGED
@@ -22,19 +22,24 @@ library_name: transformers
 <p align="center">
 <img src="assets/K-EXAONE_Symbol_3d.png" width="400">
 <br>
-<!-- <p align="center"> 🤗 <a href="https://huggingface.co/collections/LGAI-EXAONE/k-exaone">Hugging Face</a> &nbsp | &nbsp 📝 <a href="#"> Blog</a> &nbsp | &nbsp 📑 <a href="#"> Technical Report </a>-->
 <br>
 <br>
 
 <div align="center">
 <a href="https://huggingface.co/collections/LGAI-EXAONE/k-exaone" style="text-decoration: none;">
-<img src="https://img.shields.io/badge/🤗-Huggingface-FC926C?style=for-the-badge" alt="Huggingface">
+<img src="https://img.shields.io/badge/🤗-HuggingFace-FC926C?style=for-the-badge" alt="HuggingFace">
 </a>
 <a href="#" style="text-decoration: none;">
-<img src="https://img.shields.io/badge/📝-Blog_(TBD)-E343BD?style=for-the-badge" alt="Blog">
+<img src="https://img.shields.io/badge/📝-Blog_(TBU)-E343BD?style=for-the-badge" alt="Blog">
 </a>
-<a href="#" style="text-decoration: none;">
-<img src="https://img.shields.io/badge/📑-Technical_Report_(TBD)-684CF4?style=for-the-badge" alt="Technical Report">
+<a href="https://www.lgresearch.ai/data/cdn/upload/K-EXAONE_Technical_Report.pdf" style="text-decoration: none;">
+<img src="https://img.shields.io/badge/📑-Technical_Report-684CF4?style=for-the-badge" alt="Technical Report">
+</a>
+<a href="https://github.com/LG-AI-EXAONE/K-EXAONE" style="text-decoration: none;">
+<img src="https://img.shields.io/badge/🖥️-GitHub-2B3137?style=for-the-badge" alt="GitHub">
+</a>
+<a href="https://friendli.ai/suite/0vabuzmPYUNt/RFZtL3MqChNK/serverless-endpoints/LGAI-EXAONE/K-EXAONE-236B-A23B/overview" style="text-decoration: none;">
+<img src="https://img.shields.io/badge/✈️_API-Try_on_FriendliAI-2649BC?style=for-the-badge" alt="FriendliAI">
 </a>
 </div>
 
@@ -52,7 +57,9 @@ We introduce **K-EXAONE**, a large-scale multilingual language model developed b
 - **Agentic Capabilities:** Demonstrates superior tool-use and search capabilities via **multi-agent strategies.**
 - **Safety & Ethics:** Aligned with **universal human values**, the model uniquely incorporates **Korean cultural and historical contexts** to address regional sensitivities often overlooked by other models. It demonstrates high reliability across diverse risk categories.
 
-For more details, please refer to the [technical report](#).
+For more details, please refer to the [technical report](https://www.lgresearch.ai/data/cdn/upload/K-EXAONE_Technical_Report.pdf) and [GitHub](https://github.com/LG-AI-EXAONE/K-EXAONE).
+
+![main_figure](assets/main_figure.png)
 
 
 ### Model Configuration
@@ -80,7 +87,7 @@ For more details, please refer to the [technical report](#).
 - Knowledge Cutoff: Dec 2024 (2024/12)
 ## Evaluation Results
 
-The following table shows the evaluation results of the K-EXAONE model in reasoning mode, compared to our previous model, [EXAONE-4.0](https://github.com/LG-AI-EXAONE/EXAONE-4.0), and other competing models. The evaluation details can be found in the [technical report](#).
+The following table shows the evaluation results of the K-EXAONE model in reasoning mode, compared to our previous model, [EXAONE-4.0](https://github.com/LG-AI-EXAONE/EXAONE-4.0), and other competing models. The evaluation details can be found in the [technical report](https://www.lgresearch.ai/data/cdn/upload/K-EXAONE_Technical_Report.pdf).
 
 <table>
 <tr>
@@ -120,7 +127,7 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 </tr>
 <tr>
 <td align="center">MMLU-Pro</td>
-<td align="center">83.9</td>
+<td align="center">83.8</td>
 <td align="center">81.8</td>
 <td align="center">80.7</td>
 <td align="center">84.4</td>
@@ -128,7 +135,7 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 </tr>
 <tr>
 <td align="center">GPQA-Diamond</td>
-<td align="center">80.0</td>
+<td align="center">79.1</td>
 <td align="center">75.4</td>
 <td align="center">80.1</td>
 <td align="center">81.1</td>
@@ -136,7 +143,7 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 </tr>
 <tr>
 <td align="center">Humanity's Last Exam</td>
-<td align="center">13.8</td>
+<td align="center">13.6</td>
 <td align="center">10.6</td>
 <td align="center">14.9</td>
 <td align="center">18.2</td>
@@ -145,42 +152,106 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 <tr>
 <td align="center" colspan='7'><i>Math</i></td>
 </tr>
+<tr>
+<td align="center">IMO-AnswerBench</td>
+<td align="center">76.3</td>
+<td align="center">66.1</td>
+<td align="center">75.6</td>
+<td align="center">74.8</td>
+<td align="center">78.3</td>
+</tr>
 <tr>
 <td align="center">AIME 2025</td>
-<td align="center">92.6</td>
+<td align="center">92.8</td>
 <td align="center">85.3</td>
 <td align="center">92.5</td>
 <td align="center">92.3</td>
 <td align="center">93.1</td>
 </tr>
 <tr>
-<td align="center" colspan='7'><i>Coding</i></td>
+<td align="center">HMMT Nov 2025</td>
+<td align="center">86.8</td>
+<td align="center">78.1</td>
+<td align="center">84.9</td>
+<td align="center">88.8</td>
+<td align="center">90.2</td>
+</tr>
+<tr>
+<td align="center" colspan='7'><i>Coding / Agentic Coding</i></td>
+</tr>
+<tr>
+<td align="center">LiveCodeBench Pro 25Q2 (Medium)</td>
+<td align="center">25.9</td>
+<td align="center">4.8</td>
+<td align="center">35.4</td>
+<td align="center">16.0</td>
+<td align="center">27.9</td>
 </tr>
 <tr>
 <td align="center">LiveCodeBench v6</td>
-<td align="center">81.1</td>
+<td align="center">80.7</td>
 <td align="center">66.7</td>
 <td align="center">81.9</td>
 <td align="center">74.1</td>
 <td align="center">79.4</td>
 </tr>
+<tr>
+<td align="center">Terminal-Bench 2.0</td>
+<td align="center">29.0</td>
+<td align="center">-</td>
+<td align="center">18.7</td>
+<td align="center">13.3</td>
+<td align="center">46.4</td>
+</tr>
+<tr>
+<td align="center">SWE-Bench Verified</td>
+<td align="center">49.4</td>
+<td align="center">-</td>
+<td align="center">62.4</td>
+<td align="center">25.0</td>
+<td align="center">73.1</td>
+</tr>
 <tr>
 <td align="center" colspan='7'><i>Agentic Tool Use</i></td>
 </tr>
 <tr>
-<td align="center">τ<sup>2</sup>-Bench (Telecom)</td>
+<td align="center">τ<sup>2</sup>-Bench (Retail)</td>
+<td align="center">78.6</td>
+<td align="center">67.5</td>
+<td align="center">69.1</td>
 <td align="center">71.9</td>
+<td align="center">77.9</td>
+</tr>
+<tr>
+<td align="center">τ<sup>2</sup>-Bench (Airline)</td>
+<td align="center">60.4</td>
+<td align="center">52.0</td>
+<td align="center">60.5</td>
+<td align="center">58.0</td>
+<td align="center">66.0</td>
+</tr>
+<tr>
+<td align="center">τ<sup>2</sup>-Bench (Telecom)</td>
+<td align="center">73.5</td>
 <td align="center">23.7</td>
 <td align="center">60.3</td>
 <td align="center">45.6</td>
 <td align="center">85.8</td>
 </tr>
+<tr>
+<td align="center">BrowseComp</td>
+<td align="center">31.4</td>
+<td align="center">-</td>
+<td align="center">-</td>
+<td align="center">-</td>
+<td align="center">51.4</td>
+</tr>
 <tr>
 <td align="center" colspan='7'><i>Instruction Following</i></td>
 </tr>
 <tr>
 <td align="center">IFBench</td>
-<td align="center">67.4</td>
+<td align="center">67.3</td>
 <td align="center">36.0</td>
 <td align="center">69.5</td>
 <td align="center">52.6</td>
@@ -188,7 +259,7 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 </tr>
 <tr>
 <td align="center">IFEval</td>
-<td align="center">89.8</td>
+<td align="center">89.7</td>
 <td align="center">84.7</td>
 <td align="center">89.5</td>
 <td align="center">87.8</td>
@@ -206,7 +277,15 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 <td align="center">65.0</td>
 </tr>
 <tr>
-<td align="center" colspan='7'><i>Korean Knowledge & Math</i></td>
+<td align="center">OpenAI-MRCR</td>
+<td align="center">52.3</td>
+<td align="center">20.1</td>
+<td align="center">29.9</td>
+<td align="center">58.6</td>
+<td align="center">57.7</td>
+</tr>
+<tr>
+<td align="center" colspan='7'><i>Korean</i></td>
 </tr>
 <tr>
 <td align="center">KMMLU-Pro</td>
@@ -217,12 +296,12 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 <td align="center">72.1</td>
 </tr>
 <tr>
-<td align="center">HRM8K</td>
-<td align="center">90.7</td>
-<td align="center">89.4</td>
-<td align="center">91.6</td>
-<td align="center">92.0</td>
-<td align="center">90.6</td>
+<td align="center">KoBALT</td>
+<td align="center">61.8</td>
+<td align="center">25.4</td>
+<td align="center">54.3</td>
+<td align="center">56.1</td>
+<td align="center">62.7</td>
 </tr>
 <tr>
 <td align="center">CLIcK</td>
@@ -233,12 +312,58 @@ The following table shows the evaluation results of the K-EXAONE model in reason
 <td align="center">86.3</td>
 </tr>
 <tr>
-<td align="center">KoBALT</td>
-<td align="center">61.8</td>
-<td align="center">25.4</td>
-<td align="center">54.3</td>
-<td align="center">56.1</td>
-<td align="center">62.7</td>
+<td align="center">HRM8K</td>
+<td align="center">90.9</td>
+<td align="center">89.4</td>
+<td align="center">91.6</td>
+<td align="center">92.0</td>
+<td align="center">90.6</td>
+</tr>
+<tr>
+<td align="center">Ko-LongBench</td>
+<td align="center">86.8</td>
+<td align="center">68.0</td>
+<td align="center">82.2</td>
+<td align="center">83.2</td>
+<td align="center">87.9</td>
+</tr>
+<tr>
+<td align="center" colspan='7'><i>Multilinguality</i></td>
+</tr>
+<tr>
+<td align="center">MMMLU</td>
+<td align="center">85.7</td>
+<td align="center">83.2</td>
+<td align="center">83.8</td>
+<td align="center">87.3</td>
+<td align="center">88.0</td>
+</tr>
+<tr>
+<td align="center">WMT24++</td>
+<td align="center">90.5</td>
+<td align="center">80.8</td>
+<td align="center">93.6</td>
+<td align="center">94.7</td>
+<td align="center">90.0</td>
+</tr>
+<tr>
+<td align="center" colspan='7'><i>Safety</i></td>
+</tr>
+<tr>
+<td align="center">Wild-Jailbreak</td>
+<td align="center">89.9</td>
+<td align="center">62.8</td>
+<td align="center">98.2</td>
+<td align="center">85.5</td>
+<td align="center">79.1</td>
+</tr>
+<tr>
+<td align="center">KGC-Safety</td>
+<td align="center">96.1</td>
+<td align="center">58.0</td>
+<td align="center">92.5</td>
+<td align="center">66.2</td>
+<td align="center">73.0</td>
 </tr>
 </table>
 
@@ -266,6 +391,8 @@ You can install the latest version of SGLang with support for EXAONE-MoE archite
 
 You can install the latest version of llama.cpp with support for EXAONE-MoE architecture from [this repository](https://github.com/Aim-Highest/llama.cpp).
 Please refer to the [official build guide](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md) for details.
+
+
 ## Quickstart
 
 You can use the K-EXAONE model with the Transformers library. For better quality, you should check the [usage guideline](#usage-guideline) section.
@@ -448,8 +575,7 @@ Practically, you can serve the model with a 256K context length using tensor par
 ```bash
 python -m sglang.launch_server \
     --model LGAI-EXAONE/K-EXAONE-236B-A23B \
-    --reasoning-parser qwen3 \
-    --disable-hybrid-swa-memory
+    --reasoning-parser qwen3
 ```
 
 A SGLang server will be available at http://localhost:30000.
@@ -499,7 +625,6 @@ If you are interested in using MTP weights for speculative decoding, add acco
 python -m sglang.launch_server \
     --model LGAI-EXAONE/K-EXAONE-236B-A23B \
     --reasoning-parser qwen3 \
-    --disable-hybrid-swa-memory \
    --speculative-algorithm EAGLE \
     --speculative-num-steps 3 \
     --speculative-eagle-topk 1 \
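The updated launch commands in the README diff serve the model behind SGLang's OpenAI-compatible endpoint at http://localhost:30000. As a minimal sketch, the request body such a server expects can be built like this; the field names follow the standard OpenAI chat-completions schema, and the prompt text is an illustrative assumption:

```python
import json

# Sketch of a chat-completions request body for the SGLang server launched
# in the README diff above. Only the model id comes from this commit; the
# rest is the generic OpenAI-compatible schema SGLang exposes.
payload = {
    "model": "LGAI-EXAONE/K-EXAONE-236B-A23B",
    "messages": [
        {"role": "user", "content": "Summarize K-EXAONE's agentic capabilities."}
    ],
    "max_tokens": 512,
}

body = json.dumps(payload)
print(body)
```

With a running server, this body would be POSTed to http://localhost:30000/v1/chat/completions by any HTTP or OpenAI-compatible client.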
assets/main_figure.png ADDED

Git LFS Details

  • SHA256: 78987fb8ea984e2a5d27c836e354b485860589a6dcf673550ea2134a2b8bc6e6
  • Pointer size: 131 Bytes
  • Size of remote file: 113 kB
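The "Pointer size: 131 Bytes" entry follows from the Git LFS pointer-file format: three lines giving the spec version, the `oid sha256:` digest, and the byte size. A sketch reproducing it from the SHA256 listed above; the exact byte count 115712 is a hypothetical stand-in, since the page shows only the rounded 113 kB (any six-digit size yields the same 131-byte pointer):

```python
# Reconstruct the Git LFS pointer behind "Pointer size: 131 Bytes".
oid = "78987fb8ea984e2a5d27c836e354b485860589a6dcf673550ea2134a2b8bc6e6"
size = 115712  # stand-in exact size (~113 kB); not taken from the commit

pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{oid}\n"
    f"size {size}\n"
)
print(len(pointer.encode("utf-8")))  # → 131
```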