Commit 0435a77 (verified), committed by desaifan-mbzuai
1 Parent(s): 69b3484

Update README.md (#11)

- Update README.md (74713718fb8812f89937ede411db6c156c2f3d8d)

Files changed (1)
  1. README.md +8 -0
README.md CHANGED
@@ -61,6 +61,14 @@ Below we report performance across general, reasoning, mathematical, and coding
  | **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |


+ Below we report the evaluation results for K2-V2 after supervised fine-tuning (SFT). These variants correspond to three levels of reasoning effort (Low < Medium < High).
+
+ | Model Specifications | LongBench V2 | AIME25 | HMMT25 | GSM8K | Minerva | GPQA-D | MBPP | HumanEval | LCBv6 |
+ |----------------------|--------------|--------|--------|-------|---------|--------|-------|------------|--------|
+ | **K2 Low**<br><sub>Dense · 70B</sub> | 40.7 | 27.3 | 19.0 | 92.4 | 85.0 | 48.5 | 71.0 | 82.3 | 39.9 |
+ | **K2 Medium**<br><sub>Dense · 70B</sub> | 41.3 | 62.0 | 45.6 | 92.0 | 90.6 | 60.6 | 75.8 | 84.2 | 51.3 |
+ | **K2 High**<br><sub>Dense · 70B</sub> | 42.6 | 80.2 | 71.4 | 94.8 | 94.5 | 69.3 | 84.8 | 91.5 | 67.0 |
+
  Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.

  ---
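
Review note: the added text orders the variants by reasoning effort (Low < Medium < High). The scores in the added table can be checked against that ordering with a small script. The sketch below is only illustrative: the values are copied by hand from the added rows, and the helper names `non_monotonic` and `gain_low_to_high` are hypothetical, not part of the repository.

```python
# Scores copied by hand from the table added in this diff
# (one list per reasoning-effort variant, in column order).
BENCHMARKS = ["LongBench V2", "AIME25", "HMMT25", "GSM8K", "Minerva",
              "GPQA-D", "MBPP", "HumanEval", "LCBv6"]
SCORES = {
    "Low":    [40.7, 27.3, 19.0, 92.4, 85.0, 48.5, 71.0, 82.3, 39.9],
    "Medium": [41.3, 62.0, 45.6, 92.0, 90.6, 60.6, 75.8, 84.2, 51.3],
    "High":   [42.6, 80.2, 71.4, 94.8, 94.5, 69.3, 84.8, 91.5, 67.0],
}


def non_monotonic(scores=SCORES, names=BENCHMARKS):
    """Return benchmarks whose score does not strictly rise Low -> Medium -> High."""
    flagged = []
    for i, name in enumerate(names):
        low, med, high = (scores[k][i] for k in ("Low", "Medium", "High"))
        if not (low < med < high):
            flagged.append(name)
    return flagged


def gain_low_to_high(scores=SCORES, names=BENCHMARKS):
    """Return the High-minus-Low score difference per benchmark, in points."""
    return {name: round(scores["High"][i] - scores["Low"][i], 1)
            for i, name in enumerate(names)}


if __name__ == "__main__":
    print("Not strictly increasing with effort:", non_monotonic())
    for name, gain in gain_low_to_high().items():
        print(f"{name:>12}: {gain:+.1f}")
```

On the values as transcribed, every benchmark rises strictly with effort except GSM8K, where Medium (92.0) sits slightly below Low (92.4). The largest Low-to-High gains are on AIME25 and HMMT25.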