Add GPQA evaluation result (#54)
- Add GPQA evaluation result (0ae26748999a1179b83e7f50e653b32ea404af5a)
- Fix task_id to match benchmark eval.yaml (7524de5db643949641e1bafa14f2d9c37a170763)
Co-authored-by: ben burtenshaw <burtenshaw@users.noreply.huggingface.co>
- .eval_results/gpqa.yaml +9 -0
.eval_results/gpqa.yaml
ADDED
@@ -0,0 +1,9 @@
+- dataset:
+    id: Idavidrein/gpqa
+    task_id: diamond
+  value: 75.2
+  date: '2026-01-27'
+  source:
+    url: https://huggingface.co/zai-org/GLM-4.7-Flash
+    name: Model Card
+    user: burtenshaw