Add GPQA evaluation result (#54)

- Add GPQA evaluation result (0ae26748999a1179b83e7f50e653b32ea404af5a)
- Fix task_id to match benchmark eval.yaml (7524de5db643949641e1bafa14f2d9c37a170763)

Co-authored-by: ben burtenshaw <burtenshaw@users.noreply.huggingface.co>

Files changed (1)

.eval_results/gpqa.yaml ADDED

+- dataset:
+    id: Idavidrein/gpqa
+    task_id: diamond
+  value: 75.2
+  date: '2026-01-27'
+  source:
+    url: https://huggingface.co/zai-org/GLM-4.7-Flash
+    name: Model Card
+    user: burtenshaw
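
A minimal sketch of a sanity check for an entry like the one added above, in the spirit of the second commit's task_id fix. The rules here (dataset.id and dataset.task_id must be present, value is a percentage between 0 and 100, date is YYYY-MM-DD) are assumptions inferred from this single file, not the Hub's actual schema, and `check_entry` is a hypothetical helper. The entry is written out as the Python structure a YAML parser would produce, so the sketch needs only the standard library.

```python
from datetime import date

# The entry from .eval_results/gpqa.yaml as a parsed structure
# (a list holding one mapping). Values are copied from the diff.
ENTRIES = [
    {
        "dataset": {"id": "Idavidrein/gpqa", "task_id": "diamond"},
        "value": 75.2,
        "date": "2026-01-27",
        "source": {
            "url": "https://huggingface.co/zai-org/GLM-4.7-Flash",
            "name": "Model Card",
            "user": "burtenshaw",
        },
    }
]


def check_entry(entry):
    """Return a list of problems found in one entry (empty if none).

    Hypothetical helper: the rules below are assumptions, not an
    official schema for eval result files.
    """
    problems = []

    # Assumed required: the dataset id and the task it was scored on.
    dataset = entry.get("dataset") or {}
    for key in ("id", "task_id"):
        if not dataset.get(key):
            problems.append(f"dataset.{key} is missing")

    # Assumed: the score is a percentage.
    value = entry.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("value is not a number")
    elif not 0 <= value <= 100:
        problems.append("value is outside 0-100")

    # Assumed: the date is an ISO calendar date (YYYY-MM-DD).
    try:
        date.fromisoformat(str(entry.get("date")))
    except ValueError:
        problems.append("date is not YYYY-MM-DD")

    return problems


if __name__ == "__main__":
    for entry in ENTRIES:
        print(check_entry(entry) or "ok")
```

Whether task_id matches the benchmark's own eval.yaml still has to be checked against that file, which is not shown here.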