Nathan Habib PRO
AI & ML interests
Evals
Recent Activity
new activity 22 minutes ago
QuixiAI/Kimi-K2.6-bf16: Delete .eval_results
new activity 22 minutes ago
QuixiAI/Kimi-K2.6-bf16: Delete .eval_results
new activity 28 minutes ago
moonshotai/Kimi-K2.6: Add evaluation results for HLE, GPQA, AIME, HMMT, SWE-Bench, and Terminal-Bench
Organizations
benchmarks
RULER Datasets Falcon-H1-3B-Base
RULER Datasets
- lighteval/RULER-131072-Falcon-H1-3B-Base
  Viewer • Updated • 6.5k • 37
- lighteval/RULER-65536-Falcon-H1-3B-Base
  Viewer • Updated • 6.5k • 58
- lighteval/RULER-32768-Falcon-H1-3B-Base
  Viewer • Updated • 6.5k • 30
- lighteval/RULER-16384-Falcon-H1-3B-Base
  Viewer • Updated • 6.5k • 44
RULER Datasets Llama3-Instruct
RULER Datasets
RULER Datasets Qwen2.5-Instruct
RULER Datasets
RULER Datasets Qwen-3-Instruct
RULER Datasets
RULER Datasets Qwen-3
RULER Datasets
agents
Agents resources
All the resources I found or used when getting up to speed with agents.