Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published 7 days ago • 18
Tokenization in Transformers v5: Simpler, Clearer, and More Modular Article • 8 days ago • 74
Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability Collection A compilation of sparse auto-encoders trained on large language models. • 37 items • Updated 9 days ago • 14
Nemotron-Cascade Collection Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models • 17 items • Updated 2 days ago • 36
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 193
ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms Paper • 2502.06556 • Published Feb 10 • 3
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks Paper • 2507.12284 • Published Jul 16 • 7
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? Paper • 2309.08963 • Published Sep 16, 2023 • 12
Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs Paper • 2508.15878 • Published Aug 21 • 1
Compact Neural Graphics Primitives with Learned Hash Probing Paper • 2312.17241 • Published Dec 28, 2023 • 8
From Theory to Practice: Plug and Play with Succinct Data Structures Paper • 1311.1249 • Published Nov 5, 2013 • 1
Health system learning achieves generalist neuroimaging models Paper • 2511.18640 • Published Nov 23 • 3
Pre-trained knowledge elevates large language models beyond traditional chemical reaction optimizers Paper • 2509.00103 • Published Aug 27 • 1
CHESS: Contextual Harnessing for Efficient SQL Synthesis Paper • 2405.16755 • Published May 27, 2024 • 2
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data Paper • 2406.14546 • Published Jun 20, 2024 • 3