DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 4 days ago • 190
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 18 days ago • 99
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published 21 days ago • 162
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published 24 days ago • 57
AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents Paper • 2604.02947 • Published Apr 3 • 19
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 629
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 364
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published Apr 2 • 101
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models Paper • 2604.04155 • Published Apr 5 • 12
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
MOOZY: A Patient-First Foundation Model for Computational Pathology Paper • 2603.27048 • Published Mar 27 • 6
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 220
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise Paper • 2602.12783 • Published Feb 13 • 246
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published Feb 11 • 244
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents Paper • 2602.07274 • Published Feb 6 • 210