Convergent Evolution: How Different Language Models Learn Similar Number Representations Paper • 2604.20817 • Published 6 days ago • 7
Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL Paper • 2604.17073 • Published 10 days ago • 9
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts Paper • 2604.19835 • Published 7 days ago • 17
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published 13 days ago • 29
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off Paper • 2604.13902 • Published 13 days ago • 61
Where does output diversity collapse in post-training? Paper • 2604.16027 • Published 11 days ago • 22
QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies Paper • 2604.15151 • Published 12 days ago • 15
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning Paper • 2604.16029 • Published 11 days ago • 23
Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips Paper • 2502.07408 • Published 12 days ago • 57
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval Paper • 2604.18584 • Published 8 days ago • 14
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration Paper • 2604.18131 • Published 8 days ago • 9
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Paper • 2604.17696 • Published 8 days ago • 6
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents Paper • 2604.17308 • Published 9 days ago • 22
GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification Paper • 2604.14258 • Published 13 days ago • 23
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published 8 days ago • 80
What do Language Models Learn and When? The Implicit Curriculum Hypothesis Paper • 2604.08510 • Published 19 days ago • 4