Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published 4 days ago • 59
r2e-edits/sonnet-end2end-r2e-dev_100pr_combined-maxstep40_context32k-sft_exitreason-agent-v1 Viewer • Updated Apr 13 • 6.44k • 18 • 1
Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents Paper • 2505.02156 • Published May 4 • 18
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models Paper • 2504.10368 • Published Apr 14 • 21
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models Paper • 2504.10368 • Published Apr 14 • 21 • 3
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization Paper • 2411.06208 • Published Nov 9, 2024 • 21 • 8
DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking Paper • 2502.20730 • Published Feb 28 • 38
SoFA: Shielded On-the-fly Alignment via Priority Rule Following Paper • 2402.17358 • Published Feb 27, 2024 • 1
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling Paper • 2412.04905 • Published Dec 6, 2024 • 9