FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published Sep 18, 2025
COSMOS: Predictable and Cost-Effective Adaptation of LLMs Paper • 2505.01449 • Published Apr 30, 2025
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training Paper • 2505.00358 • Published May 1, 2025