CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning • Paper • 2509.20712 • Published Sep 25, 2025 • 19
RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning • Paper • 2507.07451 • Published Jul 10, 2025 • 5
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval • Paper • 2505.19650 • Published May 26, 2025 • 5
Leanabell-Prover: Posttraining Scaling in Formal Reasoning • Paper • 2504.06122 • Published Apr 8, 2025 • 5
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment • Paper • 2502.18965 • Published Feb 26, 2025 • 28