PeterLee6094's Collections: HF Daily (updated)
Large Language Diffusion Models
Paper • 2502.09992 • Published • 123

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Paper • 2502.10391 • Published • 34

Diverse Inference and Verification for Advanced Reasoning
Paper • 2502.09955 • Published • 18

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
Paper • 2502.08130 • Published • 9

Jailbreaking to Jailbreak
Paper • 2502.09638 • Published • 6

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper • 2502.11089 • Published • 166

ReLearn: Unlearning via Learning for Large Language Models
Paper • 2502.11190 • Published • 30

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Paper • 2502.11196 • Published • 23

CRANE: Reasoning with constrained LLM generation
Paper • 2502.09061 • Published • 21
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Paper • 2502.10454 • Published • 7

Dyve: Thinking Fast and Slow for Dynamic Process Verification
Paper • 2502.11157 • Published • 7

Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
Paper • 2502.09083 • Published • 4

Continuous Diffusion Model for Language Modeling
Paper • 2502.11564 • Published • 53

Rethinking Diverse Human Preference Learning through Principal Component Analysis
Paper • 2502.13131 • Published • 37

SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
Paper • 2502.12464 • Published • 28

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Paper • 2502.12215 • Published • 16

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Paper • 2502.12574 • Published • 13
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
Paper • 2502.12659 • Published • 7

Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
Paper • 2502.10708 • Published • 4

Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 212

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
Paper • 2502.14296 • Published • 45

Small Models Struggle to Learn from Strong Reasoners
Paper • 2502.12143 • Published • 39

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Paper • 2502.13922 • Published • 27

MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 193

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
Paper • 2502.14802 • Published • 13
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • 2506.18254 • Published • 31

RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Paper • 2506.18369 • Published • 2

LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Paper • 2506.18841 • Published • 56

Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
Paper • 2506.18851 • Published • 30

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Paper • 2506.18896 • Published • 29

Robust Reward Modeling via Causal Rubrics
Paper • 2506.16507 • Published • 9

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Paper • 2506.19767 • Published • 15

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Paper • 2506.20512 • Published • 48
ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published • 9

MMSearch-R1: Incentivizing LMMs to Search
Paper • 2506.20670 • Published • 64

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Paper • 2506.21506 • Published • 51

Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published • 67

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
Paper • 2507.19478 • Published • 31

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published • 28

Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 158
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper • 2507.21046 • Published • 82

Geometric-Mean Policy Optimization
Paper • 2507.20673 • Published • 31

Goal Alignment in LLM-Based User Simulators for Conversational AI
Paper • 2507.20152 • Published • 4

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Paper • 2507.16806 • Published • 6

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 14

Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Paper • 2507.21509 • Published • 32

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Paper • 2507.22607 • Published • 46

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published • 17