AI & ML interests
Deep RL, NLP
Organizations
asparius/Qwen2.5-7B-Instruct-GRPO-1ep-iter16
Text Generation
• 8B • Updated • 3
asparius/Qwen2.5-7B-Instruct-GRPO-1ep-iter8
Text Generation
• 8B • Updated • 2
asparius/Qwen2.5-1.5B-Instruct-CFPO-1ep-iter16
Text Generation
• 2B • Updated • 3
asparius/Qwen2.5-7B-Instruct-GRPO-1ep-iter4
Text Generation
• 8B • Updated • 2
asparius/Qwen2.5-1.5B-GRPO-1ep-iter4-prompt
Text Generation
• 2B • Updated • 6
asparius/Qwen2.5-7B-Instruct-GRPO-1ep-iter2
Text Generation
• 8B • Updated • 2
asparius/Qwen2.5-7B-Instruct-CFPO-1ep-iter16
Text Generation
• 8B • Updated • 1
asparius/Qwen2.5-3B-Instruct-CFPO-1ep-iter16
Text Generation
• 3B • Updated • 3
asparius/Qwen2.5-7B-Instruct-CFPO-1ep-iter8
Text Generation
• 8B • Updated • 3
asparius/Qwen2.5-7B-Instruct-CFPO-1ep-iter4
Text Generation
• 8B • Updated • 1
asparius/Qwen2.5-7B-Instruct-CFPO-1ep-iter2
Text Generation
• 8B • Updated • 1
asparius/Qwen2.5-3B-Instruct-CFPO-1ep-iter8
Text Generation
• 3B • Updated • 2
asparius/Qwen2.5-3B-Instruct-CFPO-1ep-iter4
Text Generation
• 3B • Updated • 1
asparius/Qwen2.5-3B-Instruct-GRPO-1ep-iter16
Text Generation
• 3B • Updated • 5
asparius/Qwen2.5-1.5B-Instruct-GRPO-1ep-iter16
Text Generation
• 2B • Updated • 6
asparius/Qwen2.5-3B-Instruct-GRPO-1ep-iter8
Text Generation
• 3B • Updated • 1
asparius/Qwen2.5-1.5B-Instruct-CFPO-1ep-iter8
Text Generation
• 2B • Updated • 1
asparius/Qwen2.5-1.5B-Instruct-GRPO-1ep-iter8
Text Generation
• 2B • Updated • 3
asparius/Qwen2.5-1.5B-Instruct-CFPO-1ep-iter4
Text Generation
• 2B • Updated • 3
asparius/Qwen2.5-3B-Instruct-CFPO-1ep-iter2
Text Generation
• 3B • Updated • 2
asparius/Qwen2.5-3B-Instruct-GRPO-1ep-iter4
Text Generation
• 3B • Updated • 2
asparius/Qwen2.5-1.5B-Instruct-CFPO-1ep-iter2
Text Generation
• 2B • Updated • 2
asparius/Qwen2.5-3B-Instruct-GRPO-1ep-iter2
Text Generation
• 3B • Updated • 1
asparius/Qwen2.5-1.5B-Instruct-GRPO-1ep-iter4
Text Generation
• 2B • Updated • 7
asparius/Llama3.2-3B-GRPO-1ep-iter8
Text Generation
• 3B • Updated • 4
asparius/Qwen2.5-1.5B-Instruct-GRPO-1ep-iter2
Text Generation
• 2B • Updated • 3
asparius/Llama3.2-3B-GRPO-1ep-iter4
Text Generation
• 3B • Updated • 2
asparius/Llama3.2-3B-GRPO-1ep-iter2
Text Generation
• 3B • Updated • 2
asparius/Qwen2.5-14B-GRPO-1ep-iter8
Text Generation
• 15B • Updated • 2
asparius/Qwen2.5-14B-GRPO-1ep-iter4
Text Generation
• 15B • Updated • 2