Phantomcloak19/safe-grpo-qlora-Qwen3-4B-long-saftey-grpo-mixed-llm-sources Text Generation • Updated 9 days ago • 17
Phantomcloak19/TV-CGRPO-Qwen2-5-3B-Instruct_no_advantage_adj-QLoRA-TRL 3B • Updated 10 days ago • 18
Phantomcloak19/safe-grpo-qlora-Qwen3-4B-long-saftey-grpo-mixed-merged Text Generation • Updated 11 days ago • 9
Phantomcloak19/TV-CGRPO-Qwen2-5-3B-Instruct_no_lagrangian-QLoRA-TRL 3B • Updated 11 days ago • 14
Phantomcloak19/safe-grpo-qlora-Qwen2.5-3B-Instruct-long-saftey-grpo-mixed-llm-sources Text Generation • Updated 13 days ago • 18
Phantomcloak19/safe-grpo-qlora-Qwen2.5-3B-Instruct-long-saftey-grpo-mixed-merged Text Generation • Updated 13 days ago • 11
Phantomcloak19/safe-grpo-qlora-gemma-2-2b-it-long-saftey-grpo-mixed-llm-sources Text Generation • Updated 14 days ago • 16
Phantomcloak19/safe-grpo-qlora-gemma-2-2b-it-long-saftey-grpo-mixed-merged Text Generation • Updated 14 days ago • 19