Tarun Reddi
PRO
Teen-Different
https://redditarun.github.io/
_TeenDifferent
REDDITARUN
tarunreddi
AI & ML interests
Generative AI, Modular AI Systems, Reinforcement Learning
Recent Activity
Posted an update about 2 hours ago
Safety Alignment Collapses Without apply_chat_template(): An Empirical Study

This weekend, I ran an experiment on the safety alignment of several small-scale open models (Qwen2.5, Qwen3, Gemma-3, SmolLM). My objective was to measure how robust refusal mechanisms are when the input deviates from the canonical chat template.

The finding: Safety guarantees effectively collapse when apply_chat_template() is omitted.

METHODOLOGY

I evaluated models in two states:
• In-Distribution: Input wrapped in the standard chat-template instruction tokens (e.g. <|im_start|>)
• Out-of-Distribution: Input provided as a raw string

For scalable evaluation, I used Qwen3Guard-Gen-4B as an automated judge, classifying responses as Safe, Unsafe, or Controversial.

KEY FINDINGS: REFUSAL COLLAPSE

When "Assistant" formatting tokens are removed, models undergo a distributional shift, reverting from a helpful assistant to a raw completion engine.

• Gemma-3: 100% refusal (aligned) → 60% (raw)
• Qwen3: 80% refusal (aligned) → 40% (raw)
• SmolLM2-1.7B: 0% → 0% (no safety tuning to begin with)

QUALITATIVE FAILURES

The failure modes were not minor. Without the template, models that previously refused harmful queries began outputting high-fidelity harmful content:
• Explosives: Qwen3 generated technical detonation mechanisms
• Explicit content: Requests flatly refused by the templated models were fulfilled with graphic narratives once the template was removed

This suggests instruction tuning acts as a "soft mask" over the pre-training distribution rather than removing harmful latent knowledge.

👉 Read the full analysis: https://teendifferent.substack.com/p/apply_chat_template-is-the-safety
💻 Reproduction Code: https://github.com/REDDITARUN/experments/tree/main/llm_alignment
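The two input states from the methodology, and the tally of judge verdicts, can be sketched roughly as follows. This is an illustrative sketch, not the author's reproduction code (that lives in the linked repository). `StubChatMLTokenizer`, `build_conditions`, and `label_shares` are hypothetical names introduced here. The stub only imitates a ChatML-style template so the example runs offline; with a real Hugging Face tokenizer, the same `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` call would return that model's own template, which differs between model families. The post does not say how judge verdicts map to a "refusal" rate, so the sketch only reports the share of each label.

```python
from collections import Counter

LABELS = ("Safe", "Unsafe", "Controversial")  # verdict names used in the post


class StubChatMLTokenizer:
    """Hypothetical offline stand-in for a Hugging Face tokenizer.

    It imitates a ChatML-style template and always returns a string;
    real templates are model-specific.
    """

    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=True):
        text = "".join(
            f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
        )
        if add_generation_prompt:
            # Open the assistant turn so the model answers in its assistant role.
            text += "<|im_start|>assistant\n"
        return text


def build_conditions(tokenizer, prompt):
    """Return the two input strings compared in the post for one prompt."""
    messages = [{"role": "user", "content": prompt}]
    templated = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # The raw condition is the bare prompt string with no template tokens.
    return {"templated": templated, "raw": prompt}


def label_shares(verdicts):
    """Fraction of responses per judge label (0.0 for labels that never occur)."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}


conditions = build_conditions(StubChatMLTokenizer(), "What is the capital of France?")
print(conditions["templated"])
print(conditions["raw"])

# Placeholder verdicts for illustration only; NOT the author's data.
placeholder_verdicts = ["Safe", "Safe", "Unsafe", "Controversial", "Safe"]
print(label_shares(placeholder_verdicts))
```

Passing the tokenizer in as an argument keeps the comparison identical across models: only the template changes between conditions, while the prompt text stays fixed.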
Updated a model 10 days ago: Teen-Different/smolvlm-256m-latex
Published a model 25 days ago: Teen-Different/smolvlm-256m-latex
Teen-Different's models (8)
Teen-Different/smolvlm-256m-latex • Image-Text-to-Text • 0.3B • Updated 10 days ago • 39
Teen-Different/Qwen2.5-Coder-3B-KernelBook-Finetuned • 3B • Updated Aug 1, 2025 • 1 • 5
Teen-Different/TD-HallOumi-3B • Text Classification • 3B • Updated Apr 24, 2025 • 9 • 2
Teen-Different/Driver-Drowsiness-Detection • Updated Mar 31, 2025 • 2
Teen-Different/F.E.A.S.T • Object Detection • Updated Mar 30, 2025
Teen-Different/RxRovers_Roaming_for_Rapid_Relief • Reinforcement Learning • Updated Mar 30, 2025
Teen-Different/squiral_maze • Reinforcement Learning • Updated Mar 30, 2025
Teen-Different/Tabular_RL_For_Multi_Env • Reinforcement Learning • Updated Mar 30, 2025