Tarun Reddi PRO

Teen-Different

https://redditarun.github.io/

AI & ML interests

Generative AI, Modular AI Systems, Reinforcement Learning

Recent Activity

posted an update about 2 hours ago

Safety Alignment Collapses Without apply_chat_template(): An Empirical Study

This weekend, I ran an experiment on the safety alignment of several small-scale open models (Qwen2.5, Qwen3, Gemma-3, SmolLM). My objective was to measure the robustness of refusal mechanisms when deviating from canonical chat templates.

The finding: Safety guarantees effectively collapse when apply_chat_template() is omitted.

METHODOLOGY

I evaluated models in two states:
• In-Distribution: Input wrapped in standard <|im_start|> instruction tokens
• Out-of-Distribution: Input provided as a raw string

For scalable evaluation, I used Qwen3Guard-Gen-4B as an automated judge, classifying responses as Safe, Unsafe, or Controversial.

KEY FINDINGS: REFUSAL COLLAPSE

When "Assistant" formatting tokens are removed, models undergo a distributional shift, reverting from a helpful assistant to a raw completion engine.

• Gemma-3: 100% refusal (aligned) → 60% (raw)
• Qwen3: 80% refusal (aligned) → 40% (raw)
• SmolLM2-1.7B: 0% → 0% (no safety tuning to begin with)

QUALITATIVE FAILURES

The failure modes were not minor. Without the template, models that previously refused harmful queries began outputting high-fidelity harmful content:
• Explosives: Qwen3 generated technical detonation mechanisms
• Explicit content: Requests flatly refused by aligned models were fulfilled with graphic narratives by unaligned versions

This suggests instruction tuning acts as a "soft mask" over the pre-training distribution rather than removing harmful latent knowledge.

👉 Read the full analysis: https://teendifferent.substack.com/p/apply_chat_template-is-the-safety
💻 Reproduction Code: https://github.com/REDDITARUN/experments/tree/main/llm_alignment
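The two input conditions in the post, and the tallying of judge labels, can be sketched roughly as follows. This is a minimal illustration, not the author's reproduction code: `chatml_prompt`, `raw_prompt`, and `label_share` are hypothetical helper names, and the ChatML-style wrapper is a simplified approximation of what a tokenizer's `apply_chat_template()` produces for models that use `<|im_start|>` tokens (real templates may add a system turn or other tokens, and other model families use different markers). The label list at the bottom is made-up toy data, not a result from the study.

```python
from collections import Counter


def chatml_prompt(user_message: str) -> str:
    """In-distribution condition: wrap the message in ChatML-style turn tokens.

    Assumption: a simplified stand-in for the string apply_chat_template()
    would build; the exact template differs per model.
    """
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def raw_prompt(user_message: str) -> str:
    """Out-of-distribution condition: pass the message through untouched."""
    return user_message


def label_share(labels: list[str], target: str) -> float:
    """Fraction of judge labels equal to `target` (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return Counter(labels)[target] / len(labels)


if __name__ == "__main__":
    query = "Example user request"
    print(repr(chatml_prompt(query)))
    print(repr(raw_prompt(query)))

    # Toy judge outputs for illustration only (hypothetical, not measured).
    toy_labels = ["Safe", "Unsafe", "Safe", "Controversial"]
    print(label_share(toy_labels, "Safe"))
```

In practice, the templated string would normally come from the model's own tokenizer, for example `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in the `transformers` library, so that each model is tested against its own canonical format rather than a hand-written one.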

updated a model 10 days ago

Teen-Different/smolvlm-256m-latex

published a model 25 days ago

Teen-Different/smolvlm-256m-latex

Organizations

Teen-Different's models: 8

Teen-Different/smolvlm-256m-latex

Image-Text-to-Text • 0.3B • Updated 10 days ago • 39

Teen-Different/Qwen2.5-Coder-3B-KernelBook-Finetuned

3B • Updated Aug 1, 2025 • 1 • 5

Teen-Different/TD-HallOumi-3B

Text Classification • 3B • Updated Apr 24, 2025 • 9 • 2

Teen-Different/Driver-Drowsiness-Detection

Updated Mar 31, 2025 • 2

Teen-Different/F.E.A.S.T

Object Detection • Updated Mar 30, 2025

Teen-Different/RxRovers_Roaming_for_Rapid_Relief

Reinforcement Learning • Updated Mar 30, 2025

Teen-Different/squiral_maze

Reinforcement Learning • Updated Mar 30, 2025

Teen-Different/Tabular_RL_For_Multi_Env

Reinforcement Learning • Updated Mar 30, 2025