Locket: Robust Feature-Locking Technique for Language Models Paper • 2510.12117 • Published Oct 14, 2025
Activation Approximations Can Incur Safety Vulnerabilities Even in Aligned LLMs: Comprehensive Analysis and Defense Paper • 2502.00840 • Published Feb 2, 2025