SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models • Paper 2509.15661 • Published Sep 19, 2025