Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video • Paper • 2604.07786 • Published 20 days ago • 6
HandVQA: Diagnosing and Improving Fine-Grained Spatial Reasoning about Hands in Vision-Language Models • Paper • 2603.26362 • Published Mar 27
LighthouseGS: Indoor Structure-aware 3D Gaussian Splatting for Panorama-Style Mobile Captures • Paper • 2507.06109 • Published Jul 8, 2025
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions • Paper • 2506.00421 • Published May 31, 2025 • 5
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices • Paper • 2502.04363 • Published Feb 5, 2025 • 12
Response Tuning: Aligning Large Language Models without Instruction • Paper • 2410.02465 • Published Oct 3, 2024 • 13