Cyril666/whisper-large-v3-encoder Automatic Speech Recognition • 0.6B • Updated 3 days ago • 118
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 9 days ago • 19
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published 9 days ago • 10
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement Paper • 2512.13303 • Published 12 days ago • 16