Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published 14 days ago • 37
Article Welcome Gemma 4: Frontier multimodal intelligence on device • 15 days ago • 853
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Paper • 2603.25745 • Published 22 days ago • 15
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection Paper • 2603.21944 • Published 25 days ago • 26
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published 25 days ago • 123
Hidden Dynamics of Massive Activations in Transformer Training Paper • 2508.03616 • Published Aug 5, 2025 • 19
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation Paper • 2410.17799 • Published Oct 23, 2024 • 12
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 153
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning Paper • 2401.08553 • Published Jan 16, 2024 • 2