SpotEdit: Selective Region Editing in Diffusion Transformers • Paper • 2512.22323 • Published 15 days ago • 37
LongVideoAgent: Multi-Agent Reasoning with Long Videos • Paper • 2512.20618 • Published 18 days ago • 53
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion • Paper • 2512.19678 • Published 19 days ago • 29
DeContext as Defense: Safe Image Editing in Diffusion Transformers • Paper • 2512.16625 • Published 23 days ago • 24
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling • Paper • 2512.14614 • Published 25 days ago • 67
In-Video Instructions: Visual Signals as Generative Control • Paper • 2511.19401 • Published Nov 24, 2025 • 31
Artificial Hippocampus Networks for Efficient Long-Context Modeling • Paper • 2510.07318 • Published Oct 8, 2025 • 30
SparseD: Sparse Attention for Diffusion Language Models • Paper • 2509.24014 • Published Sep 28, 2025 • 30
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps • Paper • 2505.18675 • Published May 24, 2025 • 26
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding • Paper • 2505.16990 • Published May 22, 2025 • 22
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering • Paper • 2503.16422 • Published Mar 20, 2025 • 14