EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing Paper • 2512.06065 • Published 26 days ago • 28
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images Paper • 2511.22805 • Published Nov 27 • 3
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30 • 43
PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization Paper • 2506.12915 • Published Jun 15 • 20
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11 • 55
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model Paper • 2503.16282 • Published Mar 20 • 5
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation Paper • 2410.00890 • Published Oct 1, 2024 • 20
3D-GPT: Procedural 3D Modeling with Large Language Models Paper • 2310.12945 • Published Oct 19, 2023 • 59