Unleashing the Power of Visual Prompting At the Pixel Level Paper • 2212.10556 • Published Dec 20, 2022
CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions Paper • 2411.16828 • Published Nov 25, 2024 • 1
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7 • 29
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published Sep 1 • 33
Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers Paper • 2509.24317 • Published Sep 29 • 10
Leaderboard: Physical Reasoning from Video • Running • 41 • Submit model evaluations and view leaderboard results