Computer-Use Agents as Judges for Generative User Interface Paper • 2511.15567 • Published Nov 19 • 52
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4 • 101
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20 • 67
ok, i have found something: VLMs are taking like 1 min to decode the tokens and images, so i guess it is a terrible idea (for now) :)
Vision Models (GGUF) Collection How to use: Download a "mmproj" model file + one or more of the primary model files. • 5 items • Updated Dec 22, 2023 • 45
second-state/All-MiniLM-L6-v2-Embedding-GGUF Feature Extraction • 22.6M • Updated May 1, 2024 • 7.77k • 17