MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Paper • 2410.10594 • Published Oct 14, 2024
GUICourse: From General Vision Language Models to Versatile GUI Agents Paper • 2406.11317 • Published Jun 17, 2024