RADIO Collection A collection of Foundation Vision Models that combine multiple models (CLIP, DINOv2, SAM, etc.). • 16 items • Updated 4 days ago • 26
GLM-4.5 Collection GLM-4.5: An open-source large language model designed for intelligent agents by Z.ai • 11 items • Updated Aug 11 • 250
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published Mar 10 • 86
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 12 items • Updated 4 days ago • 140
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published Dec 18, 2024 • 24
DateLogicQA: Benchmarking Temporal Biases in Large Language Models Paper • 2412.13377 • Published Dec 17, 2024 • 3
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published Dec 19, 2024 • 28
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities Paper • 2412.14123 • Published Dec 18, 2024 • 11
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 51
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated May 1 • 574
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published Oct 2, 2024 • 24
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 4 days ago • 155
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders Paper • 2407.13036 • Published Jul 17, 2024 • 4
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 50
Fine-tuning LLMs to 1.58bit: extreme quantization made easy Article • Sep 18, 2024 • 272
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27, 2024 • 49
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Jul 21 • 226