view article Article Llama‑Embed‑Nemotron‑8B Text Embedding Model Ranks First on Multilingual MTEB Leaderboard Oct 21, 2025 • 14
ViDoRe Benchmark V3 Collection ViDoRe V3 is our latest benchmark, engineered to set a new industry gold standard for multi-modal, enterprise document retrieval evaluation. • 8 items • Updated Nov 5, 2025 • 16
view article Article ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases Nov 5, 2025 • 57
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing Paper • 1808.06226 • Published Aug 19, 2018 • 3
ModernVBERT: Towards Smaller Visual Document Retrievers Paper • 2510.01149 • Published Oct 1, 2025 • 30
view article Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings Jun 2, 2025 • 27
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025 • 253
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 Jul 5, 2024 • 306
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Paper • 2409.06595 • Published Sep 10, 2024 • 38