zembed-1 (Q4_K_M GGUF)

This is a 4-bit (Q4_K_M) quantized GGUF version of the zeroentropy/zembed-1 embedding model.

Model Details

  • Original Model: zeroentropy/zembed-1
  • Architecture: Qwen3 (4 Billion Parameters)
  • Quantization: Q4_K_M
  • Max Context: 32,768 tokens
  • Output Dimensions: 2560 (native llama.cpp pooling)

Usage with llama-cpp-python

from llama_cpp import Llama
import numpy as np

# Load the model in embedding mode
llm = Llama(model_path="zembed-1-Q4_K_M.gguf", embedding=True)

# Generate an embedding (a list of 2560 floats, pooled natively by llama.cpp)
text = "How do I optimize a local LLM to run smoothly?"
embedding = np.array(llm.embed(text))
print(embedding.shape)  # expected: (2560,)
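
Embeddings like the one above are typically compared with cosine similarity. The helper below is a minimal NumPy sketch; the `cosine_similarity` name is illustrative and not part of llama-cpp-python.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With real embeddings, pass the vectors returned by llm.embed(...):
#   sim = cosine_similarity(llm.embed(query), llm.embed(document))
sim = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0])
```

Scores close to 1.0 indicate semantically similar texts; scores near 0.0 indicate unrelated ones.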

Model tree for Abiray/zembed-1-Q4_K_M-GGUF

  • Base model: Qwen/Qwen3-4B
  • Finetuned from base: zeroentropy/zembed-1
  • Quantized from finetune: this model