zembed-1 (Q4_K_M GGUF)

This is a 4-bit (Q4_K_M) quantized GGUF version of the zeroentropy/zembed-1 embedding model.

Model Details

  • Original Model: zeroentropy/zembed-1
  • Architecture: Qwen3 (4 Billion Parameters)
  • Quantization: Q4_K_M
  • Max Context: 32,768 tokens
  • Output Dimensions: 2560 (native llama.cpp pooling)

Usage with llama-cpp-python

from llama_cpp import Llama
import numpy as np

# Load the model in embedding mode
llm = Llama(model_path="zembed-1-Q4_K_M.gguf", embedding=True)

# Generate an embedding (a list of 2560 floats, pooled natively by llama.cpp)
text = "How do I optimize a local LLM to run smoothly?"
embedding = np.array(llm.embed(text))
print(embedding.shape)  # expected: (2560,)
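
Embeddings like the one above are typically compared with cosine similarity. The helper below is a minimal NumPy sketch; the `cosine_similarity` name is illustrative and not part of llama-cpp-python.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With real embeddings, pass the vectors returned by llm.embed(...):
#   sim = cosine_similarity(llm.embed(query), llm.embed(document))
sim = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0])
```

Scores close to 1.0 indicate semantically similar texts; scores near 0.0 indicate unrelated ones.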

Model tree for Abiray/zembed-1-Q4_K_M-GGUF

  • Base model: Qwen/Qwen3-4B
  • Finetuned from base: zeroentropy/zembed-1
  • Quantized from finetune: this model