zembed-1 (Q4_K_M GGUF)
This is a 4-bit (Q4_K_M) quantized GGUF version of the zeroentropy/zembed-1 embedding model.
Model Details
- Original Model: zeroentropy/zembed-1
- Architecture: Qwen3 (4 Billion Parameters)
- Quantization: Q4_K_M
- Max Context: 32,768 tokens
- Output Dimensions: 2560 (native llama.cpp pooling)
Usage with llama-cpp-python
```python
from llama_cpp import Llama
import numpy as np

# Load the model in embedding mode
llm = Llama(model_path="zembed-1-Q4_K_M.gguf", embedding=True)

# Generate an embedding (a list of 2560 floats)
text = "How do I optimize a local LLM to run smoothly?"
embedding = np.array(llm.embed(text))
```
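Embeddings from this model are typically compared with cosine similarity. A minimal sketch using NumPy (the short vectors below are stand-ins for real `llm.embed(...)` outputs, which have 2560 dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their norms
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice use llm.embed(text_a) and llm.embed(text_b)
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 1.0, 0.0]
print(cosine_similarity(vec_a, vec_b))  # 0.5
```

Scores range from -1 to 1, with higher values indicating more semantically similar texts.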