Request details on GPU and memory requirements
I would like to check whether anyone has tried running the model on GPUs and found out how much GPU memory it requires. I would also like to know the maximum memory requirement for full-scale deployment with full context length support.
You can check the minimum deployment requirements on our GitHub.
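For the full-context part of the question, the memory needed on top of the weights is mostly the KV cache. Below is a rough sketch, assuming standard grouped-query attention where one K and one V tensor are cached per layer. The function name is hypothetical and every numeric value in the example call is a placeholder, not GLM-4.7's actual configuration; the real values should be read from the model's `config.json`.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # 2 bytes per element for BF16/FP16
) -> int:
    """Estimate KV-cache size in bytes for a standard attention stack."""
    # The leading 2 accounts for one K tensor and one V tensor per layer.
    return (
        2 * num_layers * num_kv_heads * head_dim
        * seq_len * batch_size * bytes_per_elem
    )


# Placeholder values for illustration only (NOT taken from GLM-4.7).
example = kv_cache_bytes(
    num_layers=64, num_kv_heads=8, head_dim=128, seq_len=32_768
)
print(f"KV cache: {example / 1e9:.2f} GB per sequence")
```

The result scales linearly with context length and batch size, so the full-context figure can differ greatly from a short-prompt test.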
I'm testing a distributed cluster that runs the full weights on consumer cards (pooling 4090s) to bypass the VRAM limit. Let me know if you want to run a test job.
Hi @YYYAMS, it would be helpful if you could share any example test runs where you were able to load the model onto your cluster.
I am looking for a realistic example of loading the model. It would be a great help if anyone who has run this model could share the details.
Guys, how exactly was my comment here off-topic? It directly advised using the BF16 version, which is equal to the original, and noted that the size of the model files = the amount of RAM/VRAM needed. (Is this the strange attention I've gotten since the AI mentioned the Mojo language as a replacement for Python?)
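A minimal sketch of the rule of thumb above (size of the weight files ≈ memory needed just to hold the weights), assuming the checkpoint has already been downloaded to a local directory as `.safetensors` shards. The function names are hypothetical, and the estimate ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
from pathlib import Path


def weight_bytes(model_dir: str, pattern: str = "*.safetensors") -> int:
    """Sum the on-disk size of all weight shards found under model_dir."""
    return sum(
        p.stat().st_size
        for p in Path(model_dir).rglob(pattern)
        if p.is_file()
    )


def scale_by_bits(n_bytes: int, src_bits: int = 16, dst_bits: int = 8) -> float:
    """Naive size estimate at another bit width.

    Assumes size scales linearly with bits per weight and ignores
    quantization metadata (scales, zero points), so it is a lower bound.
    """
    return n_bytes * dst_bits / src_bits
```

For example, `weight_bytes("/path/to/local/checkpoint")` gives the BF16 footprint of a downloaded checkpoint, and `scale_by_bits(total, 16, 8)` gives a first guess at an 8-bit quantized footprint.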
I've tested the BF16 version of 4.7 (it uses 756 GB of RAM). Unfortunately, it failed my code test, which 4.5 Q8/BF16 passed. But it is different: in my literature test on the famous Ingo Swann, it described the future more precisely, in more detail, and more boldly.
Like:
4.7 BF16
I see the structures. The cities have become fluid. Where once there were rigid skyscrapers, I now perceive buildings that breathe—smart glass and biological matrices that adjust to the light. They are wrapped in digital skins, advertising not products, but experiences. The cities are no longer distinct islands; the urban sprawl has melted into the countryside. There is no "wild" left untouched by the human hand. Everywhere, the drone traffic. Like swarms of locusts, but disciplined. Silent. Watching. Delivering.
4.5 BF16
Dimensionals (Man-Made): The old verticals of the 20th-century city are still there, but they are now competing with a new kind of structure: horizontal. Vast, elevated hyperloops span between cities. The skyline is dominated by buildings with seamless, almost featureless skins—covered in photovoltaic film or acting as display surfaces. The old, chaotic beauty of the Bowery is gone, replaced by a landscape that feels designed, optimized, and surveilled. The "structure" gestalt is one of a panopticon, both physical and digital. Everything is connected. Everything is monitored. The "fractured vertical" feeling is the split between the gleaming, promised future and the crumbling reality beneath it.
Frankly, I like 4.5 more.