vLLM / SGLang support?
Is there support for SGLang/vLLM?
vLLM is working on it, per their GitHub.
I think the PR was merged just an hour ago.
I hope the official instructions in the docs here are updated soon.
Here is a custom vLLM image I've built. It works as intended: https://hub.docker.com/r/infantryman77/vllm-gemma4. Tested with Cline and Open WebUI. Not completely production ready, but it works.
services:
  vllm:
    image: infantryman77/vllm-gemma4:nightly-20260402
    container_name: gemma4
    command:
      - /models/gemma-4-31B-it-AWQ-8bit
      - --served-model-name
      - gemma4-31b # model name exposed on the OpenAI-compatible API
      - --max-model-len
      - "131072"
      - --tensor-parallel-size
      - "4" # shard the model across 4 GPUs
      - --gpu-memory-utilization
      - "0.97"
      - --reasoning-parser
      - gemma4
      - --enable-auto-tool-choice # tool calling, needed for clients like Cline
      - --tool-call-parser
      - gemma4
      - --host
      - 0.0.0.0
      - --limit-mm-per-prompt
      - '{"image":4}' # at most 4 images per prompt
      - --max-num-batched-tokens
      - "2096"
      - --max-num-seqs
      - "4"
      - --port
      - "8080"
      - --disable-custom-all-reduce
      - --override-generation-config
      - '{"temperature":1.0,"top_p":0.95,"top_k":64}' # default sampling params
    volumes:
      - /home/infantryman/vllm/models:/models
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - PYTORCH_ALLOC_CONF=expandable_segments:True
      - LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
      - OMP_NUM_THREADS=1
      - PYTHONWARNINGS=ignore::FutureWarning
      - VLLM_WORKER_MULTIPROC_METHOD=spawn
    ipc: host
    restart: unless-stopped
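If you want to sanity-check the container once it's up, a minimal smoke test against the OpenAI-compatible endpoint exposed on port 8080 might look like this. The model name matches --served-model-name above; the api_key value is just a placeholder, since the server was started without --api-key:

# Minimal smoke test for the vLLM server defined in the compose file above.
# Assumes the stack is up and listening on localhost:8080.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # host port mapped in the compose file
    api_key="EMPTY",  # placeholder; no --api-key was configured on the server
)

response = client.chat.completions.create(
    model="gemma4-31b",  # matches --served-model-name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)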
Official vLLM doc on how to use Gemma 4 models: https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html
Is SGLang support available?
Hi all,
Yes, both vLLM and SGLang officially support Gemma 4. Just make sure you're running the latest versions of these frameworks so the new model architecture and tokenizer are handled correctly. For implementation details and setup, check out these official resources (and see the quick sanity-check sketch after the links):
vLLM official guide: https://docs.vllm.ai/projects/recipes/en/latest/Google/Gemma4.html
Gemma 4 Optimized Support & NVFP4 Integration: https://github.com/sgl-project/sglang/issues/22129
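As a quick way to confirm your vLLM install is recent enough and can load the model, a minimal offline-inference sketch like the following should work. The model ID is a placeholder, so substitute your actual Gemma 4 checkpoint (local path or Hugging Face repo); the sampling values simply mirror the --override-generation-config used in the compose file above:

# Print the installed vLLM version, then run a one-prompt sanity check.
# NOTE: the model ID below is a placeholder -- swap in the Gemma 4
# checkpoint you actually use.
import vllm
from vllm import LLM, SamplingParams

print(vllm.__version__)  # confirm this is a release with Gemma 4 support

llm = LLM(model="google/gemma-4-31b-it")  # placeholder model ID
params = SamplingParams(temperature=1.0, top_p=0.95, top_k=64, max_tokens=128)

outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)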