Update installation instructions

#29

by owao - opened Mar 31

base: refs/heads/main

←

from: refs/pr/29


owao

Mar 31

•

edited Mar 31

I couldn't get it to work when installing vllm before vllm-omni. I ran into this error:

ImportError: /mnt/storage/workspaces/voxtral-tts/.venv/lib/python3.10/site-packages/vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

Everything works if vllm-omni is installed before vllm (at least with Python 3.10).

Update installation instructions 26a71588

owao

Mar 31

By the way, incredible model, thanks for offering it to the community <3

patrickvonplaten

Mistral AI org Mar 31

Updated them with latest version of vllm-omni - can you try again? :-)

owao

Mar 31

Hey @patrickvonplaten, I don't understand: how could the latest PyPI version be ahead of vllm-omni main on GitHub?
Anyway, I'll try again a bit later today and let you know, of course.

owao

Mar 31

Nope, still the same issue. Here are all the commands, run in a row:

/m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ uv venv .venv --python=3.10
Using CPython 3.10.19
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate

/m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ source .venv/bin/activate.fish

(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ python --version
Python 3.10.19

(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ uv pip install -U vllm
Resolved 175 packages in 367ms
Prepared 175 packages in 6ms
Installed 175 packages in 249ms
 + aiohappyeyeballs==2.6.1
 + aiohttp==3.13.4
 + aiosignal==1.4.0
 + annotated-doc==0.0.4
 + annotated-types==0.7.0
 + anthropic==0.86.0
 + anyio==4.13.0
 + apache-tvm-ffi==0.1.9
 + astor==0.8.1
 + async-timeout==5.0.1
 + attrs==26.1.0
 + blake3==1.0.8
 + cachetools==7.0.5
 + cbor2==5.9.0
 + certifi==2026.2.25
 + cffi==2.0.0
 + charset-normalizer==3.4.6
 + click==8.3.1
 + cloudpickle==3.1.2
 + compressed-tensors==0.13.0
 + cryptography==46.0.6
 + cuda-bindings==12.9.4
 + cuda-pathfinder==1.5.0
 + cuda-python==12.9.4
 + depyf==0.20.0
 + dill==0.4.1
 + diskcache==5.6.3
 + distro==1.9.0
 + dnspython==2.8.0
 + docstring-parser==0.17.0
 + einops==0.8.2
 + email-validator==2.3.0
 + exceptiongroup==1.3.1
 + fastapi==0.135.2
 + fastapi-cli==0.0.24
 + fastapi-cloud-cli==0.15.1
 + fastar==0.9.0
 + filelock==3.25.2
 + flashinfer-python==0.6.6
 + frozenlist==1.8.0
 + fsspec==2026.3.0
 + gguf==0.18.0
 + googleapis-common-protos==1.73.1
 + grpcio==1.80.0
 + h11==0.16.0
 + hf-xet==1.4.2
 + httpcore==1.0.9
 + httptools==0.7.1
 + httpx==0.28.1
 + httpx-sse==0.4.3
 + huggingface-hub==0.36.2
 + idna==3.11
 + ijson==3.5.0
 + importlib-metadata==8.7.1
 + interegular==0.3.3
 + jinja2==3.1.6
 + jiter==0.13.0
 + jmespath==1.1.0
 + jsonschema==4.26.0
 + jsonschema-specifications==2025.9.1
 + lark==1.2.2
 + llguidance==1.3.0
 + llvmlite==0.44.0
 + lm-format-enforcer==0.11.3
 + loguru==0.7.3
 + markdown-it-py==4.0.0
 + markupsafe==3.0.3
 + mcp==1.26.0
 + mdurl==0.1.2
 + mistral-common==1.10.0
 + model-hosting-container-standards==0.1.14
 + mpmath==1.3.0
 + msgspec==0.20.0
 + multidict==6.7.1
 + networkx==3.4.2
 + ninja==1.13.0
 + numba==0.61.2
 + numpy==2.2.6
 + nvidia-cublas-cu12==12.8.4.1
 + nvidia-cuda-cupti-cu12==12.8.90
 + nvidia-cuda-nvrtc-cu12==12.8.93
 + nvidia-cuda-runtime-cu12==12.8.90
 + nvidia-cudnn-cu12==9.10.2.21
 + nvidia-cudnn-frontend==1.18.0
 + nvidia-cufft-cu12==11.3.3.83
 + nvidia-cufile-cu12==1.13.1.3
 + nvidia-curand-cu12==10.3.9.90
 + nvidia-cusolver-cu12==11.7.3.90
 + nvidia-cusparse-cu12==12.5.8.93
 + nvidia-cusparselt-cu12==0.7.1
 + nvidia-cutlass-dsl==4.4.2
 + nvidia-cutlass-dsl-libs-base==4.4.2
 + nvidia-ml-py==13.595.45
 + nvidia-nccl-cu12==2.27.5
 + nvidia-nvjitlink-cu12==12.8.93
 + nvidia-nvshmem-cu12==3.4.5
 + nvidia-nvtx-cu12==12.8.90
 + openai==2.24.0
 + openai-harmony==0.0.8
 + opencv-python-headless==4.13.0.92
 + opentelemetry-api==1.40.0
 + opentelemetry-exporter-otlp==1.40.0
 + opentelemetry-exporter-otlp-proto-common==1.40.0
 + opentelemetry-exporter-otlp-proto-grpc==1.40.0
 + opentelemetry-exporter-otlp-proto-http==1.40.0
 + opentelemetry-proto==1.40.0
 + opentelemetry-sdk==1.40.0
 + opentelemetry-semantic-conventions==0.61b0
 + opentelemetry-semantic-conventions-ai==0.5.1
 + outlines-core==0.2.11
 + packaging==26.0
 + partial-json-parser==0.2.1.1.post7
 + pillow==12.1.1
 + prometheus-client==0.24.1
 + prometheus-fastapi-instrumentator==7.1.0
 + propcache==0.4.1
 + protobuf==6.33.6
 + psutil==7.2.2
 + py-cpuinfo==9.0.0
 + pybase64==1.4.3
 + pycountry==26.2.16
 + pycparser==3.0
 + pydantic==2.12.5
 + pydantic-core==2.41.5
 + pydantic-extra-types==2.11.1
 + pydantic-settings==2.13.1
 + pygments==2.20.0
 + pyjwt==2.12.1
 + python-dotenv==1.2.2
 + python-json-logger==4.1.0
 + python-multipart==0.0.22
 + pyyaml==6.0.3
 + pyzmq==27.1.0
 + quack-kernels==0.3.7
 + referencing==0.37.0
 + regex==2026.3.32
 + requests==2.33.1
 + rich==14.3.3
 + rich-toolkit==0.19.7
 + rignore==0.7.6
 + rpds-py==0.30.0
 + safetensors==0.7.0
 + sentencepiece==0.2.1
 + sentry-sdk==2.57.0
 + setproctitle==1.3.7
 + setuptools==82.0.1
 + shellingham==1.5.4
 + sniffio==1.3.1
 + sse-starlette==3.3.4
 + starlette==0.52.1
 + supervisor==4.3.0
 + sympy==1.14.0
 + tabulate==0.10.0
 + tiktoken==0.12.0
 + tokenizers==0.22.2
 + tomli==2.4.1
 + torch==2.10.0
 + torch-c-dlpack-ext==0.1.5
 + torchaudio==2.10.0
 + torchvision==0.25.0
 + tqdm==4.67.3
 + transformers==4.57.6
 + triton==3.6.0
 + typer==0.24.1
 + typing-extensions==4.15.0
 + typing-inspection==0.4.2
 + urllib3==2.6.3
 + uvicorn==0.42.0
 + uvloop==0.22.1
 + vllm==0.18.1
 + watchfiles==1.1.1
 + websockets==16.0
 + xgrammar==0.1.33
 + yarl==1.23.0
 + zipp==3.23.0

(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ uv pip install vllm-omni --upgrade
Resolved 140 packages in 257ms
Prepared 89 packages in 3ms
Uninstalled 13 packages in 159ms
Installed 89 packages in 253ms
 + accelerate==1.12.0
 + aenum==3.1.16
 + aiofiles==24.1.0
 + antlr4-python3-runtime==4.9.3
 + audioread==3.1.0
 + brotli==1.2.0
 + cache-dit==1.3.0
 + coloredlogs==15.0.1
 - cuda-bindings==12.9.4
 + cuda-bindings==13.2.0
 + cuda-toolkit==13.0.2
 + decorator==5.2.1
 + diffusers==0.37.1
 + einx==0.4.2
 + ema-pytorch==0.7.9
 + fa3-fwd==0.0.2
 + ffmpy==1.0.0
 + fire==0.7.1
 + flatbuffers==25.12.19
 + frozendict==2.4.7
 + gradio==5.50.0
 + gradio-client==1.14.0
 + groovy==0.1.2
 - huggingface-hub==0.36.2
 + huggingface-hub==1.8.0
 + humanfriendly==10.0
 + imageio==2.37.3
 + imageio-ffmpeg==0.6.0
 - importlib-metadata==8.7.1
 + importlib-metadata==9.0.0
 + janus==2.0.0
 + joblib==1.5.3
 + lazy-loader==0.5
 + librosa==0.11.0
 - llvmlite==0.44.0
 + llvmlite==0.46.0
 + more-itertools==10.8.0
 + msgpack==1.1.2
 - numba==0.61.2
 + numba==0.64.0
 + nvidia-cublas==13.1.0.3
 + nvidia-cuda-cupti==13.0.85
 + nvidia-cuda-nvrtc==13.0.88
 + nvidia-cuda-runtime==13.0.96
 + nvidia-cudnn-cu13==9.19.0.56
 + nvidia-cufft==12.0.0.61
 + nvidia-cufile==1.15.1.6
 + nvidia-curand==10.4.0.35
 + nvidia-cusolver==12.0.4.66
 + nvidia-cusparse==12.6.3.3
 + nvidia-cusparselt-cu13==0.8.0
 + nvidia-nccl-cu13==2.28.9
 + nvidia-nvjitlink==13.0.88
 + nvidia-nvshmem-cu13==3.4.5
 + nvidia-nvtx==13.0.85
 + omegaconf==2.3.0
 + onnxruntime==1.23.2
 + openai-whisper==20250625
 + orjson==3.11.7
 + pandas==2.3.3
 - pillow==12.1.1
 + pillow==11.3.0
 + platformdirs==4.9.4
 + pooch==1.9.0
 + prettytable==3.17.0
 - protobuf==6.33.6
 + protobuf==7.34.1
 - pydantic==2.12.5
 + pydantic==2.12.3
 - pydantic-core==2.41.5
 + pydantic-core==2.41.4
 + pydub==0.25.1
 + python-dateutil==2.9.0.post0
 + python-fire==0.1.0
 + pytz==2026.1.post1
 + resampy==0.4.3
 + ruff==0.15.8
 + safehttpx==0.1.7
 + scikit-learn==1.7.2
 + scipy==1.15.3
 + semantic-version==2.10.0
 - setuptools==82.0.1
 + setuptools==81.0.0
 + six==1.17.0
 + soundfile==0.13.1
 + sox==1.5.0
 + soxr==1.0.0
 + termcolor==3.3.0
 + threadpoolctl==3.6.0
 + tomlkit==0.13.3
 - torch==2.10.0
 + torch==2.11.0
 + torchsde==0.2.6
 + trampoline==0.1.2
 - transformers==4.57.6
 + transformers==5.4.0
 + tzdata==2025.3
 + vllm-omni==0.18.0
 + wcwidth==0.6.0
 - websockets==16.0
 + websockets==15.0.1
 + x-transformers==2.17.9

(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ python3 -c "import mistral_common; print(mistral_common.__version__)"
1.10.0
(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ vllm serve mistralai/Voxtral-4B-TTS-2603 --omni
Traceback (most recent call last):
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/bin/vllm", line 4, in <module>
    from vllm_omni.entrypoints.cli.main import main
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm_omni/__init__.py", line 16, in <module>
    from . import patch  # noqa: F401
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm_omni/patch.py", line 5, in <module>
    from vllm.model_executor.layers.rotary_embedding import (
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/model_executor/__init__.py", line 4, in <module>
    from vllm.model_executor.parameter import BasevLLMParameter, PackedvLLMParameter
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/model_executor/parameter.py", line 11, in <module>
    from vllm.distributed import (
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/__init__.py", line 4, in <module>
    from .communication_op import *
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/communication_op.py", line 9, in <module>
    from .parallel_state import get_tp_group
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 49, in <module>
    from vllm.distributed.utils import StatelessProcessGroup
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/utils.py", line 33, in <module>
    from vllm.utils.system_utils import suppress_stdout
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/utils/system_utils.py", line 19, in <module>
    from vllm.platforms import current_platform
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/platforms/__init__.py", line 279, in __getattr__
    _current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/utils/import_utils.py", line 111, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
  File "/home/user/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 19, in <module>
    import vllm._C  # noqa
ImportError: /mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

owao

Mar 31

•

edited Mar 31

Alright, my whole process was a mess and I couldn't remember exactly which uv pip commands I had run, because at some point I also tried to reinstall torch from https://download.pytorch.org/whl/torch/ etc. I was a bit desperate lol

So now I have a simple workflow that works:

uv venv .venv --python=3.10
source .venv/bin/activate
uv pip install vllm
uv pip install vllm-omni
vllm serve mistralai/Voxtral-4B-TTS-2603 --omni

So basically, I just removed --upgrade from uv pip install vllm-omni.

With --upgrade, these packages were being changed:

- cuda-bindings==12.9.4
+ cuda-bindings==13.2.0
- huggingface-hub==0.36.2
+ huggingface-hub==1.8.0
- importlib-metadata==8.7.1
+ importlib-metadata==9.0.0
- llvmlite==0.44.0
+ llvmlite==0.46.0
- numba==0.61.2
+ numba==0.64.0
- numpy==2.2.6
+ numpy==2.4.4
- pillow==12.1.1
+ pillow==11.3.0
- protobuf==6.33.6
+ protobuf==7.34.1
- pydantic==2.12.5
+ pydantic==2.12.3
- pydantic-core==2.41.5
+ pydantic-core==2.41.4
- setuptools==80.10.2
+ setuptools==81.0.0
- torch==2.10.0
+ torch==2.11.0
- transformers==4.57.6
+ transformers==5.4.0
- websockets==16.0
+ websockets==15.0.1
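
For anyone who wants to pull the same kind of list out of their own install log, here is a minimal Python sketch. It assumes the log uses the ` - name==version` / ` + name==version` line format shown in the uv output above; the function name `changed_versions` is made up for illustration.

```python
import re

# Matches uv's change lines, e.g. " - torch==2.10.0" or " + torch==2.11.0".
CHANGE_LINE = re.compile(r"^\s*([+-])\s+([A-Za-z0-9_.\-]+)==(\S+)\s*$")


def changed_versions(uv_output):
    """Return {package: (old_version, new_version)} for packages that were replaced.

    Packages that were only added (no matching "-" line) are ignored.
    """
    removed = {}
    added = {}
    for line in uv_output.splitlines():
        match = CHANGE_LINE.match(line)
        if not match:
            continue  # skip "Resolved ...", prompts, blank lines, etc.
        sign, name, version = match.groups()
        if sign == "-":
            removed[name] = version
        else:
            added[name] = version
    return {name: (removed[name], added[name]) for name in removed if name in added}


sample = """
 + accelerate==1.12.0
 - torch==2.10.0
 + torch==2.11.0
 - websockets==16.0
 + websockets==15.0.1
"""

for name, (old, new) in changed_versions(sample).items():
    print(f"{name}: {old} -> {new}")
```

Newly added packages are filtered out on purpose, so only real version swaps remain to be inspected.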

Found it!

The issue comes from this change:

 - torch==2.10.0
 + torch==2.11.0

Reinstalling torch with uv pip install torch==2.10.0 makes it work!
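
Some background on why this would happen: the missing symbol demangles to c10::MessageLogger::MessageLogger(char const*, int, int, bool), which lives in PyTorch's c10 library. That is consistent with vllm's compiled extension being loaded against a different torch build than the one it was compiled for. A rough way to catch the mismatch before launching is sketched below. It assumes the installed vllm wheel declares an exact `torch==X` requirement in its metadata (not verified here for every release); the helper names are invented for this example.

```python
import re
from importlib import metadata


def pinned_version(requirements, package):
    """Return the version from an exact 'package==X' requirement, or None."""
    pattern = re.compile(
        rf"^{re.escape(package)}\s*==\s*([^\s;,]+)", re.IGNORECASE
    )
    for requirement in requirements:
        match = pattern.match(requirement.strip())
        if match:
            return match.group(1)
    return None


def check_torch_matches_vllm():
    """Compare the installed torch version with the torch pin declared by vllm."""
    try:
        requirements = metadata.requires("vllm") or []
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError as exc:
        return f"not installed: {exc}"
    expected = pinned_version(requirements, "torch")
    if expected is None:
        return f"vllm declares no exact torch pin; installed torch is {installed}"
    # Local build tags such as "+cu128" are ignored for the comparison.
    if installed.split("+")[0] == expected.split("+")[0]:
        return f"OK: torch {installed} matches vllm's pin"
    return f"MISMATCH: torch {installed} installed, vllm pins {expected}"


if __name__ == "__main__":
    print(check_torch_matches_vllm())
```

If it reports a mismatch, reinstalling the pinned torch version as described above should be the first thing to try.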

owao

Mar 31

•

edited Mar 31

Sorry for the mess, for some reason I mis-pasted the list of what --upgrade was modifying! I've edited my previous answer.

kyzcreig

27 days ago

✅ Verified working 2026-05-12: Python 3.10 + vllm-omni 0.18.0 + the critical `--stage-configs-path` flag

After hitting the acoustic_transformer not found in MistralForCausalLM error across multiple version combinations (~3.5 hours of dead ends), here's the recipe that actually works on a fresh Linux box today.

Three insights that unblocked it

1. The --omni flag is essential. The vllm-omni CLI silently passes through to plain vllm without it (see vllm_omni/entrypoints/cli/main.py:12). You must invoke vllm-omni serve ... --omni, not vllm serve --omni and not vllm-omni serve without the flag.
2. --stage-configs-path must point to the YAML. Voxtral TTS is a 2-stage model. Without the explicit YAML path, vllm-omni's auto-detect is broken in the 0.18.0 wheel and the loader falls back to MistralForCausalLM, then crashes looking for acoustic_transformer. The original Mistral README included this flag, then commit 47f5eea removed it, expecting auto-detect to work. It doesn't.
3. GPU memory tuning. The default config asks for 0.8 utilization on stage 0; on a busy 32 GB card this causes serving to hang on the first request. Drop it to ~0.35.
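
The memory tuning in the last point can also be done from Python instead of sed. This is only a sketch: it assumes the stage config ships inside the vllm_omni package at model_executor/stage_configs/voxtral_tts.yaml (the path used in the steps of this post) and that the value is written literally as `gpu_memory_utilization: 0.8`. The names `retune` and `write_tuned_config` are hypothetical.

```python
import re
from pathlib import Path


def retune(yaml_text, old="0.8", new="0.35"):
    """Replace 'gpu_memory_utilization: <old>' with '<new>' everywhere in the text.

    The negative lookahead keeps values such as 0.85 untouched.
    """
    pattern = re.compile(
        rf"(gpu_memory_utilization:\s*){re.escape(old)}(?![0-9])"
    )
    return pattern.sub(lambda m: m.group(1) + new, yaml_text)


def write_tuned_config(destination, new="0.35"):
    """Copy the packaged Voxtral stage config to destination with a lower memory share."""
    import vllm_omni  # imported lazily so retune() works without vllm-omni installed

    source = (
        Path(vllm_omni.__file__).parent
        / "model_executor"
        / "stage_configs"
        / "voxtral_tts.yaml"
    )
    destination = Path(destination)
    destination.write_text(retune(source.read_text(), new=new))
    return destination
```

Calling `write_tuned_config("/tmp/voxtral_tts_tuned.yaml")` inside the venv would produce the file that `--stage-configs-path` then points to.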

Exact working steps on a fresh Ubuntu/WSL box

# 1. Python 3.10 (vllm-omni 0.20.x needs StrEnum from 3.11, but 0.20.x has its own bugs; stick with 3.10 + 0.18)
sudo apt install -y python3.10 python3.10-venv python3.10-dev
python3.10 -m venv ~/.venvs/voxtral-tts
source ~/.venvs/voxtral-tts/bin/activate
pip install pip==24.0  # newer pip hits a Py3.10 stdlib bug

# 2. Install vllm-omni FIRST (without --upgrade), then vllm
pip install vllm-omni
pip install vllm

# 3. Pin torch back to 2.10.0 (vllm 0.18.1 was built against it)
pip install torch==2.10.0 torchaudio==2.10.0 torchvision==0.25.0

# 4. Fix flashinfer-cubin pin
pip install flashinfer-cubin==0.6.6

# 5. Tune stage config
VOXTRAL_YAML=$(python -c 'import vllm_omni; import os; print(os.path.join(os.path.dirname(vllm_omni.__file__), "model_executor/stage_configs/voxtral_tts.yaml"))')
cp "$VOXTRAL_YAML" /tmp/voxtral_tts_tuned.yaml
sed -i 's/gpu_memory_utilization: 0.8/gpu_memory_utilization: 0.35/g' /tmp/voxtral_tts_tuned.yaml

# 6. Launch
vllm-omni serve mistralai/Voxtral-4B-TTS-2603 \
  --omni \
  --stage-configs-path /tmp/voxtral_tts_tuned.yaml \
  --port 8003 --host 127.0.0.1 --enforce-eager

Health check returns 200 after ~40s cold load. GPU usage ~15 GB.
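
Since the cold load takes a while, it can help to poll until the server answers before sending requests. A minimal sketch follows, assuming the server exposes a `/health` route on the same host and port as in the launch command (that path is an assumption, it is not spelled out above). The helper names are made up for this example.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, reset, ...


def wait_until_ready(probe, attempts=60, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def wait_for_voxtral(base_url="http://127.0.0.1:8003"):
    """Poll the assumed /health route of the local server."""
    return wait_until_ready(lambda: http_status(f"{base_url}/health"))
```

With the defaults this waits up to about two minutes, which leaves some margin over the ~40s cold load reported here.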

Smoke test

curl -s -X POST http://127.0.0.1:8003/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{"input":"Hello world","model":"mistralai/Voxtral-4B-TTS-2603","voice":"casual_female","response_format":"wav"}' \
  --output /tmp/test.wav
file /tmp/test.wav  # RIFF WAVE, 16 bit, mono 24000 Hz
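
The same smoke test can be run from Python using only the standard library. The sketch below reuses the endpoint, port, and JSON payload from the curl command above, and assumes the server returns plain PCM WAV that Python's `wave` module can parse. The function names are invented for illustration.

```python
import io
import json
import urllib.request
import wave


def describe_wav(data):
    """Parse WAV bytes and return channel count, bit depth, and sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
        }


def synthesize(text, base_url="http://127.0.0.1:8003", voice="casual_female"):
    """POST the same payload as the curl smoke test and return the audio bytes."""
    payload = {
        "input": text,
        "model": "mistralai/Voxtral-4B-TTS-2603",
        "voice": voice,
        "response_format": "wav",
    }
    request = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()
```

With the server running, `describe_wav(synthesize("Hello world"))` should report 1 channel, 16 bits, and 24000 Hz if the output matches what `file` prints above.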

Verified versions in the working venv: torch 2.10.0, vllm 0.18.1, vllm-omni 0.18.0, transformers 4.57.6, Python 3.10.20.

Failed approaches (so you don't repeat them)

- Python 3.11 or 3.12: vllm-omni 0.20.0 needs StrEnum (Py 3.11+) but has its own acoustic_transformer registration bug
- vllm serve --omni (plain vllm binary): flag is rejected by vllm's argparse
- vllm-omni serve without --omni: silently passes through to plain vllm
- vllm-omni serve --omni without --stage-configs-path: fallback to MistralForCausalLM, crashes
- Upgrading vllm to 0.20.x: pulls CUDA 13 runtime libs that aren't present on most boxes
- transformers from git: fixes voxtral_realtime model type but the underlying registry patch is still broken in the 0.18 stack

Voice cloning endpoint works

The server exposes /v1/audio/voices for uploading new reference audio. The spec says 5-25s WAV/MP3/FLAC, with an optional ref_text transcript for higher-quality in-context cloning. I had not tested this at the time of posting; will follow up.

Happy to answer questions or update the recipe if Mistral / vllm-omni team publishes a fix that simplifies it.

Full writeup (with all the dead ends documented): our project repo

SNVBX

23 days ago

To complement what was said here earlier, here's what got it working on my setup:

# # commands I used to retry in a fresh env
# deactivate
# trash test_voxtral
# uv venv test_voxtral --python 3.10
# source test_voxtral/bin/activate

# installing stuff
uv pip install vllm==0.18.1 vllm-omni==0.18.0

# without config (haven't tested this)
# vllm serve mistralai/Voxtral-4B-TTS-2603 --omni

# with config
VOXTRAL_YAML=$(python -c 'import vllm_omni; import os; print(os.path.join(os.path.dirname(vllm_omni.__file__), "model_executor/stage_configs/voxtral_tts.yaml"))')
cp "$VOXTRAL_YAML" /tmp/voxtral_tts_tuned.yaml
# sed -i 's/gpu_memory_utilization: 0.8/gpu_memory_utilization: 0.35/g' /tmp/voxtral_tts_tuned.yaml
vllm-omni serve mistralai/Voxtral-4B-TTS-2603 \
 --omni \
 --stage-configs-path /tmp/voxtral_tts_tuned.yaml \
 --port 8003 --host 127.0.0.1 --enforce-eager

Cannot merge

This branch has merge conflicts in the following files:

README.md
