Update installation instructions
Couldn't get it to work when installing vllm before vllm-omni; I ran into this:
ImportError: /mnt/storage/workspaces/voxtral-tts/.venv/lib/python3.10/site-packages/vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib
Everything works if vllm-omni is installed before vllm (at least when using Python 3.10).
By the way, incredible model, thanks for offering it to the community <3
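For anyone puzzled by the symbol in that ImportError: it is a C++ name mangled under the Itanium ABI, and its namespace can be read off by hand. Below is a minimal sketch (the helper name nested_name_parts is mine, and it only handles the simple length-prefixed part of the name, not the full mangling grammar). It shows the missing symbol lives in c10, which is the namespace of PyTorch's core C++ library, so the error most likely points at a mismatch between the compiled vllm extension and the installed torch build.

```python
import re


def nested_name_parts(mangled: str) -> list[str]:
    """Return the length-prefixed identifiers of an Itanium-ABI nested name.

    Only the leading `_ZN<len><name><len><name>...` part is decoded; constructor
    markers and argument types after it are ignored.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    parts = []
    pos = 3  # skip the "_ZN" prefix
    while pos < len(mangled) and mangled[pos].isdigit():
        digits = re.match(r"\d+", mangled[pos:]).group()
        start = pos + len(digits)
        length = int(digits)
        parts.append(mangled[start:start + length])
        pos = start + length
    return parts


symbol = "_ZN3c1013MessageLoggerC1EPKciib"
print("::".join(nested_name_parts(symbol)))  # prints: c10::MessageLogger
```

If binutils is installed, running c++filt on the same symbol gives the full signature; the sketch above is only meant to show where the symbol comes from.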
Updated them with latest version of vllm-omni - can you try again? :-)
Hey @patrickvonplaten, I don't understand: how can the latest PyPI version be ahead of vllm-omni main on GitHub?
Anyway, I'll try again a bit later today and let you know, of course.
Nope, the issue is still there. Here are all the commands, run in a row:
/m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ uv venv .venv --python=3.10
Using CPython 3.10.19
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate
/m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ source .venv/bin/activate.fish
(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ python --version
Python 3.10.19
(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ uv pip install -U vllm
Resolved 175 packages in 367ms
Prepared 175 packages in 6ms
Installed 175 packages in 249ms
+ aiohappyeyeballs==2.6.1
+ aiohttp==3.13.4
+ aiosignal==1.4.0
+ annotated-doc==0.0.4
+ annotated-types==0.7.0
+ anthropic==0.86.0
+ anyio==4.13.0
+ apache-tvm-ffi==0.1.9
+ astor==0.8.1
+ async-timeout==5.0.1
+ attrs==26.1.0
+ blake3==1.0.8
+ cachetools==7.0.5
+ cbor2==5.9.0
+ certifi==2026.2.25
+ cffi==2.0.0
+ charset-normalizer==3.4.6
+ click==8.3.1
+ cloudpickle==3.1.2
+ compressed-tensors==0.13.0
+ cryptography==46.0.6
+ cuda-bindings==12.9.4
+ cuda-pathfinder==1.5.0
+ cuda-python==12.9.4
+ depyf==0.20.0
+ dill==0.4.1
+ diskcache==5.6.3
+ distro==1.9.0
+ dnspython==2.8.0
+ docstring-parser==0.17.0
+ einops==0.8.2
+ email-validator==2.3.0
+ exceptiongroup==1.3.1
+ fastapi==0.135.2
+ fastapi-cli==0.0.24
+ fastapi-cloud-cli==0.15.1
+ fastar==0.9.0
+ filelock==3.25.2
+ flashinfer-python==0.6.6
+ frozenlist==1.8.0
+ fsspec==2026.3.0
+ gguf==0.18.0
+ googleapis-common-protos==1.73.1
+ grpcio==1.80.0
+ h11==0.16.0
+ hf-xet==1.4.2
+ httpcore==1.0.9
+ httptools==0.7.1
+ httpx==0.28.1
+ httpx-sse==0.4.3
+ huggingface-hub==0.36.2
+ idna==3.11
+ ijson==3.5.0
+ importlib-metadata==8.7.1
+ interegular==0.3.3
+ jinja2==3.1.6
+ jiter==0.13.0
+ jmespath==1.1.0
+ jsonschema==4.26.0
+ jsonschema-specifications==2025.9.1
+ lark==1.2.2
+ llguidance==1.3.0
+ llvmlite==0.44.0
+ lm-format-enforcer==0.11.3
+ loguru==0.7.3
+ markdown-it-py==4.0.0
+ markupsafe==3.0.3
+ mcp==1.26.0
+ mdurl==0.1.2
+ mistral-common==1.10.0
+ model-hosting-container-standards==0.1.14
+ mpmath==1.3.0
+ msgspec==0.20.0
+ multidict==6.7.1
+ networkx==3.4.2
+ ninja==1.13.0
+ numba==0.61.2
+ numpy==2.2.6
+ nvidia-cublas-cu12==12.8.4.1
+ nvidia-cuda-cupti-cu12==12.8.90
+ nvidia-cuda-nvrtc-cu12==12.8.93
+ nvidia-cuda-runtime-cu12==12.8.90
+ nvidia-cudnn-cu12==9.10.2.21
+ nvidia-cudnn-frontend==1.18.0
+ nvidia-cufft-cu12==11.3.3.83
+ nvidia-cufile-cu12==1.13.1.3
+ nvidia-curand-cu12==10.3.9.90
+ nvidia-cusolver-cu12==11.7.3.90
+ nvidia-cusparse-cu12==12.5.8.93
+ nvidia-cusparselt-cu12==0.7.1
+ nvidia-cutlass-dsl==4.4.2
+ nvidia-cutlass-dsl-libs-base==4.4.2
+ nvidia-ml-py==13.595.45
+ nvidia-nccl-cu12==2.27.5
+ nvidia-nvjitlink-cu12==12.8.93
+ nvidia-nvshmem-cu12==3.4.5
+ nvidia-nvtx-cu12==12.8.90
+ openai==2.24.0
+ openai-harmony==0.0.8
+ opencv-python-headless==4.13.0.92
+ opentelemetry-api==1.40.0
+ opentelemetry-exporter-otlp==1.40.0
+ opentelemetry-exporter-otlp-proto-common==1.40.0
+ opentelemetry-exporter-otlp-proto-grpc==1.40.0
+ opentelemetry-exporter-otlp-proto-http==1.40.0
+ opentelemetry-proto==1.40.0
+ opentelemetry-sdk==1.40.0
+ opentelemetry-semantic-conventions==0.61b0
+ opentelemetry-semantic-conventions-ai==0.5.1
+ outlines-core==0.2.11
+ packaging==26.0
+ partial-json-parser==0.2.1.1.post7
+ pillow==12.1.1
+ prometheus-client==0.24.1
+ prometheus-fastapi-instrumentator==7.1.0
+ propcache==0.4.1
+ protobuf==6.33.6
+ psutil==7.2.2
+ py-cpuinfo==9.0.0
+ pybase64==1.4.3
+ pycountry==26.2.16
+ pycparser==3.0
+ pydantic==2.12.5
+ pydantic-core==2.41.5
+ pydantic-extra-types==2.11.1
+ pydantic-settings==2.13.1
+ pygments==2.20.0
+ pyjwt==2.12.1
+ python-dotenv==1.2.2
+ python-json-logger==4.1.0
+ python-multipart==0.0.22
+ pyyaml==6.0.3
+ pyzmq==27.1.0
+ quack-kernels==0.3.7
+ referencing==0.37.0
+ regex==2026.3.32
+ requests==2.33.1
+ rich==14.3.3
+ rich-toolkit==0.19.7
+ rignore==0.7.6
+ rpds-py==0.30.0
+ safetensors==0.7.0
+ sentencepiece==0.2.1
+ sentry-sdk==2.57.0
+ setproctitle==1.3.7
+ setuptools==82.0.1
+ shellingham==1.5.4
+ sniffio==1.3.1
+ sse-starlette==3.3.4
+ starlette==0.52.1
+ supervisor==4.3.0
+ sympy==1.14.0
+ tabulate==0.10.0
+ tiktoken==0.12.0
+ tokenizers==0.22.2
+ tomli==2.4.1
+ torch==2.10.0
+ torch-c-dlpack-ext==0.1.5
+ torchaudio==2.10.0
+ torchvision==0.25.0
+ tqdm==4.67.3
+ transformers==4.57.6
+ triton==3.6.0
+ typer==0.24.1
+ typing-extensions==4.15.0
+ typing-inspection==0.4.2
+ urllib3==2.6.3
+ uvicorn==0.42.0
+ uvloop==0.22.1
+ vllm==0.18.1
+ watchfiles==1.1.1
+ websockets==16.0
+ xgrammar==0.1.33
+ yarl==1.23.0
+ zipp==3.23.0
(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ uv pip install vllm-omni --upgrade
Resolved 140 packages in 257ms
Prepared 89 packages in 3ms
Uninstalled 13 packages in 159ms
Installed 89 packages in 253ms
+ accelerate==1.12.0
+ aenum==3.1.16
+ aiofiles==24.1.0
+ antlr4-python3-runtime==4.9.3
+ audioread==3.1.0
+ brotli==1.2.0
+ cache-dit==1.3.0
+ coloredlogs==15.0.1
- cuda-bindings==12.9.4
+ cuda-bindings==13.2.0
+ cuda-toolkit==13.0.2
+ decorator==5.2.1
+ diffusers==0.37.1
+ einx==0.4.2
+ ema-pytorch==0.7.9
+ fa3-fwd==0.0.2
+ ffmpy==1.0.0
+ fire==0.7.1
+ flatbuffers==25.12.19
+ frozendict==2.4.7
+ gradio==5.50.0
+ gradio-client==1.14.0
+ groovy==0.1.2
- huggingface-hub==0.36.2
+ huggingface-hub==1.8.0
+ humanfriendly==10.0
+ imageio==2.37.3
+ imageio-ffmpeg==0.6.0
- importlib-metadata==8.7.1
+ importlib-metadata==9.0.0
+ janus==2.0.0
+ joblib==1.5.3
+ lazy-loader==0.5
+ librosa==0.11.0
- llvmlite==0.44.0
+ llvmlite==0.46.0
+ more-itertools==10.8.0
+ msgpack==1.1.2
- numba==0.61.2
+ numba==0.64.0
+ nvidia-cublas==13.1.0.3
+ nvidia-cuda-cupti==13.0.85
+ nvidia-cuda-nvrtc==13.0.88
+ nvidia-cuda-runtime==13.0.96
+ nvidia-cudnn-cu13==9.19.0.56
+ nvidia-cufft==12.0.0.61
+ nvidia-cufile==1.15.1.6
+ nvidia-curand==10.4.0.35
+ nvidia-cusolver==12.0.4.66
+ nvidia-cusparse==12.6.3.3
+ nvidia-cusparselt-cu13==0.8.0
+ nvidia-nccl-cu13==2.28.9
+ nvidia-nvjitlink==13.0.88
+ nvidia-nvshmem-cu13==3.4.5
+ nvidia-nvtx==13.0.85
+ omegaconf==2.3.0
+ onnxruntime==1.23.2
+ openai-whisper==20250625
+ orjson==3.11.7
+ pandas==2.3.3
- pillow==12.1.1
+ pillow==11.3.0
+ platformdirs==4.9.4
+ pooch==1.9.0
+ prettytable==3.17.0
- protobuf==6.33.6
+ protobuf==7.34.1
- pydantic==2.12.5
+ pydantic==2.12.3
- pydantic-core==2.41.5
+ pydantic-core==2.41.4
+ pydub==0.25.1
+ python-dateutil==2.9.0.post0
+ python-fire==0.1.0
+ pytz==2026.1.post1
+ resampy==0.4.3
+ ruff==0.15.8
+ safehttpx==0.1.7
+ scikit-learn==1.7.2
+ scipy==1.15.3
+ semantic-version==2.10.0
- setuptools==82.0.1
+ setuptools==81.0.0
+ six==1.17.0
+ soundfile==0.13.1
+ sox==1.5.0
+ soxr==1.0.0
+ termcolor==3.3.0
+ threadpoolctl==3.6.0
+ tomlkit==0.13.3
- torch==2.10.0
+ torch==2.11.0
+ torchsde==0.2.6
+ trampoline==0.1.2
- transformers==4.57.6
+ transformers==5.4.0
+ tzdata==2025.3
+ vllm-omni==0.18.0
+ wcwidth==0.6.0
- websockets==16.0
+ websockets==15.0.1
+ x-transformers==2.17.9
(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ python3 -c "import mistral_common; print(mistral_common.__version__)"
1.10.0
(.venv) /m/s/w/Voxtral-4B-TTS-2603 ❯❯❯ vllm serve mistralai/Voxtral-4B-TTS-2603 --omni
Traceback (most recent call last):
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/bin/vllm", line 4, in <module>
from vllm_omni.entrypoints.cli.main import main
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm_omni/__init__.py", line 16, in <module>
from . import patch # noqa: F401
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm_omni/patch.py", line 5, in <module>
from vllm.model_executor.layers.rotary_embedding import (
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/model_executor/__init__.py", line 4, in <module>
from vllm.model_executor.parameter import BasevLLMParameter, PackedvLLMParameter
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/model_executor/parameter.py", line 11, in <module>
from vllm.distributed import (
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/__init__.py", line 4, in <module>
from .communication_op import *
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/communication_op.py", line 9, in <module>
from .parallel_state import get_tp_group
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/parallel_state.py", line 49, in <module>
from vllm.distributed.utils import StatelessProcessGroup
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/distributed/utils.py", line 33, in <module>
from vllm.utils.system_utils import suppress_stdout
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/utils/system_utils.py", line 19, in <module>
from vllm.platforms import current_platform
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/platforms/__init__.py", line 279, in __getattr__
_current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/utils/import_utils.py", line 111, in resolve_obj_by_qualname
module = importlib.import_module(module_name)
File "/home/user/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 19, in <module>
import vllm._C # noqa
ImportError: /mnt/storage/workspaces/Voxtral-4B-TTS-2603/.venv/lib/python3.10/site-packages/vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib
Alright, my whole process was a mess. I couldn't remember exactly which uv pip commands I ran, because at some point I also tried to reinstall torch from https://download.pytorch.org/whl/torch/ etc. I was a bit desperate, lol.
So now I have a simple workflow that works:
uv venv .venv --python=3.10
source .venv/bin/activate
uv pip install vllm
uv pip install vllm-omni
vllm serve mistralai/Voxtral-4B-TTS-2603 --omni
So, basically, I just removed --upgrade from uv pip install vllm-omni.
The --upgrade was changing these packages:
- cuda-bindings==12.9.4
+ cuda-bindings==13.2.0
- huggingface-hub==0.36.2
+ huggingface-hub==1.8.0
- importlib-metadata==8.7.1
+ importlib-metadata==9.0.0
- llvmlite==0.44.0
+ llvmlite==0.46.0
- numba==0.61.2
+ numba==0.64.0
- numpy==2.2.6
+ numpy==2.4.4
- pillow==12.1.1
+ pillow==11.3.0
- protobuf==6.33.6
+ protobuf==7.34.1
- pydantic==2.12.5
+ pydantic==2.12.3
- pydantic-core==2.41.5
+ pydantic-core==2.41.4
- setuptools==80.10.2
+ setuptools==81.0.0
- torch==2.10.0
+ torch==2.11.0
- transformers==4.57.6
+ transformers==5.4.0
- websockets==16.0
+ websockets==15.0.1
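A list like the one above can be pulled out of the installer output automatically instead of by hand. This is a rough sketch, assuming the log uses the "- name==old" / "+ name==new" lines seen in this thread; the function name changed_packages is made up for illustration.

```python
import re

# Matches lines such as " - torch==2.10.0" or " + torch==2.11.0".
LOG_LINE = re.compile(r"^\s*([+-])\s+(\S+)==(\S+)\s*$")


def changed_packages(log: str) -> dict[str, tuple[str, str]]:
    """Map package name -> (old_version, new_version) for replaced packages."""
    removed, added = {}, {}
    for line in log.splitlines():
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip "Resolved ..." and other non-package lines
        sign, name, version = match.groups()
        (added if sign == "+" else removed)[name] = version
    # Keep only packages that were both removed and re-added at another version.
    return {
        name: (removed[name], added[name])
        for name in removed
        if name in added and removed[name] != added[name]
    }


sample_log = """\
Installed 89 packages in 253ms
 - pillow==12.1.1
 + pillow==11.3.0
 - torch==2.10.0
 + torch==2.11.0
 + vllm-omni==0.18.0
"""
for name, (old, new) in changed_packages(sample_log).items():
    print(f"{name}: {old} -> {new}")
```

Newly installed packages (such as vllm-omni itself in the sample) are left out on purpose, since only version swaps can break something that was already working.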
Found it!
The issue is coming from
- torch==2.10.0
+ torch==2.11.0
Reinstalling torch with uv pip install torch==2.10.0 makes it work!
Sorry for the mess; for some reason I mis-pasted what --upgrade was modifying! I've edited my previous answer.
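Since the fix comes down to torch staying at the version vllm was installed with, a quick check of the venv can catch the drift before launching the server. A minimal sketch follows; the EXPECTED table uses the versions reported in this thread, and the helper find_mismatches plus the choice to ignore local suffixes such as "+cu128" are my own assumptions.

```python
from importlib import metadata

# Versions reported as working together in this thread.
EXPECTED = {"torch": "2.10.0", "vllm": "0.18.1", "vllm-omni": "0.18.0"}


def find_mismatches(expected, get_version=metadata.version):
    """Return {name: (wanted, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop any local build suffix (e.g. "+cu128") before comparing.
            installed = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling find_mismatches(EXPECTED) inside the activated venv returns an empty dict when everything lines up; a torch entry in the result would be the first thing to fix.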
Verified working 2026-05-12: Python 3.10 + vllm-omni 0.18.0 + the critical --stage-configs-path flag
After hitting the acoustic_transformer not found in MistralForCausalLM error across multiple version combinations (~3.5 hours of dead ends), here's the recipe that actually works on a fresh Linux box today.
Three insights that unblocked it
- The --omni flag is essential. The vllm-omni CLI silently passes through to plain vllm without it (see vllm_omni/entrypoints/cli/main.py:12). You must invoke vllm-omni serve ... --omni, not vllm serve --omni and not vllm-omni serve without the flag.
- --stage-configs-path must point to the YAML. Voxtral TTS is a 2-stage model. Without the explicit YAML path, vllm-omni's auto-detect is broken in the 0.18.0 wheel and the loader falls back to MistralForCausalLM, then crashes looking for acoustic_transformer. The original Mistral README included this flag, then commit 47f5eea removed it expecting auto-detect to work, which it doesn't.
- GPU memory tuning. Default config asks for 0.8 utilization on stage 0; on a busy 32 GB card this causes serving to hang on first request. Drop to ~0.35.
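The memory-tuning point can also be done from Python rather than with a shell one-liner. This is a sketch under assumptions: the key name gpu_memory_utilization and the 0.8 / 0.35 values come from this post, but the YAML fragment below is hypothetical (the real stage config layout is not reproduced here) and retune_gpu_memory is a name I made up.

```python
import re


def retune_gpu_memory(yaml_text: str, old: str = "0.8", new: str = "0.35") -> str:
    """Replace `gpu_memory_utilization: <old>` with `<new>`, leaving other values alone."""
    pattern = re.compile(
        # The lookahead stops "0.8" from matching the start of e.g. "0.85".
        rf"(gpu_memory_utilization:\s*){re.escape(old)}(?![0-9])"
    )
    return pattern.sub(rf"\g<1>{new}", yaml_text)


# Hypothetical fragment, only to exercise the function.
sample = (
    "stage_0:\n"
    "  gpu_memory_utilization: 0.8\n"
    "stage_1:\n"
    "  gpu_memory_utilization: 0.85\n"
)
print(retune_gpu_memory(sample))
```

The lookahead is the one design choice worth noting: a plain substring replace of "gpu_memory_utilization: 0.8" would also turn a value like 0.85 into 0.355, if the file ever contained one.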
Exact working steps on a fresh Ubuntu/WSL box
# 1. Python 3.10 (vllm-omni 0.20.x needs StrEnum from 3.11, but 0.20.x has its own bugs; stick with 3.10 + 0.18)
sudo apt install -y python3.10 python3.10-venv python3.10-dev
python3.10 -m venv ~/.venvs/voxtral-tts
source ~/.venvs/voxtral-tts/bin/activate
pip install pip==24.0 # newer pip hits a Py3.10 stdlib bug
# 2. Install vllm-omni FIRST (without --upgrade), then vllm
pip install vllm-omni
pip install vllm
# 3. Pin torch back to 2.10.0 (vllm 0.18.1 was built against it)
pip install torch==2.10.0 torchaudio==2.10.0 torchvision==0.25.0
# 4. Fix flashinfer-cubin pin
pip install flashinfer-cubin==0.6.6
# 5. Tune stage config
VOXTRAL_YAML=$(python -c 'import vllm_omni; import os; print(os.path.join(os.path.dirname(vllm_omni.__file__), "model_executor/stage_configs/voxtral_tts.yaml"))')
cp "$VOXTRAL_YAML" /tmp/voxtral_tts_tuned.yaml
sed -i 's/gpu_memory_utilization: 0.8/gpu_memory_utilization: 0.35/g' /tmp/voxtral_tts_tuned.yaml
# 6. Launch
vllm-omni serve mistralai/Voxtral-4B-TTS-2603 \
--omni \
--stage-configs-path /tmp/voxtral_tts_tuned.yaml \
--port 8003 --host 127.0.0.1 --enforce-eager
Health check returns 200 after ~40s cold load. GPU usage ~15 GB.
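To avoid guessing when the cold load has finished, the health check can be polled in a loop. A small sketch, with caveats: the host and port match the launch command above, but the /health path is my assumption (this post does not say which URL was checked), and the function names and timeouts are illustrative.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # server answered with an error status
        return err.code
    except (urllib.error.URLError, OSError):  # refused, timed out, not up yet
        return None


def wait_until_healthy(url, deadline_s=120.0, interval_s=2.0,
                       probe=http_status, sleep=time.sleep, clock=time.monotonic):
    """Poll url until it returns 200 or deadline_s elapses; return True on success."""
    start = clock()
    while clock() - start < deadline_s:
        if probe(url) == 200:
            return True
        sleep(interval_s)
    return False
```

Usage would be something like wait_until_healthy("http://127.0.0.1:8003/health") before sending the first request; the probe, sleep and clock parameters exist only so the loop can be tested without a running server.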
Smoke test
curl -s -X POST http://127.0.0.1:8003/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{"input":"Hello world","model":"mistralai/Voxtral-4B-TTS-2603","voice":"casual_female","response_format":"wav"}' \
--output /tmp/test.wav
file /tmp/test.wav # RIFF WAVE, 16 bit, mono 24000 Hz
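If the file utility is not available, the same sanity check on the output can be done with Python's standard wave module. A minimal sketch, assuming the server returns plain PCM WAV with a complete header (the wave module does not read other encodings); describe_wav is a hypothetical helper name.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return channel count, bit depth, sample rate and duration of PCM WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "rate_hz": rate,
            "seconds": frames / rate,
        }
```

For the smoke test above, describe_wav(open("/tmp/test.wav", "rb").read()) should report 1 channel, 16 bits and 24000 Hz if the output matches what file printed.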
Verified versions in the working venv: torch 2.10.0, vllm 0.18.1, vllm-omni 0.18.0, transformers 4.57.6, Python 3.10.20.
Failed approaches (so you don't repeat them)
- Python 3.11 or 3.12: vllm-omni 0.20.0 needs StrEnum (Py 3.11+) but has its own acoustic_transformer registration bug
- vllm serve --omni (plain vllm binary): flag is rejected by vllm's argparse
- vllm-omni serve without --omni: silently passes through to plain vllm
- vllm-omni serve --omni without --stage-configs-path: fallback to MistralForCausalLM, crashes
- Upgrading vllm to 0.20.x: pulls CUDA 13 runtime libs that aren't present on most boxes
- transformers from git: fixes voxtral_realtime model type but the underlying registry patch is still broken in the 0.18 stack
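On the StrEnum point in the first item: enum.StrEnum was added to the standard library in Python 3.11, which is why a package that imports it cannot load on 3.10. A tiny check, as a sketch (strenum_available is just an illustrative name):

```python
import enum
import sys


def strenum_available() -> bool:
    """True when the running interpreter ships enum.StrEnum (Python 3.11+)."""
    return hasattr(enum, "StrEnum")


version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {version}: enum.StrEnum available = {strenum_available()}")
```

Running this inside the venv tells you up front whether a vllm-omni release that depends on StrEnum can even be imported there.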
Voice cloning endpoint works
Server exposes /v1/audio/voices for uploading new reference audio. Spec says 5-25s WAV/MP3/FLAC, with optional ref_text transcript for higher-quality in-context cloning. Untested by me as of post time; will follow up.
Happy to answer questions or update the recipe if Mistral / vllm-omni team publishes a fix that simplifies it.
Full writeup (with all the dead ends documented): our project repo
To complement what was said here earlier, here's what got it working on my setup:
# # commands I used to retry in a fresh env
# deactivate
# trash test_voxtral
# uv venv test_voxtral --python 3.10
# source test_voxtral/bin/activate
# installing stuff
uv pip install vllm==0.18.1 vllm-omni==0.18.0
# without config (haven't tested this)
# vllm serve mistralai/Voxtral-4B-TTS-2603 --omni
# with config
VOXTRAL_YAML=$(python -c 'import vllm_omni; import os; print(os.path.join(os.path.dirname(vllm_omni.__file__), "model_executor/stage_configs/voxtral_tts.yaml"))')
cp "$VOXTRAL_YAML" /tmp/voxtral_tts_tuned.yaml
# sed -i 's/gpu_memory_utilization: 0.8/gpu_memory_utilization: 0.35/g' /tmp/voxtral_tts_tuned.yaml
vllm-omni serve mistralai/Voxtral-4B-TTS-2603 \
--omni \
--stage-configs-path /tmp/voxtral_tts_tuned.yaml \
--port 8003 --host 127.0.0.1 --enforce-eager
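One possible refinement for the VOXTRAL_YAML lookup: the python -c one-liner imports vllm_omni, which runs the package's __init__ (and, per the traceback earlier in this thread, pulls in vllm's compiled extension). The sketch below locates the file without importing the package, using importlib.util.find_spec. The relative path is the one used above; packaged_file is a hypothetical helper, and I have not run this against a real vllm-omni install.

```python
import importlib.util
from pathlib import Path


def packaged_file(package: str, relative: str):
    """Return the path of a file shipped inside a top-level package, or None.

    find_spec locates a top-level package without executing its __init__.py.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # not installed, or a namespace package with no single location
    return Path(spec.origin).parent / relative


path = packaged_file("vllm_omni", "model_executor/stage_configs/voxtral_tts.yaml")
print(path if path is not None else "vllm_omni is not installed in this environment")
```

This only computes the path; checking path.exists() before copying is still a good idea, since the file layout may differ between vllm-omni versions.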