# 1. Do NOT pin torch/torchaudio here; keep the CUDA builds that come with the image
transformers==4.48.0
# Removed flash-attention since faster-whisper handles this internally
# https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.8/flash_attn-2.7.4.post1+cu126torch2.4-cp310-cp310-linux_x86_64.whl
pydantic==2.10.6

# 2. Main whisper model
faster-whisper==1.1.1
ctranslate2==4.5.0
torch

# 3. Extra libs your app really needs
gradio==5.0.1
spaces>=0.19.0
pyannote.audio==3.3.1
pandas>=1.5.0
numpy<1.24.0
librosa>=0.10.0
soundfile>=0.12.0
ffmpeg-python>=0.2.0
requests>=2.28.0
nvidia-cudnn-cu12==9.1.0.70  # any 9.1.x that pip can find is fine
webrtcvad>=2.0.10
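
A quick way to confirm these pins work together is the smoke test below. It is a minimal sketch, not part of the app: the model size, audio path, and Hugging Face token are placeholders. It checks that the image's CUDA torch build is still present (i.e. was not replaced by pip), then runs faster-whisper (backed by CTranslate2) and the pyannote diarization pipeline once.

```python
# smoke_test.py - illustrative check of the pinned stack (names/paths are placeholders)
import torch
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

# The CUDA build should come from the base image, not from this requirements file.
assert torch.cuda.is_available(), "CUDA torch build from the image is missing or was replaced"

# faster-whisper on the CTranslate2 backend; "large-v3" and "audio.wav" are placeholders.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", vad_filter=True)
print(f"Detected language: {info.language}")
for seg in segments:
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {seg.text.strip()}")

# Speaker diarization via pyannote.audio; needs a Hugging Face token with access
# to the pyannote/speaker-diarization-3.1 model (placeholder token below).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="hf_xxx"
)
pipeline.to(torch.device("cuda"))
for turn, _, speaker in pipeline("audio.wav").itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s - {turn.end:6.1f}s  {speaker}")
```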