Implement audio preprocessing and speaker diarization enhancements in WhisperTranscriber. Introduce methods for audio chunk preparation, VAD-based trimming, and speaker embedding extraction. Update process_audio methods to utilize task JSON for improved workflow and metadata handling. Add webrtcvad dependency for voice activity detection.
70438f0
liuyang committed on
requirements.txt
4916125
liuyang committed on
requirements.txt
120583d
liuyang committed on
update numpy version constraint to be less than 1.24.0 in requirements.txt
c4b04a5
liuyang committed on
requirements
4906209
liuyang committed on
requirements
e91cdbe
liuyang committed on
nvidia-cudnn-cu12
4042b26
liuyang committed on
nvidia-cudnn-cu12
175f81b
liuyang committed on
fast whisper
99ff812
liuyang committed on
update Gradio SDK version to 5.0.1 in README and requirements