How to use reazon-research/reazonspeech-nemo-v2 with NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("reazon-research/reazonspeech-nemo-v2")
transcriptions = asr_model.transcribe(["file.wav"])
reazonspeech-nemo-v2
reazonspeech-nemo-v2 is an automatic speech recognition model trained
on the ReazonSpeech v2.0 corpus.
This model supports inference of long-form Japanese audio clips up to several hours.
Model Architecture
The model features an improved Conformer architecture from the paper "Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition".
It is a subword-based RNN-T model with a total parameter count of 619M.
The encoder uses Longformer attention with a local context size of 256 and has a single global token.
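To illustrate what local attention with one global token means, the following is a minimal sketch of the attention mask only, not the model's actual implementation. The function name `local_global_mask` is hypothetical, and two details are assumptions: that the context window is split evenly to the left and right of each position, and that the global token sits at position 0.

```python
def local_global_mask(seq_len, local_context=256, num_global=1):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when position i is allowed to attend to position j.
    Assumption: the window is symmetric, covering local_context // 2
    positions on each side of i.
    """
    half = local_context // 2
    # Local band: each position sees only its neighbours within the window.
    mask = [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]
    # Global tokens (assumed to be the first positions) see every position,
    # and every position sees them.
    for g in range(min(num_global, seq_len)):
        for k in range(seq_len):
            mask[g][k] = True
            mask[k][g] = True
    return mask


if __name__ == "__main__":
    mask = local_global_mask(seq_len=2000)
    # An interior position attends to its window plus the global token,
    # independent of the total sequence length.
    print(sum(mask[1000]))
```

Because each position attends to a bounded number of others, the cost of attention grows roughly linearly with audio length instead of quadratically, which is what makes multi-hour inputs practical.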
The decoder has a vocabulary of 3000 tokens constructed by a SentencePiece unigram tokenizer.
We trained this model for 1 million steps using the AdamW optimizer following a Noam annealing schedule.
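For readers unfamiliar with the Noam schedule, here is a minimal sketch of the standard formula: the learning rate rises linearly during warmup and then decays with the inverse square root of the step. The function name `noam_lr` and the values of `d_model`, `warmup_steps`, and `scale` are illustrative assumptions, not the settings used to train this model.

```python
def noam_lr(step, d_model=512, warmup_steps=10000, scale=1.0):
    """Learning rate at a given step under the standard Noam schedule.

    d_model, warmup_steps and scale are placeholder values chosen for
    illustration; the model card does not state the real ones.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few points across a 1-million-step run.
    for s in (1, 1000, 10000, 100000, 1000000):
        print(s, noam_lr(s))
```

The rate peaks at `warmup_steps` and decreases afterwards, so a long run such as 1 million steps spends most of its time in the slow decay phase.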
Usage
We recommend using this model through our reazonspeech library.
from reazonspeech.nemo.asr import load_model, transcribe, audio_from_path
audio = audio_from_path("speech.wav")
model = load_model()
ret = transcribe(model, audio)
print(ret.text)
License