msingiai/sauti-asr-track-b-preview

This repo stages the current Sauti ASR Track B research preview. The checkpoint is a fine-tuned derivative of omniASR_LLM_300M_v2 and is intended for research use, inspection, and side-by-side evaluation against the main Track A release. The public-facing training-data summary omits restricted internal sources.

Release Status

  • Release type: research preview
  • Best saved checkpoint: step_250
  • Best validation WER: 15.13%
  • Validation split: dev
  • Validation samples: 1000
  • Source run: track_b_omnilingual_llm_300m_v2_tuned_v1
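
For readers comparing their own numbers against the WER reported above, the following is a minimal sketch of corpus-level word error rate: total word-level edit operations divided by total reference words. It assumes whitespace tokenization and no text normalization; the scoring script in the source repository may normalize casing or punctuation differently, so figures produced this way are not guaranteed to match the 15.13% reported here. The function names are illustrative and do not come from the repo.

```python
from typing import Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER over (reference, hypothesis) transcript pairs."""
    total_errors = 0
    total_ref_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()  # assumption: whitespace tokenization
        total_errors += edit_distance(ref_words, hypothesis.split())
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_ref_words
```

For example, corpus_wer([("habari ya asubuhi", "habari za asubuhi")]) counts one substitution against three reference words.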

Important Packaging Note

  • The model/ directory in this repo is an Omnilingual / fairseq2 checkpoint bundle, not a transformers checkpoint.
  • This repo is not intended for hosted Hugging Face inference endpoints.
  • Use it with the Omnilingual ASR codebase and a local asset-card file.

Training Data

The current preview was trained on the Track B Swahili dataset mix defined in the source repository:

Dataset                      License                   Notes
mozilla-common-voice         Common Voice (CC0)        Used in repo Track B pipeline
google-fleurs                FLEURS (CC-BY-4.0)        Used in repo Track B pipeline
alffa-swahili-news           ALFFA / OpenSLR (MIT)     Used in repo Track B pipeline
keystats-swahili-asr-data    KeyStats (Apache-2.0)     Used in repo Track B pipeline

Current Strengths

  • More useful than the current Track A path on the long ANC consultation spot check used in this repo.
  • Better suited to conversational Swahili and mixed clinical speech than the current Track A service output.

Current Limitations

  • Not yet ahead of the best Track A held-out result on benchmark metrics.
  • Long-form decoding in the repo still uses simple chunk-and-stitch inference.
  • Clinical conversations still show many phonetic substitutions and code-switching errors.
  • The checkpoint is packaged for research tooling rather than turnkey hosted inference.
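
To make the chunk-and-stitch limitation concrete, here is a minimal sketch of that style of long-form decoding: split the audio into fixed-length, non-overlapping windows, transcribe each window on its own, and join the text. The 30-second window, the absence of overlap, and the transcribe_chunk callable are all assumptions for illustration; the repo's actual chunk length and stitching logic may differ.

```python
from typing import Callable, List, Sequence


def chunk_samples(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: float = 30.0,  # assumed window length, not taken from the repo
) -> List[Sequence[float]]:
    """Split raw audio samples into fixed-length, non-overlapping chunks."""
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def chunk_and_stitch(
    samples: Sequence[float],
    sample_rate: int,
    transcribe_chunk: Callable[[Sequence[float]], str],  # hypothetical per-chunk ASR call
    chunk_seconds: float = 30.0,
) -> str:
    """Transcribe each chunk independently and join the results with spaces."""
    pieces = [
        transcribe_chunk(chunk).strip()
        for chunk in chunk_samples(samples, sample_rate, chunk_seconds)
    ]
    # Empty chunk outputs (for example, silence) are dropped before joining.
    return " ".join(piece for piece in pieces if piece)
```

Because the windows are cut at fixed offsets, a word that straddles a boundary is split across two chunks and each chunk is decoded without context from its neighbors, which is one reason this approach tends to degrade on long recordings.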

Local Usage

  1. Download or clone this repo locally.
  2. Copy sauti_asr_track_b_preview.asset.yaml into an Omnilingual ASR checkout.
  3. Replace the placeholder checkpoint path with the absolute path to the local model/ directory from this repo.
  4. Load the checkpoint via ASRInferencePipeline(model_card="sauti_asr_track_b_preview").
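
Steps 3 and 4 can be sketched in Python as follows. The helper fill_checkpoint_path and its placeholder argument are hypothetical: the exact placeholder text in the asset card is not shown here, so the caller has to pass it in. The import path for ASRInferencePipeline is assumed from the upstream Omnilingual ASR package layout and should be checked against the checkout in use; it is imported lazily so the path-patching helper works without that package installed.

```python
from pathlib import Path


def fill_checkpoint_path(
    card_template: Path,
    placeholder: str,  # the placeholder string found in the asset card (caller supplies it)
    model_dir: Path,
    out_path: Path,
) -> Path:
    """Write a copy of the asset card with the placeholder replaced by an absolute path."""
    text = card_template.read_text(encoding="utf-8")
    if placeholder not in text:
        raise ValueError(f"placeholder {placeholder!r} not found in {card_template}")
    out_path.write_text(
        text.replace(placeholder, str(model_dir.resolve())),
        encoding="utf-8",
    )
    return out_path


def load_pipeline():
    """Load the preview checkpoint through the Omnilingual ASR pipeline."""
    # Assumed import path; requires an installed Omnilingual ASR checkout in which
    # the patched asset card has already been placed.
    from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

    return ASRInferencePipeline(model_card="sauti_asr_track_b_preview")
```

Where the patched card must be placed inside the Omnilingual ASR checkout depends on how that codebase discovers asset cards, so consult its documentation before calling load_pipeline().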

Asset Card Template

The staged folder includes sauti_asr_track_b_preview.asset.yaml, which sets the following fields:

  • model_family: wav2vec2_llama
  • model_arch: 300m_v2
  • tokenizer_ref: omniASR_tokenizer_written_v2

Source Repository

The training, evaluation, and serving code lives in:

  • Msingi-AI/sauti-asr

Responsible Use

This research preview transcribes speech and may be inaccurate on sensitive audio, including clinical conversations. Users are responsible for consent, privacy handling, and downstream review before any real-world use.

Evaluation Results

  • Word Error Rate on Swahili dev split (preview, self-reported): 15.13%