msingiai/sauti-asr-track-b-preview

This repo stages the current Sauti ASR Track B research preview. The checkpoint is a fine-tuned derivative of omniASR_LLM_300M_v2 and is intended for research use, inspection, and side-by-side evaluation against the main Track A release. The public-facing training-data summary omits restricted internal sources.

Release Status

Release type: research preview
Best saved checkpoint: step_250
Best validation WER: 15.13%
Validation split: dev
Validation samples: 1000
Source run: track_b_omnilingual_llm_300m_v2_tuned_v1
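For readers comparing the reported validation WER against their own runs, the following is a minimal sketch of per-utterance word error rate: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is illustrative, and this card does not state how the repo normalizes text or aggregates over the 1000 dev samples, so treat this as a reference definition rather than the evaluation script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (ya -> za) and one deletion (daktari) over 4 reference words.
print(word_error_rate("habari ya asubuhi daktari", "habari za asubuhi"))  # 0.5
```

A corpus-level figure is usually computed by summing edit operations over all utterances and dividing by the total number of reference words, which is not the same as averaging per-utterance values.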

Important Packaging Note

The model/ directory in this repo is an Omnilingual / fairseq2 checkpoint bundle, not a transformers checkpoint.
This repo is not intended for hosted Hugging Face inference endpoints.
Use it with the Omnilingual ASR codebase and a local asset-card file.

Training Data

The current preview was trained on the repo's Track B Swahili dataset mix:

Dataset	Source (license)	Notes
`mozilla-common-voice`	Common Voice (CC0)	Used in repo Track B pipeline
`google-fleurs`	FLEURS (CC-BY-4.0)	Used in repo Track B pipeline
`alffa-swahili-news`	ALFFA / OpenSLR (MIT)	Used in repo Track B pipeline
`keystats-swahili-asr-data`	KeyStats (Apache-2.0)	Used in repo Track B pipeline

Current Strengths

More useful than the current Track A path on the long ANC consultation spot check used in this repo.
Better suited to conversational Swahili and mixed clinical speech than the current Track A service output.

Current Limitations

Does not yet outperform the best Track A held-out result on benchmarks.
Long-form decoding in the repo still uses simple chunk-and-stitch inference.
Clinical conversations still show many phonetic substitutions and code-switching errors.
The checkpoint is packaged for research tooling rather than turnkey hosted inference.
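To make the chunk-and-stitch limitation concrete, here is a minimal sketch of the general technique: cut the waveform into fixed-length, non-overlapping chunks, transcribe each chunk independently, and join the text. The function names, the 30-second default, and the `transcribe_chunk` callback are assumptions for illustration; the repo's actual chunk length and stitching logic are not described in this card.

```python
from typing import Callable, List, Sequence


def chunk_samples(
    samples: Sequence[float], sample_rate: int, chunk_seconds: float
) -> List[Sequence[float]]:
    """Split a waveform into consecutive, non-overlapping chunks."""
    size = int(sample_rate * chunk_seconds)
    if size <= 0:
        raise ValueError("sample_rate * chunk_seconds must be at least 1 sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def chunk_and_stitch(
    samples: Sequence[float],
    sample_rate: int,
    transcribe_chunk: Callable[[Sequence[float]], str],
    chunk_seconds: float = 30.0,  # hypothetical default, not taken from the repo
) -> str:
    """Transcribe each chunk on its own and join the pieces with spaces."""
    pieces = [
        transcribe_chunk(chunk)
        for chunk in chunk_samples(samples, sample_rate, chunk_seconds)
    ]
    # Drop empty outputs (e.g. silent chunks) before stitching.
    return " ".join(piece.strip() for piece in pieces if piece.strip())
```

Because chunk boundaries ignore word and sentence boundaries, a word that straddles a cut can be split or dropped, and each chunk is decoded without context from its neighbors. Overlapping windows or silence-based segmentation are common mitigations, though neither is claimed for this repo.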

Local Usage

Download or clone this repo locally.
Copy sauti_asr_track_b_preview.asset.yaml into an Omnilingual ASR checkout.
Replace the placeholder checkpoint path with the absolute path to the local model/ directory from this repo.
Load the checkpoint via ASRInferencePipeline(model_card="sauti_asr_track_b_preview").
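The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the import path and the `transcribe(..., lang=..., batch_size=...)` call follow the upstream Omnilingual ASR README as best understood here, and the `swh_Latn` language code is an assumption, so verify both against your checkout. The helper names `collect_audio` and `transcribe` are illustrative.

```python
from pathlib import Path
from typing import List, Sequence

MODEL_CARD = "sauti_asr_track_b_preview"  # card name from the usage steps
SWAHILI = "swh_Latn"  # assumption: language code format used by Omnilingual ASR


def collect_audio(paths: Sequence[str]) -> List[str]:
    """Return absolute paths for the given audio files, failing on missing ones."""
    resolved = []
    for raw in paths:
        path = Path(raw).expanduser().resolve()
        if not path.is_file():
            raise FileNotFoundError(f"audio file not found: {path}")
        resolved.append(str(path))
    return resolved


def transcribe(paths: Sequence[str], batch_size: int = 1) -> List[str]:
    """Load the preview checkpoint through its asset card and transcribe files."""
    # Imported lazily so collect_audio works without the package installed.
    # Assumption: module path as published in the Omnilingual ASR codebase.
    from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

    files = collect_audio(paths)
    pipeline = ASRInferencePipeline(model_card=MODEL_CARD)
    return pipeline.transcribe(
        files, lang=[SWAHILI] * len(files), batch_size=batch_size
    )
```

The pipeline can only resolve the card name once the edited asset-card file is in place in the Omnilingual ASR checkout and its checkpoint path points at the local model/ directory.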

Asset Card Template

The staged folder includes sauti_asr_track_b_preview.asset.yaml with:

model_family: wav2vec2_llama
model_arch: 300m_v2
tokenizer_ref: omniASR_tokenizer_written_v2
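Before loading, it can help to confirm that an edited copy of the asset card still carries the three values listed above. The sketch below uses a deliberately simple `key: value` line parser rather than a full YAML parser, which is an assumption that the card is flat; the function names are illustrative, and a real YAML library is the safer choice for anything beyond a quick check.

```python
from typing import Dict, Optional

# Values taken from the asset card template listed in this card.
EXPECTED = {
    "model_family": "wav2vec2_llama",
    "model_arch": "300m_v2",
    "tokenizer_ref": "omniASR_tokenizer_written_v2",
}


def read_flat_card(text: str) -> Dict[str, str]:
    """Parse top-level `key: value` lines; ignores comments and nested YAML."""
    fields = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip().strip("'\"")
    return fields


def mismatched_fields(text: str) -> Dict[str, Optional[str]]:
    """Return expected keys whose value is missing or different in the card."""
    fields = read_flat_card(text)
    return {
        key: fields.get(key)
        for key, expected in EXPECTED.items()
        if fields.get(key) != expected
    }
```

An empty result from `mismatched_fields(Path("sauti_asr_track_b_preview.asset.yaml").read_text())` (with `Path` imported from `pathlib`) means the three template fields are intact; it does not validate the checkpoint path.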

Source Repository

The training, evaluation, and serving code lives in:

Msingi-AI/sauti-asr

Responsible Use

This research preview transcribes speech and may be inaccurate on sensitive audio, including clinical conversations. Users are responsible for consent, privacy handling, and downstream review before any real-world use.


Evaluation results

Word Error Rate on Swahili dev split preview (self-reported): 15.13%