Related paper: HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (arXiv:2010.05646)
How to use nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX with MLX:
# Download the model from the Hub
# (the quotes keep shells such as zsh from treating the brackets as a glob pattern)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX
This is an MLX implementation of NVIDIA NeMo NanoCodec, a lightweight neural audio codec.
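The model name appears to encode the codec's operating point: a 22.05 kHz sample rate, a 0.6 kbps bitrate, and 12.5 token frames per second. As a rough sketch under that assumption (the constants below are read off the name, not taken from the library, and `codec_budget` is a hypothetical helper), the per-frame arithmetic works out as follows:

```python
import math

# Assumed from the model name "22khz-0.6kbps-12.5fps"; not read from the model config.
SAMPLE_RATE_HZ = 22_050
FRAME_RATE_FPS = 12.5
BITRATE_BPS = 600


def codec_budget(duration_s: float) -> dict:
    """Estimate frame count and bit budget for a clip of the given duration."""
    samples = round(duration_s * SAMPLE_RATE_HZ)
    samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_FPS   # audio samples covered by one frame
    bits_per_frame = BITRATE_BPS / FRAME_RATE_FPS         # bits spent on one frame
    frames = math.ceil(samples / samples_per_frame)       # partial last frame rounds up
    return {
        "samples": samples,
        "samples_per_frame": samples_per_frame,
        "bits_per_frame": bits_per_frame,
        "frames": frames,
        "total_bits": frames * bits_per_frame,
    }


print(codec_budget(10.0))
```

This is only an estimate: the actual number of frames depends on how the model pads its input, so the length returned by the encoder is the authoritative value.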
pip install nanocodec-mlx soundfile
from nanocodec_mlx.models.audio_codec import AudioCodecModel
import soundfile as sf
import mlx.core as mx
import numpy as np
# Load model from HuggingFace Hub
model = AudioCodecModel.from_pretrained("nineninesix/nemo-nano-codec-22khz-0.6kbps-12.5fps-MLX")
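# Load audio (the example assumes 22050 Hz mono input, matching the rate used when saving)
audio, sr = sf.read("input.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # soundfile returns (frames, channels); downmix to mono
if sr != 22050:
    raise ValueError(f"Expected 22050 Hz audio, got {sr} Hz; resample the file first")
# Add batch and channel dimensions: (1, 1, num_samples)
audio_mlx = mx.array(audio, dtype=mx.float32)[None, None, :]
audio_len = mx.array([len(audio)], dtype=mx.int32)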
# Encode and decode
tokens, tokens_len = model.encode(audio_mlx, audio_len)
reconstructed, recon_len = model.decode(tokens, tokens_len)
# Save output
output = np.array(reconstructed[0, 0, :int(recon_len[0])])
sf.write("output.wav", output, 22050)
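The example above assumes the input file is already mono at 22050 Hz. For files that are not, a minimal preprocessing sketch is shown below. The helper name `to_mono_22k` is hypothetical and not part of `nanocodec-mlx`, and the linear interpolation it uses is a crude stand-in with no anti-aliasing filter; a dedicated resampling library is likely a better choice when quality matters.

```python
import numpy as np

TARGET_SR = 22_050  # assumed from the model name and the rate used when saving


def to_mono_22k(audio, sr, target_sr=TARGET_SR):
    """Downmix to mono and resample to target_sr with linear interpolation."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # soundfile returns (frames, channels); average the channels
        audio = audio.mean(axis=1)
    if sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / sr))
    src_t = np.arange(len(audio)) / sr        # timestamps of the input samples
    dst_t = np.arange(n_out) / target_sr      # timestamps of the output samples
    return np.interp(dst_t, src_t, audio).astype(np.float32)
```

The result can be passed to `mx.array(...)` in place of the raw `audio` array, with `len(...)` of the resampled signal used for `audio_len`.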
This code is licensed under the Apache License 2.0.
The original NVIDIA NeMo NanoCodec model weights and architecture are developed by NVIDIA and are licensed under the NVIDIA Open Model License. See NOTICE for attribution.
When using this project, you must comply with both licenses.
This is an MLX implementation of NVIDIA NeMo NanoCodec. If you use this work, please cite the original: