Instructions to use SebastianBodza/Kartoffelbox-v0.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Chatterbox
How to use SebastianBodza/Kartoffelbox-v0.1 with Chatterbox:
# pip install chatterbox-tts
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
- Notebooks
- Google Colab
- Kaggle
Want to share my training set
Hey,
I am from Leipzig and I created around 12,000 German samples with ElevenLabs. I would like to share this dataset (wav+txt) with you. I am also able to create more samples with specific texts, such as mathematical formulas. Furthermore, I trained my model on this set on my RTX 5090 for around 5 days. It still had some problems with word order, but the words themselves sound German enough to me. I then merged it with your model and was satisfied :D
Hey, that would be super helpful 😊. I am currently preparing a bigger dataset.
How long are the audio files? And how could you share them?
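One way to answer the length question for a wav+txt dataset is to scan the folder and read each file's duration from its header. The following is a minimal sketch, assuming uncompressed PCM WAV files with a transcript of the same base name next to each one (`sample.wav` plus `sample.txt`). That layout and the function names are assumptions for illustration, not something stated in this thread.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def summarize_dataset(root):
    """Map each .wav under root that has a matching .txt to its duration.

    Assumed layout (hypothetical): sample.wav and sample.txt side by side.
    WAV files without a transcript are skipped.
    """
    root = Path(root)
    durations = {}
    for wav_path in sorted(root.rglob("*.wav")):
        if wav_path.with_suffix(".txt").exists():
            durations[str(wav_path.relative_to(root))] = wav_duration_seconds(wav_path)
    return durations


def describe(durations):
    """Return (count, shortest, longest, total_hours) for a duration mapping."""
    values = list(durations.values())
    if not values:
        return 0, 0.0, 0.0, 0.0
    return len(values), min(values), max(values), sum(values) / 3600.0
```

Calling `describe(summarize_dataset("path/to/dataset"))` gives the sample count, the shortest and longest clip in seconds, and the total hours, which is usually enough to decide whether very short or very long clips should be filtered before training.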
Hello,
I have contacted you via LinkedIn. The samples vary in length: I generated samples ranging from a single word like “Hello!” to 2 minutes in some cases. I wasn't sure at first how long the samples should be. I trained my model for almost 7 days on my RTX 5090, around 1 million passes. I don't know if the weights are of any use to you? I can send them to you if you like. I found my model merged with yours at 65% quite good. Best regards.
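The post does not say how the two models were merged. A common approach is linear interpolation of the weights, sketched below under that assumption. The function name and the `other_weight` parameter are hypothetical, and reading "65%" as the share given to the other model is also an assumption.

```python
def merge_state_dicts(own, other, other_weight=0.65):
    """Linearly interpolate two checkpoints with identical parameter names.

    own, other: mappings from parameter name to a value that supports
    arithmetic (plain floats here; floating-point tensors work the same way).
    other_weight: share of `other` in the result (assumed meaning of "65%").
    """
    if not 0.0 <= other_weight <= 1.0:
        raise ValueError("other_weight must be between 0 and 1")
    if own.keys() != other.keys():
        raise ValueError("checkpoints have different parameter names")
    return {
        name: (1.0 - other_weight) * own[name] + other_weight * other[name]
        for name in own
    }
```

This only makes sense when both checkpoints share the same architecture and were fine-tuned from the same base model. Otherwise the parameters do not line up and the averaged weights are unlikely to produce usable speech.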
Multilingual:
https://www.openslr.org/94/ (audiobook based libritts)
https://github.com/freds0/CML-TTS-Dataset (more than 3000 hours, CC licensed)
German: TTS dataset from a university (high quality, 6 main speakers, I think 40-50 hours of studio quality recordings)
https://opendata.iisys.de/dataset/hui-audio-corpus-german/ (https://github.com/iisys-hof/HUI-Audio-Corpus-German)
https://github.com/thorstenMueller/Thorsten-Voice (11 hours, one person)
There is a lot of German speech data available; maybe some of it is useful.