title: Reachy Mini ChatBox
emoji: π¬
colorFrom: red
colorTo: blue
sdk: static
pinned: false
short_description: Live chat with Reachy Mini!
suggested_storage: large
tags:
- reachy_mini
- reachy_mini_python_app
Reachy Mini ChatBox
A modular voice conversation app for the Reachy Mini robot. Cheap to run, versatile, customizable, and fun.
ChatBox uses a cascade pipeline β ASR β LLM β TTS β where each stage is a swappable provider. Mix cloud APIs and local models, tweak the personality, add live reactions to keywords, and let the robot dance.
How it works
π€ Microphone
β
βΌ
βββββββββββ βββββββββββ βββββββββββ π Speaker
β ASR β βββΆ β LLM β βββΆ β TTS β βββΆ π€ Robot moves
βββββββββββ βββββββββββ βββββββββββ
Speech to Reasoning Text to
text + tool calls speech
Voice Activity Detection (VAD) segments microphone input. The ASR provider transcribes it, the LLM generates a response (with optional tool calls for head movements, dances, emotions, cameraβ¦), and the TTS provider speaks it back β all while the robot animates.
Installation
Before using this app, install Reachy Mini's SDK.
Using uv (recommended)
# Create venv
uv venv --python python3.12 .venv
source .venv/bin/activate
# Install with cascade pipeline
uv sync --extra cascade
Install additional providers as needed:
uv sync --extra cascade_parakeet_progressive # Parakeet ASR (Apple Silicon)
uv sync --extra cascade_kokoro # Kokoro TTS (local)
uv sync --extra cascade_gemini # Gemini LLM
uv sync --extra cascade_elevenlabs # ElevenLabs TTS
uv sync --extra cascade_deepgram # Deepgram ASR
uv sync --extra cascade_parakeet # Parakeet batch/RNNT ASR (Apple Silicon)
uv sync --extra cascade_voxtral_mlx # Voxtral ASR (Apple Silicon)
uv sync --extra cascade_nemotron # NeMo ASR (CUDA)
uv sync --extra cascade_gradium # Gradium TTS (Python β₯3.12)
uv sync --extra cascade_all # All cascade providers
Vision and head-tracking extras:
uv sync --extra yolo_vision # YOLO head-tracking
uv sync --extra mediapipe_vision # MediaPipe head-tracking
uv sync --extra local_vision # Local VLM (SmolVLM2)
uv sync --extra all_vision # All vision features
Combine extras freely:
uv sync --extra cascade --extra cascade_kokoro --extra cascade_gemini --extra yolo_vision --group dev
Tip: Use
uv sync --frozento install from the lockfile without re-resolving.
Using pip
python -m venv .venv
source .venv/bin/activate
pip install -e ".[cascade]"
# Add providers
pip install -e ".[cascade_kokoro]"
pip install -e ".[cascade_gemini]"
# etc.
Optional extras reference
| Extra | Purpose | Notes |
|---|---|---|
cascade |
Base cascade pipeline (sounddevice, librosa, PyYAML) | Required for cascade mode |
cascade_parakeet_progressive |
Parakeet MLX progressive ASR | Apple Silicon only |
cascade_kokoro |
Kokoro local TTS | Any PyTorch platform |
cascade_gemini |
Google Gemini LLM | Requires GEMINI_API_KEY |
cascade_deepgram |
Deepgram streaming ASR | Requires DEEPGRAM_API_KEY |
cascade_elevenlabs |
ElevenLabs TTS | Requires ELEVENLABS_API_KEY |
cascade_parakeet |
Parakeet batch/RNNT ASR | Apple Silicon only |
cascade_voxtral_mlx |
Voxtral Mini 4B multilingual ASR | Apple Silicon only |
cascade_nemotron |
NeMo Parakeet/Nemotron ASR | CUDA GPU required |
cascade_silero_vad |
Silero VAD (torch-based) | Optional VAD backend |
cascade_gradium |
Gradium streaming TTS | Python β₯3.12, requires GRADIUM_API_KEY |
cascade_all |
All cascade providers | Excludes gradium (Python β₯3.12) |
local_vision |
Local VLM (SmolVLM2) via PyTorch/Transformers | GPU recommended |
yolo_vision |
YOLO head-tracking | CPU or GPU |
mediapipe_vision |
MediaPipe head-tracking | CPU |
all_vision |
All vision extras | β |
Configuration
Environment variables
Copy .env.example to .env and fill in the API keys for the providers you use:
| Variable | Used by |
|---|---|
OPENAI_API_KEY |
Whisper ASR, OpenAI Realtime ASR, GPT LLMs, OpenAI TTS |
GEMINI_API_KEY |
Gemini LLMs |
DEEPGRAM_API_KEY |
Deepgram ASR |
ELEVENLABS_API_KEY |
ElevenLabs TTS |
GRADIUM_API_KEY |
Gradium TTS |
HF_HOME |
Cache dir for Hugging Face models (default: ./cache) |
HF_TOKEN |
Hugging Face access for gated models |
REACHY_MINI_CUSTOM_PROFILE |
Select a profile (default: default) |
REACHY_MINI_EXTERNAL_PROFILES_DIRECTORY |
Path to external profiles |
REACHY_MINI_EXTERNAL_TOOLS_DIRECTORY |
Path to external tool modules |
AUTOLOAD_EXTERNAL_TOOLS |
Set 1 to auto-load all external tools |
cascade.yaml
The cascade.yaml file at the project root controls which providers are used and their settings. Structure:
asr:
provider: parakeet_mlx_progressive # Which ASR to use
providers:
parakeet_mlx_progressive: # Provider definitions
module: parakeet_mlx_progressive
class: ParakeetMLXProgressiveASR
streaming: true
location: local
hardware: apple_silicon
# Provider-specific settings
model: mlx-community/parakeet-tdt-0.6b-v3
precision: float16
llm:
provider: gemini-2.5-flash-lite # Which LLM to use
temperature: 1.0
providers: { ... }
tts:
provider: kokoro # Which TTS to use
trim_silence: true
providers: { ... }
Change provider: under each section to switch. You can also override from the CLI with --asr-provider, --llm-provider, or --tts-provider.
ASR providers
| Provider | Location | Streaming | Hardware | Install extra | API key |
|---|---|---|---|---|---|
parakeet_mlx_progressive |
Local | Yes | Apple Silicon | cascade_parakeet_progressive |
β |
voxtral_mlx |
Local | Yes | Apple Silicon | cascade_voxtral_mlx |
β |
parakeet_nemo_progressive |
Local | Yes | CUDA | cascade_nemotron |
β |
nemotron |
Local | Yes | CUDA | cascade_nemotron |
β |
deepgram |
Cloud | Yes | β | cascade_deepgram |
DEEPGRAM_API_KEY |
openai_realtime_asr |
Cloud | Yes | β | β | OPENAI_API_KEY |
whisper_openai |
Cloud | No (batch) | β | β | OPENAI_API_KEY |
LLM providers
| Provider | Location | Model | Install extra | API key |
|---|---|---|---|---|
gemini-2.5-flash-lite |
Cloud | gemini-2.5-flash-lite |
cascade_gemini |
GEMINI_API_KEY |
gemini-3.1-flash-lite |
Cloud | gemini-3.1-flash-lite-preview |
cascade_gemini |
GEMINI_API_KEY |
gpt-4o-mini |
Cloud | gpt-4o-mini |
β | OPENAI_API_KEY |
gpt-5.2-chat |
Cloud | gpt-5.2-chat-latest |
β | OPENAI_API_KEY |
TTS providers
| Provider | Location | Hardware | Install extra | API key |
|---|---|---|---|---|
kokoro |
Local | Any (PyTorch) | cascade_kokoro |
β |
tts_openai |
Cloud | β | β | OPENAI_API_KEY |
elevenlabs |
Cloud | β | cascade_elevenlabs |
ELEVENLABS_API_KEY |
gradium |
Cloud | β | cascade_gradium |
GRADIUM_API_KEY |
Running the app
Make sure the Reachy Mini daemon is running before launching. See Reachy Mini's SDK for setup.
reachy-mini-conversation-app --gradio
CLI options
| Option | Default | Description |
|---|---|---|
--gradio |
False |
Launch Gradio web UI at http://127.0.0.1:7860/. Without this, runs in console mode with VAD. |
--asr-provider NAME |
from yaml | Override ASR provider (e.g. deepgram, whisper_openai) |
--llm-provider NAME |
from yaml | Override LLM provider (e.g. gpt-4o-mini, gemini-2.5-flash-lite) |
--tts-provider NAME |
from yaml | Override TTS provider (e.g. kokoro, elevenlabs) |
--head-tracker {yolo,mediapipe} |
None |
Enable head-tracking. Requires the matching vision extra. |
--no-camera |
False |
Run without camera. |
--local-vision |
False |
Use local VLM (SmolVLM2) instead of cloud vision. Requires local_vision extra. |
--autotest [FILE] |
β | Run automated testing with text utterances (default file: cascade/autotest.txt). |
--realtime |
False |
Use OpenAI realtime audio-to-audio API instead of cascade. |
--robot-name NAME |
None |
Connect to a specific robot when multiple daemons run on the same subnet. |
--debug |
False |
Verbose logging. |
Examples
# Gradio UI with YOLO head-tracking
reachy-mini-conversation-app --gradio --head-tracker yolo
# Console mode (no UI, VAD-based)
reachy-mini-conversation-app
# Override providers from CLI
reachy-mini-conversation-app --gradio --asr-provider deepgram --llm-provider gpt-4o-mini
# Audio-only (no camera)
reachy-mini-conversation-app --gradio --no-camera
# Automated testing
reachy-mini-conversation-app --autotest
Profiles
Profiles define the robot's personality: what it says, how it sounds, and which tools it can use.
Each profile is a folder under src/reachy_mini_conversation_app/profiles/<name>/ containing:
| File | Required | Purpose |
|---|---|---|
instructions.txt |
Yes | System prompt β the robot's personality and behavior rules |
tools.txt |
Recommended | Enabled tools, one per line. Falls back to default/tools.txt if missing. |
voice.txt |
Optional | TTS voice name override (single line) |
reactions.yaml |
Optional | Live reaction triggers (see Live reactions) |
*.py |
Optional | Custom tool implementations |
Selecting a profile
- Environment variable:
REACHY_MINI_CUSTOM_PROFILE=piratein.env - Gradio UI: Open the "Personality" accordion to switch profiles, edit instructions, or create new ones.
Template placeholders in instructions.txt
Reuse shared prompt snippets by referencing files under src/reachy_mini_conversation_app/prompts/:
[passion_for_lobster_jokes]
[identities/witty_identity]
Locked profile mode
Set LOCKED_PROFILE in src/reachy_mini_conversation_app/config.py to lock the app to a single profile. The Gradio UI shows "(locked)" and disables profile editing. Useful for dedicated app variants.
Live reactions
Reactions let the robot respond to keywords or entities in real time β while the user is still speaking β without waiting for the LLM.
Define them in profiles/<name>/reactions.yaml:
- name: music_excitement
callback: excited_about_music
trigger:
words: [music, guitar, piano, drum, violin]
- name: food_reaction
callback: react_to_food_entity
trigger:
entities: [food]
repeatable: true
- name: groovy_dance
callback: do_groovy_dance
trigger:
all:
- words: [danc*]
- words: [groov*]
- name: reachy_name
callback: react_to_name
trigger:
words: [reachy, richie, reechy]
params:
emotion: helpful1
Trigger types
| Type | Description | Example |
|---|---|---|
words |
Keyword/glob match on transcript | [music, danc*, "grand piano"] |
entities |
Named entity recognition via GLiNER | [food, person, location] |
all |
Boolean AND β all sub-triggers must match | See groovy_dance above |
Behavior
- Reactions fire at most once per conversation turn by default.
- Set
repeatable: trueto allow multiple triggers per turn (deduplicated by entity text for entity triggers). paramsare passed as**kwargsto the callback.
Callback signature
Each callback value maps to a Python module in the profile folder exporting an async function:
async def my_callback(deps: ToolDependencies, match: TriggerMatch, **kwargs) -> None:
...
match.words contains matched keywords; match.entities contains matched entities (with .text, .label, .confidence).
LLM tools
Tools the LLM can call during a conversation:
| Tool | Action |
|---|---|
speak |
Synthesize and play a speech segment (used internally by the pipeline) |
dance |
Queue a dance from the dances library |
stop_dance |
Clear queued dances |
play_emotion |
Play a recorded emotion clip from pollen-robotics/reachy-mini-emotions-library |
stop_emotion |
Clear queued emotions |
move_head |
Move head to a named position (left/right/up/down/front) |
see_image_through_camera |
Capture a camera frame and send it to the LLM for analysis |
describe_camera_image |
Describe the current camera view |
head_tracking |
Enable/disable head-tracking (requires --head-tracker) |
task_status |
Check status of a background task |
task_cancel |
Cancel a running background task |
do_nothing |
Explicitly remain idle |
Profiles can also define custom tools (e.g. turn_left, turn_right, center_position in the default profile).
Advanced features
External profiles and tools
Store profiles and tools outside the source tree:
external_content/
βββ external_profiles/
β βββ my_profile/
β βββ instructions.txt
β βββ tools.txt
β βββ voice.txt
βββ external_tools/
βββ my_custom_tool.py
Set in .env:
REACHY_MINI_CUSTOM_PROFILE=my_profile
REACHY_MINI_EXTERNAL_PROFILES_DIRECTORY=./external_content/external_profiles
REACHY_MINI_EXTERNAL_TOOLS_DIRECTORY=./external_content/external_tools
- Default mode:
tools.txtmust list every tool explicitly. Names resolve against built-in tools first, then external tools. - Auto-load mode (
AUTOLOAD_EXTERNAL_TOOLS=1): all*.pymodules in the external tools directory are loaded automatically.
Multiple robots on the same subnet
reachy-mini-conversation-app --robot-name <name>
<name> must match the daemon's --robot-name value.
Autotest mode
Run the full pipeline with synthetic text utterances instead of a microphone:
reachy-mini-conversation-app --autotest
reachy-mini-conversation-app --autotest my_test_script.txt
Each line in the test file is treated as a user utterance. Useful for end-to-end testing without audio hardware.
OpenAI Realtime mode (legacy)
The original audio-to-audio mode using OpenAI's realtime API is still available:
reachy-mini-conversation-app --realtime --gradio
This bypasses the cascade pipeline entirely. Requires OPENAI_API_KEY.
Contributing
We welcome bug fixes, features, profiles, and documentation improvements. Please review our contribution guide for branch conventions, quality checks, and PR workflow.
Quick start:
- Fork and clone the repo
- Follow the installation steps (include the
devdependency group) - Run contributor checks listed in CONTRIBUTING.md
License
Apache 2.0
