reachy-mini-chatbox / README.md
dlouapre's picture
dlouapre HF Staff
Deploy reachy-mini-chatbox
534b431
metadata
title: Reachy Mini ChatBox
emoji: πŸ’¬
colorFrom: red
colorTo: blue
sdk: static
pinned: false
short_description: Live chat with Reachy Mini!
suggested_storage: large
tags:
  - reachy_mini
  - reachy_mini_python_app

Reachy Mini ChatBox

A modular voice conversation app for the Reachy Mini robot. Cheap to run, versatile, customizable, and fun.

ChatBox uses a cascade pipeline β€” ASR β†’ LLM β†’ TTS β€” where each stage is a swappable provider. Mix cloud APIs and local models, tweak the personality, add live reactions to keywords, and let the robot dance.

Reachy Mini Dance

How it works

🎀 Microphone
    β”‚
    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     πŸ”Š Speaker
β”‚   ASR   β”‚ ──▢ β”‚   LLM   β”‚ ──▢ β”‚   TTS   β”‚ ──▢ πŸ€– Robot moves
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Speech to       Reasoning       Text to
 text            + tool calls    speech

Voice Activity Detection (VAD) segments microphone input. The ASR provider transcribes it, the LLM generates a response (with optional tool calls for head movements, dances, emotions, camera…), and the TTS provider speaks it back β€” all while the robot animates.

Installation

Before using this app, install Reachy Mini's SDK.

Using uv (recommended)
# Create venv
uv venv --python python3.12 .venv
source .venv/bin/activate

# Install with cascade pipeline
uv sync --extra cascade

Install additional providers as needed:

uv sync --extra cascade_parakeet_progressive  # Parakeet ASR (Apple Silicon)
uv sync --extra cascade_kokoro                # Kokoro TTS (local)
uv sync --extra cascade_gemini                # Gemini LLM
uv sync --extra cascade_elevenlabs            # ElevenLabs TTS
uv sync --extra cascade_deepgram              # Deepgram ASR
uv sync --extra cascade_parakeet              # Parakeet batch/RNNT ASR (Apple Silicon)
uv sync --extra cascade_voxtral_mlx           # Voxtral ASR (Apple Silicon)
uv sync --extra cascade_nemotron              # NeMo ASR (CUDA)
uv sync --extra cascade_gradium               # Gradium TTS (Python β‰₯3.12)
uv sync --extra cascade_all                   # All cascade providers

Vision and head-tracking extras:

uv sync --extra yolo_vision          # YOLO head-tracking
uv sync --extra mediapipe_vision     # MediaPipe head-tracking
uv sync --extra local_vision         # Local VLM (SmolVLM2)
uv sync --extra all_vision           # All vision features

Combine extras freely:

uv sync --extra cascade --extra cascade_kokoro --extra cascade_gemini --extra yolo_vision --group dev

Tip: Use uv sync --frozen to install from the lockfile without re-resolving.

Using pip
python -m venv .venv
source .venv/bin/activate
pip install -e ".[cascade]"

# Add providers
pip install -e ".[cascade_kokoro]"
pip install -e ".[cascade_gemini]"
# etc.

Optional extras reference

Extra Purpose Notes
cascade Base cascade pipeline (sounddevice, librosa, PyYAML) Required for cascade mode
cascade_parakeet_progressive Parakeet MLX progressive ASR Apple Silicon only
cascade_kokoro Kokoro local TTS Any PyTorch platform
cascade_gemini Google Gemini LLM Requires GEMINI_API_KEY
cascade_deepgram Deepgram streaming ASR Requires DEEPGRAM_API_KEY
cascade_elevenlabs ElevenLabs TTS Requires ELEVENLABS_API_KEY
cascade_parakeet Parakeet batch/RNNT ASR Apple Silicon only
cascade_voxtral_mlx Voxtral Mini 4B multilingual ASR Apple Silicon only
cascade_nemotron NeMo Parakeet/Nemotron ASR CUDA GPU required
cascade_silero_vad Silero VAD (torch-based) Optional VAD backend
cascade_gradium Gradium streaming TTS Python β‰₯3.12, requires GRADIUM_API_KEY
cascade_all All cascade providers Excludes gradium (Python β‰₯3.12)
local_vision Local VLM (SmolVLM2) via PyTorch/Transformers GPU recommended
yolo_vision YOLO head-tracking CPU or GPU
mediapipe_vision MediaPipe head-tracking CPU
all_vision All vision extras β€”

Configuration

Environment variables

Copy .env.example to .env and fill in the API keys for the providers you use:

Variable Used by
OPENAI_API_KEY Whisper ASR, OpenAI Realtime ASR, GPT LLMs, OpenAI TTS
GEMINI_API_KEY Gemini LLMs
DEEPGRAM_API_KEY Deepgram ASR
ELEVENLABS_API_KEY ElevenLabs TTS
GRADIUM_API_KEY Gradium TTS
HF_HOME Cache dir for Hugging Face models (default: ./cache)
HF_TOKEN Hugging Face access for gated models
REACHY_MINI_CUSTOM_PROFILE Select a profile (default: default)
REACHY_MINI_EXTERNAL_PROFILES_DIRECTORY Path to external profiles
REACHY_MINI_EXTERNAL_TOOLS_DIRECTORY Path to external tool modules
AUTOLOAD_EXTERNAL_TOOLS Set 1 to auto-load all external tools

cascade.yaml

The cascade.yaml file at the project root controls which providers are used and their settings. Structure:

asr:
  provider: parakeet_mlx_progressive   # Which ASR to use
  providers:
    parakeet_mlx_progressive:          # Provider definitions
      module: parakeet_mlx_progressive
      class: ParakeetMLXProgressiveASR
      streaming: true
      location: local
      hardware: apple_silicon
      # Provider-specific settings
      model: mlx-community/parakeet-tdt-0.6b-v3
      precision: float16

llm:
  provider: gemini-2.5-flash-lite      # Which LLM to use
  temperature: 1.0
  providers: { ... }

tts:
  provider: kokoro                     # Which TTS to use
  trim_silence: true
  providers: { ... }

Change provider: under each section to switch. You can also override from the CLI with --asr-provider, --llm-provider, or --tts-provider.

ASR providers

Provider Location Streaming Hardware Install extra API key
parakeet_mlx_progressive Local Yes Apple Silicon cascade_parakeet_progressive β€”
voxtral_mlx Local Yes Apple Silicon cascade_voxtral_mlx β€”
parakeet_nemo_progressive Local Yes CUDA cascade_nemotron β€”
nemotron Local Yes CUDA cascade_nemotron β€”
deepgram Cloud Yes β€” cascade_deepgram DEEPGRAM_API_KEY
openai_realtime_asr Cloud Yes β€” β€” OPENAI_API_KEY
whisper_openai Cloud No (batch) β€” β€” OPENAI_API_KEY

LLM providers

Provider Location Model Install extra API key
gemini-2.5-flash-lite Cloud gemini-2.5-flash-lite cascade_gemini GEMINI_API_KEY
gemini-3.1-flash-lite Cloud gemini-3.1-flash-lite-preview cascade_gemini GEMINI_API_KEY
gpt-4o-mini Cloud gpt-4o-mini β€” OPENAI_API_KEY
gpt-5.2-chat Cloud gpt-5.2-chat-latest β€” OPENAI_API_KEY

TTS providers

Provider Location Hardware Install extra API key
kokoro Local Any (PyTorch) cascade_kokoro β€”
tts_openai Cloud β€” β€” OPENAI_API_KEY
elevenlabs Cloud β€” cascade_elevenlabs ELEVENLABS_API_KEY
gradium Cloud β€” cascade_gradium GRADIUM_API_KEY

Running the app

Make sure the Reachy Mini daemon is running before launching. See Reachy Mini's SDK for setup.

reachy-mini-conversation-app --gradio

CLI options

Option Default Description
--gradio False Launch Gradio web UI at http://127.0.0.1:7860/. Without this, runs in console mode with VAD.
--asr-provider NAME from yaml Override ASR provider (e.g. deepgram, whisper_openai)
--llm-provider NAME from yaml Override LLM provider (e.g. gpt-4o-mini, gemini-2.5-flash-lite)
--tts-provider NAME from yaml Override TTS provider (e.g. kokoro, elevenlabs)
--head-tracker {yolo,mediapipe} None Enable head-tracking. Requires the matching vision extra.
--no-camera False Run without camera.
--local-vision False Use local VLM (SmolVLM2) instead of cloud vision. Requires local_vision extra.
--autotest [FILE] β€” Run automated testing with text utterances (default file: cascade/autotest.txt).
--realtime False Use OpenAI realtime audio-to-audio API instead of cascade.
--robot-name NAME None Connect to a specific robot when multiple daemons run on the same subnet.
--debug False Verbose logging.

Examples

# Gradio UI with YOLO head-tracking
reachy-mini-conversation-app --gradio --head-tracker yolo

# Console mode (no UI, VAD-based)
reachy-mini-conversation-app

# Override providers from CLI
reachy-mini-conversation-app --gradio --asr-provider deepgram --llm-provider gpt-4o-mini

# Audio-only (no camera)
reachy-mini-conversation-app --gradio --no-camera

# Automated testing
reachy-mini-conversation-app --autotest

Profiles

Profiles define the robot's personality: what it says, how it sounds, and which tools it can use.

Each profile is a folder under src/reachy_mini_conversation_app/profiles/<name>/ containing:

File Required Purpose
instructions.txt Yes System prompt β€” the robot's personality and behavior rules
tools.txt Recommended Enabled tools, one per line. Falls back to default/tools.txt if missing.
voice.txt Optional TTS voice name override (single line)
reactions.yaml Optional Live reaction triggers (see Live reactions)
*.py Optional Custom tool implementations

Selecting a profile

  • Environment variable: REACHY_MINI_CUSTOM_PROFILE=pirate in .env
  • Gradio UI: Open the "Personality" accordion to switch profiles, edit instructions, or create new ones.

Template placeholders in instructions.txt

Reuse shared prompt snippets by referencing files under src/reachy_mini_conversation_app/prompts/:

[passion_for_lobster_jokes]
[identities/witty_identity]

Locked profile mode

Set LOCKED_PROFILE in src/reachy_mini_conversation_app/config.py to lock the app to a single profile. The Gradio UI shows "(locked)" and disables profile editing. Useful for dedicated app variants.

Live reactions

Reactions let the robot respond to keywords or entities in real time β€” while the user is still speaking β€” without waiting for the LLM.

Define them in profiles/<name>/reactions.yaml:

- name: music_excitement
  callback: excited_about_music
  trigger:
    words: [music, guitar, piano, drum, violin]

- name: food_reaction
  callback: react_to_food_entity
  trigger:
    entities: [food]
  repeatable: true

- name: groovy_dance
  callback: do_groovy_dance
  trigger:
    all:
      - words: [danc*]
      - words: [groov*]

- name: reachy_name
  callback: react_to_name
  trigger:
    words: [reachy, richie, reechy]
  params:
    emotion: helpful1

Trigger types

Type Description Example
words Keyword/glob match on transcript [music, danc*, "grand piano"]
entities Named entity recognition via GLiNER [food, person, location]
all Boolean AND β€” all sub-triggers must match See groovy_dance above

Behavior

  • Reactions fire at most once per conversation turn by default.
  • Set repeatable: true to allow multiple triggers per turn (deduplicated by entity text for entity triggers).
  • params are passed as **kwargs to the callback.

Callback signature

Each callback value maps to a Python module in the profile folder exporting an async function:

async def my_callback(deps: ToolDependencies, match: TriggerMatch, **kwargs) -> None:
    ...

match.words contains matched keywords; match.entities contains matched entities (with .text, .label, .confidence).

LLM tools

Tools the LLM can call during a conversation:

Tool Action
speak Synthesize and play a speech segment (used internally by the pipeline)
dance Queue a dance from the dances library
stop_dance Clear queued dances
play_emotion Play a recorded emotion clip from pollen-robotics/reachy-mini-emotions-library
stop_emotion Clear queued emotions
move_head Move head to a named position (left/right/up/down/front)
see_image_through_camera Capture a camera frame and send it to the LLM for analysis
describe_camera_image Describe the current camera view
head_tracking Enable/disable head-tracking (requires --head-tracker)
task_status Check status of a background task
task_cancel Cancel a running background task
do_nothing Explicitly remain idle

Profiles can also define custom tools (e.g. turn_left, turn_right, center_position in the default profile).

Advanced features

External profiles and tools

Store profiles and tools outside the source tree:

external_content/
β”œβ”€β”€ external_profiles/
β”‚   └── my_profile/
β”‚       β”œβ”€β”€ instructions.txt
β”‚       β”œβ”€β”€ tools.txt
β”‚       └── voice.txt
└── external_tools/
    └── my_custom_tool.py

Set in .env:

REACHY_MINI_CUSTOM_PROFILE=my_profile
REACHY_MINI_EXTERNAL_PROFILES_DIRECTORY=./external_content/external_profiles
REACHY_MINI_EXTERNAL_TOOLS_DIRECTORY=./external_content/external_tools
  • Default mode: tools.txt must list every tool explicitly. Names resolve against built-in tools first, then external tools.
  • Auto-load mode (AUTOLOAD_EXTERNAL_TOOLS=1): all *.py modules in the external tools directory are loaded automatically.
Multiple robots on the same subnet
reachy-mini-conversation-app --robot-name <name>

<name> must match the daemon's --robot-name value.

Autotest mode

Run the full pipeline with synthetic text utterances instead of a microphone:

reachy-mini-conversation-app --autotest
reachy-mini-conversation-app --autotest my_test_script.txt

Each line in the test file is treated as a user utterance. Useful for end-to-end testing without audio hardware.

OpenAI Realtime mode (legacy)

The original audio-to-audio mode using OpenAI's realtime API is still available:

reachy-mini-conversation-app --realtime --gradio

This bypasses the cascade pipeline entirely. Requires OPENAI_API_KEY.

Contributing

We welcome bug fixes, features, profiles, and documentation improvements. Please review our contribution guide for branch conventions, quality checks, and PR workflow.

Quick start:

License

Apache 2.0