This project implements a voice-controlled robotic fish assistant using Google's Gemini Live API. The fish can listen to voice commands, process them with Gemini, and respond with both voice and physical movements.
Features:
- Real-time voice interaction using Gemini Live API
- Live mouth movement synchronized with audio output
- Physical movements (head and tail) through tool calling
- Wake word detection based on sound energy
- Ambient sound effects during processing
Requirements:
- Python 3.8+
- Google API key with access to Gemini models
- ElevenLabs API key (for the action processor)
- Arduino-based fish hardware connected via serial
Setup:
- Create and activate a virtual environment:
  python -m venv .venv
  source .venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Set up environment variables in .env:
  GOOGLE_API_KEY=your_google_api_key
  ELEVENLABS_API_KEY=your_elevenlabs_api_key
- Make sure the fish hardware is connected via USB
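Before the first run, it can help to confirm that both keys are actually set. The sketch below is not part of the project; it is a minimal, standard-library check that assumes the .env file uses plain KEY=VALUE lines. The helper names `parse_env_text` and `missing_keys` are hypothetical, and the project itself may load the file differently.

```python
import os

# Keys named in the setup steps above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "ELEVENLABS_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    text = ""
    if os.path.exists(".env"):
        with open(".env") as handle:
            text = handle.read()
    missing = missing_keys(parse_env_text(text))
    print("Missing keys:", ", ".join(missing) if missing else "none")
```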
To start the fish assistant:
python gemini_fish.py
The fish waits for a "wake word" (in practice, any sound loud enough to cross the energy threshold) and then:
- Play a beep sound and start listening
- Process your spoken command using Gemini
- Respond with voice output and physical movements
- Return to listening mode
Project files:
- gemini_fish.py - Main script with Gemini Live API integration
- action_processor.py - Handles audio processing and fish movements
- tooling.py - Controls the physical hardware movements
- Sound files (beep.wav, microwave_ambient.wav) - Audio cues
The system uses:
- Gemini's Live API for real-time voice conversation
- Function calling to trigger physical movements during speech
- Energy-based audio analysis for mouth movement synchronization
- Simple energy threshold for wake word detection
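The two energy-based items above can be illustrated with one measurement: the root-mean-square (RMS) level of an audio chunk. The sketch below is an assumption about how this might look, not the project's actual code. It assumes 16-bit little-endian mono PCM input, and the function names and threshold values are hypothetical placeholders that would need tuning for a real microphone.

```python
import math
import struct


def rms_energy(pcm_bytes):
    """Return the RMS level of 16-bit little-endian mono PCM, scaled to 0.0-1.0."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def is_wake_event(pcm_bytes, threshold=0.3):
    """Treat a chunk as a wake event when it is loud enough (assumed threshold)."""
    return rms_energy(pcm_bytes) >= threshold


def mouth_should_open(pcm_bytes, threshold=0.05):
    """Open the mouth while output audio is above a low level (assumed threshold)."""
    return rms_energy(pcm_bytes) >= threshold


if __name__ == "__main__":
    silence = bytes(2000)
    speech = struct.pack("<1000h", *([4000] * 1000))
    clap = struct.pack("<1000h", *([20000] * 1000))
    for name, chunk in (("silence", silence), ("speech", speech), ("clap", clap)):
        print(name, is_wake_event(chunk), mouth_should_open(chunk))
```

Using a higher threshold for wake detection than for mouth movement means ordinary speech moves the mouth without retriggering the wake state. A plain energy threshold cannot tell a clap from a spoken phrase, which is why the wake detection is described as simple.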
You can:
- Add more movement tools by extending the tool declarations
- Improve wake word detection with a more sophisticated model
- Customize the system instruction to change the fish's personality
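As an illustration of the first point, a new movement tool typically needs two parts: a declaration that tells Gemini the function exists, and a handler that runs when the model calls it. The sketch below uses the JSON-schema style of function declarations from the Gemini API. The tool names (`move_tail`, `move_head`), their parameters, and the handlers are hypothetical; the declarations in this project may be shaped differently, and real handlers would send commands to the hardware instead of returning strings.

```python
# Hypothetical declarations in the style accepted by Gemini function calling.
MOVE_TAIL = {
    "name": "move_tail",
    "description": "Flap the fish's tail a number of times.",
    "parameters": {
        "type": "object",
        "properties": {
            "times": {"type": "integer", "description": "Number of flaps."},
        },
        "required": ["times"],
    },
}

MOVE_HEAD = {
    "name": "move_head",
    "description": "Raise or lower the fish's head.",
    "parameters": {
        "type": "object",
        "properties": {
            "direction": {"type": "string", "enum": ["up", "down"]},
        },
        "required": ["direction"],
    },
}

# Shape of the "tools" entry in a session config.
TOOLS = [{"function_declarations": [MOVE_TAIL, MOVE_HEAD]}]


def move_tail(times):
    """Placeholder handler; a real one would drive the tail motor."""
    return f"tail flapped {times}x"


def move_head(direction):
    """Placeholder handler; a real one would drive the head motor."""
    return f"head moved {direction}"


HANDLERS = {"move_tail": move_tail, "move_head": move_head}


def dispatch(name, args):
    """Route a tool call from the model to the matching handler."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return {"result": handler(**args)}


if __name__ == "__main__":
    print(dispatch("move_tail", {"times": 2}))
    print(dispatch("wiggle_fin", {}))
```

Keeping declarations and handlers in a single lookup table means adding a movement is a matter of writing one declaration, one function, and one dictionary entry.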