Monika is an AI-powered assistant that combines speech-to-text (STT), natural language processing (NLP), and text-to-speech (TTS) capabilities. It uses Whisper for transcription, Gemini for text processing, RealtimeTTS for speech synthesis, and Orpheus for expressing emotions during conversations.
- Speech-to-Text (STT): Converts spoken audio into text using OpenAI's Whisper.
- Natural Language Processing (NLP): Generates responses to user input with Google Gemini.
- Text-to-Speech (TTS): Synthesizes natural-sounding speech using RealtimeTTS.
- Emotional Expression: Utilizes Orpheus to express emotions during conversations.
- Voice Activity Detection (VAD): Automatically detects when the user is speaking.
- Interactive Web Interface: A browser-based interface for talking to Monika.
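The README does not say how voice activity detection is implemented. As a rough illustration of the idea only, here is a simple energy-threshold detector over 16-bit mono PCM frames. The frame format, `RMS_THRESHOLD`, and the `hangover` parameter are assumptions chosen for the example; production systems often use a trained detector such as WebRTC VAD or Silero VAD instead.

```python
import math
import struct

# Hypothetical parameters: 16-bit little-endian mono PCM and an RMS threshold
# picked for illustration, not values taken from Monika's code.
SAMPLE_WIDTH = 2
RMS_THRESHOLD = 500.0


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit PCM frame."""
    count = len(frame) // SAMPLE_WIDTH
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * SAMPLE_WIDTH])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = RMS_THRESHOLD) -> bool:
    """Treat a frame as speech when its energy exceeds the threshold."""
    return frame_rms(frame) > threshold


def speech_segments(frames, threshold=RMS_THRESHOLD, hangover=3):
    """Group consecutive speech frames into utterances.

    `hangover` keeps a segment open for up to that many quiet frames, so
    short pauses between words do not split one utterance into several.
    """
    segment, quiet = [], 0
    for frame in frames:
        if is_speech(frame, threshold):
            segment.append(frame)
            quiet = 0
        elif segment:
            quiet += 1
            if quiet > hangover:
                # Silence lasted longer than the hangover: close the utterance.
                yield b"".join(segment)
                segment, quiet = [], 0
            else:
                # Short pause: keep it inside the current utterance.
                segment.append(frame)
    if segment:
        yield b"".join(segment)
```

The hangover is the main design choice here: without it, every brief gap between words would end the utterance and trigger a transcription request too early.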
Prerequisites:
- Python 3.8 or higher
Installation and usage:
- Clone the repository:
  git clone <repository-url>
  cd my_app
- Install the requirements:
  pip install -r requirements.txt
- Set up environment variables: create a .env file in the root directory and add the following variable:
  GEMINI_API_KEY=your-gemini-api-key
- Start the Flask server:
  python app.py
- Open the web interface: navigate to http://localhost:5000 in your browser.
- Interact with Monika:
  - Speak into your microphone to start a conversation.
  - Monika will transcribe, process, and respond to your input.
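One conversational turn (transcribe the speech, generate a reply, speak it) can be sketched as a small pipeline. This is not Monika's actual code: the `Pipeline` class and the three stub functions are hypothetical, and the real project would plug in Whisper, Gemini, and RealtimeTTS where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Pipeline:
    """One turn: speech -> text -> reply -> audio chunks.

    The callables are hypothetical stand-ins for Whisper (STT),
    Gemini (response generation), and RealtimeTTS (synthesis).
    """

    transcribe: Callable[[bytes], str]
    generate_reply: Callable[[str], str]
    synthesize: Callable[[str], Iterator[bytes]]

    def run_turn(self, audio: bytes) -> Iterator[bytes]:
        text = self.transcribe(audio).strip()
        if not text:
            # Nothing intelligible was said, so produce no audio.
            return
        reply = self.generate_reply(text)
        # Yield audio chunks as they are produced so playback can start early.
        yield from self.synthesize(reply)


# Stubs used only to make the sketch runnable.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> Iterator[bytes]:
    for word in text.split():
        yield word.encode("utf-8")


pipeline = Pipeline(fake_stt, fake_llm, fake_tts)
print(list(pipeline.run_turn(b"hello there")))
# [b'You', b'said:', b'hello', b'there']
```

Keeping synthesis as a generator mirrors the streaming `/tts` endpoint: the caller can forward each chunk as soon as it exists instead of waiting for the whole reply to be synthesized.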
API endpoints:
- /: Main web interface.
- /transcribe: Handles audio transcription.
- /gemini_process: Processes text with Gemini.
- /tts: Streams synthesized speech.
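A minimal Flask skeleton for these four routes might look like the following. The request and response shapes are assumptions, not the project's documented API: the multipart field name `audio`, the JSON keys `text` and `response`, the use of POST for `/tts`, and the `application/octet-stream` mimetype are all hypothetical, as are the stub functions standing in for Whisper, Gemini, and RealtimeTTS.

```python
from flask import Flask, Response, jsonify, request

app = Flask(__name__)


# Hypothetical stubs standing in for Whisper, Gemini, and RealtimeTTS.
def transcribe_audio(data: bytes) -> str:
    return "hello"


def generate_reply(text: str) -> str:
    return f"You said: {text}"


def synthesize_chunks(text: str):
    for word in text.split():
        yield word.encode("utf-8")


@app.route("/")
def index():
    # The real app presumably renders its HTML page here.
    return "Monika web interface"


@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Assumes the browser uploads the recording as a multipart field "audio".
    audio = request.files.get("audio")
    if audio is None:
        return jsonify(error="missing 'audio' file"), 400
    return jsonify(text=transcribe_audio(audio.read()))


@app.route("/gemini_process", methods=["POST"])
def gemini_process():
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "").strip()
    if not text:
        return jsonify(error="missing 'text'"), 400
    return jsonify(response=generate_reply(text))


@app.route("/tts", methods=["POST"])
def tts():
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    # Passing a generator to Response makes Flask stream the body chunk by chunk.
    return Response(synthesize_chunks(text), mimetype="application/octet-stream")


if __name__ == "__main__":
    app.run(port=5000)
```

Returning a generator from `/tts` is what lets the browser begin playback before synthesis of the full reply has finished.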
Troubleshooting:
- Whisper model not loading: Ensure the whisper library is installed and the model size is supported.
- TTS issues: Verify that the RealtimeTTS engine is properly configured.
- Gemini errors: Check that the API key is valid and that the GEMINI_API_KEY environment variable is set.
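One way to surface a missing key early, rather than as an opaque API error on the first request, is to check it at startup. The README does not show how app.py reads .env (many projects use python-dotenv's `load_dotenv()`); this sketch uses a small stdlib-only loader instead, and the helper names `load_env_file` and `require_gemini_key` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments, optional quotes.

    Variables already set in the environment take precedence over the file.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("'\"")
        os.environ.setdefault(key, value)


def require_gemini_key() -> str:
    """Return GEMINI_API_KEY or fail fast with a readable message."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    # Also reject the placeholder value shown in the setup instructions.
    if not key or key == "your-gemini-api-key":
        raise RuntimeError(
            "GEMINI_API_KEY is not set; add it to the .env file in the project root."
        )
    return key
```

Calling `load_env_file()` and then `require_gemini_key()` before starting the server turns a misconfigured key into an immediate, explicit error.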
This project is licensed under the MIT License. See the LICENSE file for details.
Future improvements:
- Reduce TTS Latency: Address latency in the text-to-speech pipeline for more fluid conversations.
- Interruption Handling: Implement the ability for users to interrupt Monika while she's speaking.
- Expanded Language Support: Add support for multiple languages in both STT and TTS modules.
- Custom Voice Options: Allow users to select different voices for the assistant.
- Offline Mode: Develop capabilities for basic functionality without internet connectivity.