This application is a conversational AI assistant named Ada, built with a Python backend (Flask, SocketIO, Google Gemini) and a React frontend. It supports text input, client-side speech-to-text (Web Speech API), text-to-speech (ElevenLabs), webcam video frame processing, and integrates with external APIs for weather (python_weather), maps/directions (googlemaps), and web search (googlesearch-python, aiohttp, BeautifulSoup). Communication between the frontend and backend happens in real-time using WebSockets (SocketIO).
-
Backend (
backend/app.py
,backend/ADA_Online.py
):- A Flask server manages HTTP requests and SocketIO connections.
- SocketIO handles real-time bidirectional communication with the React frontend.
- An
ADA
class instance (ADA_Online.py
) encapsulates the core assistant logic. - It uses
asyncio
within a separate thread to manage asynchronous tasks like interacting with the Gemini API, handling TTS streams, and processing inputs without blocking the Flask server. - It connects to the Google Gemini API (
google-generativeai
) for conversational AI capabilities, configured with specific system instructions and tool functions (weather, travel duration, search). - It receives text input, transcribed speech, and video frames from the client via SocketIO.
- It processes text and video frames, sending them to the Gemini API.
- It handles function calls requested by Gemini, executing corresponding Python functions (e.g.,
get_weather
,get_travel_duration
,get_search_results
). - The search function fetches URLs and then asynchronously extracts content (title, snippet, paragraph text) from those pages using
aiohttp
andBeautifulSoup
. - It streams Gemini's text responses back to the client chunk by chunk via SocketIO.
- It streams text responses to the ElevenLabs TTS API via WebSocket to generate audio.
- Generated audio chunks (PCM) are received from ElevenLabs and streamed back to the client via SocketIO.
- API results (weather, maps, search) are also emitted to the client via dedicated SocketIO events to update specific UI widgets.
-
Frontend (
frontend/src/App.jsx
, components):- A React application provides the user interface.
- It establishes a SocketIO connection to the backend server.
- It renders components for chat display (
ChatBox
), user input (InputArea
), status messages (StatusDisplay
), AI visualization (AiVisualizer
), webcam feed (WebcamFeed
), and widgets for weather (WeatherWidget
), maps (MapWidget
), code execution (CodeExecutionWidget
), and search results (SearchResultsWidget
). - Input:
- Text input is sent via the
send_text_message
SocketIO event. - The Web Speech API is used for client-side speech recognition. Final transcripts are sent via the
send_transcribed_text
event. - If the webcam is enabled, video frames are captured periodically from a
<video>
element onto a<canvas>
, converted to JPEG data URLs, and sent via thesend_video_frame
event.
- Text input is sent via the
- Output:
- Status messages and errors from the backend are displayed.
- Text chunks received via
receive_text_chunk
are assembled and displayed in the chatbox. - Base64 encoded audio chunks received via
receive_audio_chunk
are queued and played back using the Web Audio API (AudioContext
). - Data received via
weather_update
,map_update
,executable_code_received
, andsearch_results_update
updates the state, causing the respective widgets to render or update.
- The
AiVisualizer
component changes appearance based on Ada's status (idle, listening, speaking). - The
WebcamFeed
component handles accessing the user's camera, displaying the feed (mirrored), and capturing frames. - Other widgets (
Weather
,Map
,Code
,Search
) are displayed conditionally when relevant data is received from the backend.
These instructions assume you have Git, Python 3.7+, pip, and Node.js (with npm) installed on your system.
-
Clone the Repository: Open your terminal or command prompt and clone the project repository from its source (replace
<repository_url>
with the actual URL):git clone <repository_url> cd <repository_directory_name> # Navigate into the cloned project directory
-
Backend Setup: Follow the steps in the Backend Setup (Python) section.
-
Frontend Setup: Follow the steps in the Frontend Setup (React) section.
-
Configuration: Create and populate the
.env
file as described in the Configuration section. -
Run the Application: Follow the steps in the Running the Application section.
-
Navigate to Backend Directory: From the root project directory you cloned, navigate into the backend folder:
cd backend # Or the name of your backend directory
-
Create & Activate Virtual Environment: It's highly recommended to use a virtual environment to manage dependencies.
python -m venv venv source venv/bin/activate # On Linux/macOS # OR # venv\Scripts\activate # On Windows Command Prompt/PowerShell
You should see
(venv)
prefixing your terminal prompt. -
Install Dependencies: Ensure you have a
requirements.txt
file in thebackend
directory with the following content:# requirements.txt Flask Flask-SocketIO python-dotenv google-generativeai torch # Or torch-cpu if no CUDA GPU / for simpler setup python-weather googlemaps websockets googlesearch-python aiohttp beautifulsoup4 lxml # Parser for BeautifulSoup requests # Often a dependency eventlet # Recommended async mode for Flask-SocketIO
Install the packages using pip:
pip install -r requirements.txt
(Note:
torch
installation can be complex. If you encounter issues or don't have an NVIDIA GPU, consider usingtorch-cpu
inrequirements.txt
. Visit the PyTorch website for specific installation instructions for your system if needed.) -
Configuration: Make sure you have created the
.env
file inside thisbackend
directory as detailed in the Configuration section.
-
Navigate to Frontend Directory: From the root project directory you cloned, navigate into the frontend folder:
cd ../frontend # Or the name of your frontend directory (use 'cd ..' first if still in 'backend')
-
Install Dependencies: This command reads the
package.json
file and installs all the necessary Node.js modules.npm install
This will install React,
socket.io-client
,react-youtube
,prop-types
, and any other dependencies defined inpackage.json
.
-
Locate Backend Directory: Ensure you are in the backend directory (e.g.,
cd backend
from the root project folder). -
Create
.env
file: Create a file named exactly.env
. -
Add API Keys and Secrets: Open the
.env
file and add the following lines, replacing the placeholder values with your actual keys and desired settings:# --- Backend API Keys --- # Get from ElevenLabs website ELEVENLABS_API_KEY="YOUR_ELEVENLABS_API_KEY" # Get from Google AI Studio (for Gemini Models) GOOGLE_API_KEY="YOUR_GOOGLE_GEMINI_API_KEY" # Get from Google Cloud Console (Enabled for Directions API) MAPS_API_KEY="YOUR_Maps_API_KEY" # --- Flask Server Settings --- # Used for session security, generate a random string FLASK_SECRET_KEY="a_very_strong_and_random_secret_key_please_change_me" # --- Frontend Settings (for Backend CORS) --- # Port the React frontend development server runs on REACT_APP_PORT="5173" # Default for Vite. Use 3000 for Create React App, or your custom port.
Important:
- Never commit your
.env
file to Git. Add.env
to your.gitignore
file in the backend directory. - Ensure the
MAPS_API_KEY
corresponds to a Google Cloud project where the Directions API is enabled. - Ensure the
GOOGLE_API_KEY
is for Google Gemini models (available via Google AI Studio). - Generate a truly random and strong
FLASK_SECRET_KEY
.
- Never commit your
You will need two separate terminals open: one for the backend and one for the frontend.
-
Start the Backend Server:
- Open Terminal 1.
- Navigate to the
backend
directory. - Activate the Python virtual environment:
source venv/bin/activate
(orvenv\Scripts\activate
on Windows). - Run the Flask application:
python app.py
- Wait for output indicating the server is running (e.g.,
* Running on http://0.0.0.0:5000
and WebSocket server started messages). Leave this terminal running.
-
Start the Frontend Development Server:
- Open Terminal 2.
- Navigate to the
frontend
directory. - Run the start script (use the command appropriate for your project setup):
npm run dev # If using Vite (likely based on main.jsx structure) # OR # npm start # If using Create React App
- This should automatically open the application in your default web browser, pointing to
http://localhost:5173
(or the port specified inREACT_APP_PORT
if configured differently).
-
Use the Application:
- Interact with the interface in your browser. Grant microphone and webcam permissions when prompted if you wish to use those features.
- Stop the Frontend Server: Go to Terminal 2 (where the frontend is running) and press
Ctrl + C
. Confirm if prompted. - Stop the Backend Server: Go to Terminal 1 (where the backend is running) and press
Ctrl + C
. - Deactivate Virtual Environment (Optional): In Terminal 1, you can type
deactivate
.