A Twilio-powered application that converts Indian English accents to British English in real time during phone calls, using streaming audio processing.
- Receive phone calls via a Twilio number
- Real-time transcription using Deepgram Speech-to-Text API
- Convert text to British English voice using Google Cloud Text-to-Speech
- Stream converted speech back to the caller in real time
- WebSocket support for bidirectional audio streaming
- No file storage - all processing happens in memory for better performance and privacy
- Node.js (v14 or higher)
- npm or yarn
- A Twilio account with a phone number
- Google Cloud Platform account with Text-to-Speech API enabled
- Deepgram account for speech-to-text processing
git clone https://github.com/algorithmio/accent-conversion-ai
cd accent-conversion-ai
npm install
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Text-to-Speech API
- Go to IAM & Admin > Service Accounts
- Click Create Service Account
- Give it a name like "accent-converter"
- Grant the Cloud Text-to-Speech Client role
- Click Done
- Click on your newly created service account
- Go to Keys tab
- Click Add Key > Create New Key
- Choose JSON format
- Download the file
- Copy the downloaded JSON file to config/creds.json in your project
- The file should look like this:
{
"type": "service_account",
"project_id": "your-project-id",
"private_key_id": "...",
"private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"client_email": "accent-converter@your-project.iam.gserviceaccount.com",
"client_id": "...",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "..."
}
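Before starting the server, it can help to confirm that the credentials file parses and carries the fields shown in the example above. The following is a minimal sketch, not part of the project: `findCredentialProblems` and `checkCredentialsFile` are hypothetical helper names, and the list of required fields is an assumption based on the example file.

```javascript
const fs = require('fs');

// Assumption: these are the fields worth checking, taken from the example file above.
const REQUIRED_FIELDS = ['type', 'project_id', 'private_key', 'client_email'];

// Hypothetical helper: returns a list of problems found in a parsed credentials object.
function findCredentialProblems(creds) {
  if (creds === null || typeof creds !== 'object' || Array.isArray(creds)) {
    return ['credentials must be a JSON object'];
  }
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof creds[field] !== 'string' || creds[field].length === 0) {
      problems.push(`missing or empty field: ${field}`);
    }
  }
  if (creds.type !== undefined && creds.type !== 'service_account') {
    problems.push(`unexpected type: ${creds.type}`);
  }
  return problems;
}

// Hypothetical helper: reads the file and reports problems instead of throwing.
// The default path matches the location used in this README.
function checkCredentialsFile(path = 'config/creds.json') {
  if (!fs.existsSync(path)) {
    return [`file not found: ${path}`];
  }
  let creds;
  try {
    creds = JSON.parse(fs.readFileSync(path, 'utf8'));
  } catch (err) {
    return [`invalid JSON: ${err.message}`];
  }
  return findCredentialProblems(creds);
}
```

An empty array from `checkCredentialsFile()` means the file exists, is valid JSON, and has the expected fields. It does not confirm that the key is accepted by Google Cloud.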
Create a .env file in the root directory:
# Twilio Configuration
TWILIO_ACCOUNT_SID=your_twilio_account_sid_here
TWILIO_AUTH_TOKEN=your_twilio_auth_token_here
# Deepgram Configuration
DEEPGRAM_API_KEY=your_deepgram_api_key_here
# Server Configuration
PORT=4001
NODE_ENV=development
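A missing variable is easier to diagnose at startup than in the middle of a call. The sketch below shows one way to read and check these settings. It assumes the values are already present in `process.env` (for example, loaded from `.env` by a package such as dotenv); `loadConfig` is a hypothetical helper and not necessarily how `server.js` does it.

```javascript
// Variable names are the ones listed in the .env example above.
const REQUIRED_ENV = ['TWILIO_ACCOUNT_SID', 'TWILIO_AUTH_TOKEN', 'DEEPGRAM_API_KEY'];

// Hypothetical helper: builds a config object and fails fast when something is missing.
function loadConfig(env = process.env) {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Fall back to the port used in this README when PORT is not set.
  const port = Number.parseInt(env.PORT || '4001', 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT is not a number: ${env.PORT}`);
  }

  return {
    twilioAccountSid: env.TWILIO_ACCOUNT_SID,
    twilioAuthToken: env.TWILIO_AUTH_TOKEN,
    deepgramApiKey: env.DEEPGRAM_API_KEY,
    port,
    nodeEnv: env.NODE_ENV || 'development',
  };
}
```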
npm start
Or run directly:
node server.js
- Set up your Twilio phone number to use a webhook for incoming calls
- For the "A Call Comes In" webhook, use:
https://your-server.com/voice
Note: Your server needs to be accessible via HTTPS for Twilio to connect to it. Use ngrok for local development.
- Install ngrok:
npm install -g ngrok
- Start your server locally:
node server.js
- In another terminal, start ngrok:
ngrok http 4001
- Use the provided HTTPS URL from ngrok in your Twilio webhook configuration.
- POST /voice - Handles incoming Twilio voice calls and sets up WebSocket streaming
- GET /health - Server health status and active connection count
- WS /stream - WebSocket endpoint for Twilio media streaming
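To illustrate how the voice webhook and the WebSocket endpoint fit together: Twilio expects the webhook to answer with TwiML, and a `<Connect><Stream>` instruction tells it to open a bidirectional media stream to a `wss://` URL. The sketch below builds such a response by hand. `buildVoiceTwiml` and `escapeXmlAttribute` are hypothetical helpers, and the actual handler in `server.js` may build its response differently (for example, with the Twilio helper library).

```javascript
// Escapes characters that are not allowed inside a double-quoted XML attribute.
function escapeXmlAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: returns TwiML that connects the call to the /stream endpoint.
// `host` is the public hostname Twilio can reach, e.g. the one ngrok prints.
function buildVoiceTwiml(host) {
  const streamUrl = `wss://${host}/stream`;
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${escapeXmlAttribute(streamUrl)}" /></Connect></Response>`
  );
}
```

A handler for `POST /voice` would send this string back with a `text/xml` content type.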
The application uses a streaming architecture with the following components:
- DeepgramStreamingService - Real-time speech-to-text transcription
- StreamingAccentConverterV2 - Manages streaming TTS sessions
- StreamingTTSService - Google Cloud Text-to-Speech streaming integration
- Audio is processed and streamed back to Twilio without saving files
- Uses MULAW encoding at 8kHz for Twilio compatibility
- Supports real-time transcription and accent conversion with minimal latency
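As a rough illustration of the audio format: 8 kHz MULAW uses one byte per sample, so 20 ms of audio is 160 bytes. Twilio Media Streams accepts outbound audio as JSON `media` messages carrying a base64 payload. The sketch below decodes a single mu-law byte to 16-bit PCM and wraps a mu-law buffer into such messages. `muLawDecode` and `buildMediaMessages` are hypothetical names, and the 20 ms chunk size is an assumption chosen for the example, not something taken from the project.

```javascript
const SAMPLE_RATE_HZ = 8000;
const FRAME_MS = 20; // assumption: chunk size chosen for this example
const BYTES_PER_FRAME = (SAMPLE_RATE_HZ * FRAME_MS) / 1000; // 160 bytes at 1 byte per sample

// Decodes one G.711 mu-law byte into a signed 16-bit PCM sample.
function muLawDecode(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Splits a Buffer of mu-law audio into JSON messages ready to send over the stream socket.
function buildMediaMessages(streamSid, mulawAudio) {
  const messages = [];
  for (let offset = 0; offset < mulawAudio.length; offset += BYTES_PER_FRAME) {
    const frame = mulawAudio.subarray(offset, offset + BYTES_PER_FRAME);
    messages.push(
      JSON.stringify({
        event: 'media',
        streamSid,
        media: { payload: frame.toString('base64') },
      })
    );
  }
  return messages;
}
```

Each returned string could then be passed to the WebSocket's `send` method for the call identified by `streamSid`.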
- Audio files are not saved to disk
- All processing happens in memory for better performance and privacy
- Audio is streamed directly back to the caller
- Advanced text deduplication to prevent repeated audio
- Natural conversation flow with timing-based decisions
- Handles both interim and final transcription results
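The deduplication idea can be sketched as follows. Interim transcripts usually repeat the words of earlier interim results for the same utterance, so sending each one to TTS in full would replay audio. This example is an illustration only and is not the logic in `StreamingAccentConverterV2`: the `TranscriptDeduplicator` class is hypothetical, it compares words after lowercasing and stripping common punctuation, and it ignores the timing-based decisions mentioned above.

```javascript
// Lowercases a word and strips common punctuation so "Hello," matches "hello".
function normalizeWord(word) {
  return word.toLowerCase().replace(/[.,!?;:]/g, '');
}

// Hypothetical helper: tracks what has been spoken for the current utterance
// and returns only the words that are new.
class TranscriptDeduplicator {
  constructor() {
    this.spokenKeys = [];
  }

  // `transcript` is the latest interim or final text; `isFinal` marks the end of an utterance.
  next(transcript, isFinal) {
    const words = transcript.trim().split(/\s+/).filter(Boolean);
    const keys = words.map(normalizeWord);

    // Count how many leading words match what was already spoken.
    let shared = 0;
    while (
      shared < this.spokenKeys.length &&
      shared < keys.length &&
      keys[shared] === this.spokenKeys[shared]
    ) {
      shared += 1;
    }

    let fresh = '';
    if (shared === this.spokenKeys.length) {
      // The new text extends the spoken text: emit only the added words.
      fresh = words.slice(shared).join(' ');
      this.spokenKeys = keys;
    }
    // Otherwise the recognizer revised words that were already spoken;
    // this sketch skips the update rather than repeat audio.

    if (isFinal) {
      this.spokenKeys = []; // the next transcript starts a new utterance
    }
    return fresh;
  }
}
```

For example, feeding `"hello"` and then `"Hello, there."` as interim results yields `"hello"` followed by `"there."`, and a final `"hello there"` yields an empty string because nothing new remains.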
├── server.js # Main application server
├── src/
│ ├── services/
│ │ ├── DeepgramStreamingService.js # Speech-to-text service
│ │ ├── StreamingAccentConverterV2.js # Main streaming converter
│ │ └── StreamingTTSService.js # Text-to-speech service
│ └── config/
│ └── tts-config.js # TTS configuration
├── config/
│ └── creds.json # Google Cloud credentials
└── package.json
- ⚠️ Never commit credentials to git (already excluded in .gitignore)
- 🔒 Keep your credentials file secure
- 🔄 Rotate keys regularly for production use
- File not found: Make sure config/creds.json exists
- Invalid format: Check JSON syntax with a validator
- API errors: Ensure Text-to-Speech API is enabled in Google Cloud Console
- Permission errors: Verify service account has correct roles
- Authentication errors: Verify your Deepgram API key in .env
- Connection issues: Check your internet connection and API limits
- Webhook errors: Ensure your server is accessible via HTTPS
- Audio quality: Verify MULAW encoding and 8kHz sample rate
MIT