Kokoro TTS Demo is a mobile application that demonstrates high-quality text-to-speech capabilities running entirely on-device using ONNX Runtime. This app showcases how modern neural TTS models can be deployed on mobile devices without requiring cloud connectivity.
- 🔊 High-quality neural text-to-speech
- 📱 Runs 100% on-device (no internet required after initial download)
- 🎭 Multiple voices with different accents and styles
- 🔄 Adjustable speech speed
- 📊 Performance metrics for speech generation
- 📦 Multiple model options with different size/quality tradeoffs
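The feature list does not say which performance metrics the app reports. A common one for TTS is the real-time factor (generation time divided by the duration of the audio produced), so the following is a minimal sketch under that assumption. `computeMetrics`, `GenerationMetrics`, and the 24000 Hz default sample rate are illustrative names and values, not taken from the project.

```typescript
// Hypothetical metrics shape; the app's real metrics may differ.
interface GenerationMetrics {
  generationMs: number;   // wall-clock time spent generating
  audioSeconds: number;   // duration of the generated audio
  realTimeFactor: number; // < 1 means faster than real time
}

// Derive metrics from a measured generation time and the number of
// audio samples produced. The sample rate default is an assumption.
function computeMetrics(
  generationMs: number,
  sampleCount: number,
  sampleRate: number = 24000
): GenerationMetrics {
  const audioSeconds = sampleCount / sampleRate;
  const realTimeFactor =
    audioSeconds > 0 ? generationMs / 1000 / audioSeconds : Infinity;
  return { generationMs, audioSeconds, realTimeFactor };
}
```

For example, 72,000 samples at 24 kHz are 3 seconds of audio; if generating them took 1.5 seconds, the real-time factor is 0.5.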
Kokoro TTS uses a neural text-to-speech model converted to ONNX format, which allows it to run efficiently on mobile devices using ONNX Runtime. The app follows these steps to generate speech:
- Text Normalization: Prepares the input text for processing
- Phonemization: Converts text to phonetic representation
- Tokenization: Converts phonemes to token IDs
- Neural Inference: Processes tokens through the ONNX model
- Audio Generation: Converts model output to audio waveforms
- Playback: Plays the generated audio through device speakers
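As an illustration of the Audio Generation step, the sketch below wraps raw floating-point samples in a 16-bit PCM WAV container that an audio player can load. It is not the project's code: `encodeWav` is a hypothetical helper, and it assumes the model output is mono `Float32Array` samples in the range [-1, 1] at 24000 Hz.

```typescript
// Sketch: encode mono float samples as a 16-bit PCM WAV file.
// The 24000 Hz default is an assumption, not taken from this project.
function encodeWav(samples: Float32Array, sampleRate: number = 24000): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // file size minus first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  samples.forEach((sample, i) => {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  });

  return new Uint8Array(buffer);
}
```

The resulting bytes could then be written to a file and handed to the audio player for the Playback step. How the app actually passes audio to the player is not described here.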
- React Native: Core framework for cross-platform mobile development
- Expo: Development platform for React Native
- ONNX Runtime: High-performance inference engine for ONNX models
- Expo AV: Audio playback capabilities
- Expo FileSystem: File management for model and voice data
- Node.js (v14 or later)
- Expo CLI
- iOS device with iOS 13+ (for development)
- Xcode (for iOS builds)
- Android Studio (for Android builds)
- Clone the repository:

      git clone https://github.com/isaiahbjork/expo-kokoro-onnx.git
      cd expo-kokoro-onnx

- Install dependencies:

      npm install

- Start the Expo development server:

      npx expo start
To run the app on a physical device with full ONNX Runtime support:
    npx eas build --platform ios --profile development
- Select a Model: Choose from different model sizes based on your device capabilities
- Download a Voice: Select and download one of the available voices
- Adjust Speed: Set the speech rate using the speed controls
- Enter Text: Type or paste the text you want to convert to speech
- Generate Speech: Press the "Generate Speech" button to create and play the audio
The app supports multiple model variants with different size and quality tradeoffs:
| Model | Size | Quality | Description |
|---|---|---|---|
| Full Precision | 326 MB | Highest | Best quality, largest size |
| FP16 | 163 MB | High | High quality, reduced size |
| Q8F16 | 86 MB | Good | Balanced quality and size |
| Quantized | 92.4 MB | Medium | Reduced quality, smaller size |
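One way to choose a variant from the table is to take the highest-quality model that fits in the space available on the device. The sketch below encodes the table as data and does exactly that. `ModelVariant`, `MODEL_VARIANTS`, and `pickModel` are hypothetical names for illustration and do not reflect the contents of `models.ts`.

```typescript
// Hypothetical description of one row of the model table.
interface ModelVariant {
  name: string;
  sizeMB: number;
  quality: "Highest" | "High" | "Good" | "Medium";
}

// Rows copied from the table, ordered from highest to lowest quality.
const MODEL_VARIANTS: ModelVariant[] = [
  { name: "Full Precision", sizeMB: 326, quality: "Highest" },
  { name: "FP16", sizeMB: 163, quality: "High" },
  { name: "Q8F16", sizeMB: 86, quality: "Good" },
  { name: "Quantized", sizeMB: 92.4, quality: "Medium" },
];

// Return the highest-quality variant whose download fits in the
// given free space, or undefined if none fits.
function pickModel(freeSpaceMB: number): ModelVariant | undefined {
  return MODEL_VARIANTS.find((variant) => variant.sizeMB <= freeSpaceMB);
}
```

Because the table lists Q8F16 as both smaller and higher quality than Quantized, this strategy never selects Quantized. A real selector would probably also weigh memory use and inference speed, which the table does not cover.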
The app includes multiple voices with different characteristics:
- American English (Male/Female)
- British English (Male/Female)
- Various voice styles and characteristics
- `/kokoro`: Core TTS implementation
  - `kokoroOnnx.ts`: Main TTS engine implementation
  - `models.ts`: Model management and downloading
  - `voices.ts`: Voice definitions and management
- `App.tsx`: Main application UI
- `app.json`: Expo configuration
- `metro.config.js`: Metro bundler configuration
- KokoroOnnx: Main class that handles TTS functionality
- Model Management: Functions for downloading and managing models
- Voice Management: Functions for downloading and using voice data
- UI Components: React Native components for the user interface
This project is licensed under the MIT License - see the LICENSE file for details.
- ONNX Runtime team for the mobile inference engine
- Expo team for the development platform
- Contributors to the open-source TTS models