A fast, local pronunciation and speech synthesis assistant powered by piper-phonemize, optimized for language learners and anyone looking to improve their spoken fluency across dozens of languages. This GUI application supports IPA phoneme visualization and real-time audio playback using neural voice models.
- Text-to-IPA phonemization using espeak-ng
- Real-time audio synthesis with neural voice models (ONNX format)
- Adjustable length scale (speaking speed) and volume
- Multi-language support with auto-discovered voices
- Simple and responsive GUI built with wxWidgets
- Runs fully offline on Windows and Linux
- Download a voice model (see Voices below)
- Place both the `.onnx` and `.onnx.json` files into the `models/` directory
- Run the app, choose a model, type a phrase, and click Pronounce
Welcome to the world of speech synthesis!
→ wˈɛlkʌm tə ðə wˈɜːld ʌv spˈiːtʃ sˈɪnθəsˌɪs
Then listen to the synthesized audio with realistic pronunciation.
This application uses the same voices as the Piper TTS project, trained with VITS and exported for ONNX Runtime.
- Arabic (ar_JO)
- Catalan (ca_ES)
- Czech (cs_CZ)
- Welsh (cy_GB)
- Danish (da_DK)
- German (de_DE)
- Greek (el_GR)
- English (en_GB, en_US)
- Spanish (es_ES, es_MX)
- Finnish (fi_FI)
- French (fr_FR)
- Hungarian (hu_HU)
- Icelandic (is_IS)
- Italian (it_IT)
- Georgian (ka_GE)
- Kazakh (kk_KZ)
- Luxembourgish (lb_LU)
- Nepali (ne_NP)
- Dutch (nl_BE, nl_NL)
- Norwegian (no_NO)
- Polish (pl_PL)
- Portuguese (pt_BR, pt_PT)
- Romanian (ro_RO)
- Russian (ru_RU)
- Serbian (sr_RS)
- Swedish (sv_SE)
- Swahili (sw_CD)
- Turkish (tr_TR)
- Ukrainian (uk_UA)
- Vietnamese (vi_VN)
- Chinese (zh_CN)
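Piper voice files are usually named `<locale>-<speaker>-<quality>.onnx` (for example `en_US-lessac-medium.onnx`), so the locale codes in the list above can be read from the file name. The following is a minimal sketch of how discovered files might be grouped by locale, for instance to fill a dropdown. It assumes that naming convention holds for every file; `VoiceName`, `parse_voice_name`, and `group_by_locale` are hypothetical helpers for illustration, not part of this app.

```python
from collections import defaultdict
from pathlib import Path
from typing import NamedTuple, Optional


class VoiceName(NamedTuple):
    locale: str   # e.g. "en_US"
    speaker: str  # e.g. "lessac"
    quality: str  # e.g. "medium"


def parse_voice_name(filename: str) -> Optional[VoiceName]:
    """Split '<locale>-<speaker>-<quality>.onnx' into its three parts.

    Returns None when the name does not follow the assumed convention.
    """
    name = Path(filename).name
    if not name.endswith(".onnx"):
        return None
    parts = name[: -len(".onnx")].split("-")
    if len(parts) != 3 or "_" not in parts[0]:
        return None
    return VoiceName(*parts)


def group_by_locale(filenames):
    """Map each locale code to the sorted list of voice files that use it."""
    groups = defaultdict(list)
    for filename in filenames:
        voice = parse_voice_name(filename)
        if voice is not None:
            groups[voice.locale].append(filename)
    return {locale: sorted(files) for locale, files in sorted(groups.items())}
```

Files that do not match the pattern, including the `.onnx.json` configs, are skipped rather than treated as errors.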
📁 Each voice requires:
- A `.onnx` file (neural model)
- A corresponding `.onnx.json` configuration file
You can download voices from the Piper Voices repository or see the VOICES.md list.
Check each voice's `MODEL_CARD` file before use.
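Because every voice needs both files, it can help to check the `models/` folder for incomplete pairs before launching the app. The sketch below is a standalone helper, not code from this project. `find_voices` and `read_sample_rate` are hypothetical names, and the `audio.sample_rate` key is an assumption based on the layout of typical Piper voice configs.

```python
import json
from pathlib import Path


def find_voices(models_dir):
    """Return (complete, incomplete) voice lists for a models directory.

    complete:   sorted .onnx paths that have a matching .onnx.json file
    incomplete: sorted .onnx paths whose .onnx.json file is missing
    """
    complete, incomplete = [], []
    for model in sorted(Path(models_dir).glob("*.onnx")):
        config = model.with_name(model.name + ".json")
        (complete if config.is_file() else incomplete).append(model)
    return complete, incomplete


def read_sample_rate(model_path):
    """Read audio.sample_rate from the model's config, or None if absent."""
    config_path = Path(str(model_path) + ".json")
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return config.get("audio", {}).get("sample_rate")
```

Calling `find_voices("models")` from the folder that contains the binary returns the voices that are ready to use, plus any `.onnx` files that still need their `.onnx.json` companion.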
Download prebuilt releases from the Releases page, or build from source:
- Clone this repo
- Run `cmake` first, and then `make`, or open the project in your IDE of choice
Your voice models should be placed in a `models/` folder next to the binary.
After launching the app:
- Select a language model from the dropdown
- Enter any word or phrase in the text box
- Click Look up to see the IPA symbols
- Click Pronounce to hear the voice synthesis
- Adjust speed and volume using the sliders
All synthesis and audio playback happen locally and offline.
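To illustrate what the two sliders control: in Piper-style VITS voices the length scale multiplies the predicted phoneme durations, so values above 1.0 slow speech down and values below 1.0 speed it up. Volume can be applied as a gain on the output samples. The sketch below assumes 16-bit signed PCM output and uses hypothetical helper names. How this app maps slider positions to these values is not stated here.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def apply_volume(samples, volume):
    """Scale 16-bit PCM samples by a gain factor, clipping to the int16 range."""
    scaled = []
    for sample in samples:
        value = int(round(sample * volume))
        scaled.append(max(INT16_MIN, min(INT16_MAX, value)))
    return scaled


def scaled_duration(base_seconds, length_scale):
    """Approximate utterance duration after applying a length scale."""
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    return base_seconds * length_scale
```

Clipping keeps loud settings from wrapping around the integer range, which would otherwise be heard as harsh distortion.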
- GUI: wxWidgets
- TTS Engine: Piper (ONNX Runtime, VITS)
- Phonemizer: espeak-ng via piper-phonemize
- Audio Output: PortAudio
- Language Support: Multilingual, via ONNX+JSON model pairs
This app is open source and intended for educational, research, and personal language learning use. See LICENSE for more information.