A web application that converts speech to text, processes it through an AI language model, and converts the response back to speech using advanced text-to-speech technology.
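Speech recognition and speech synthesis run locally in the browser, so no audio is sent to a server. The transcribed text goes only to the chat inference server you configure.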
Important: You need a locally running chat LLM server, such as llama-server.
- Speech Recognition: Uses Moonshine to transcribe spoken English (English only) into text
- AI Processing: Sends transcribed text to a language model API for intelligent responses
- Text-to-Speech: Converts the AI response back to speech using Kokoro TTS
- Dark Mode: Modern dark-themed UI for comfortable use
- Moonshine: Speech recognition model by Useful Sensors
- Kokoro: Advanced text-to-speech synthesis engine
- Hugging Face Transformers.js: Client-side machine learning models
- Web Audio API: For audio recording and playback
- Modern JavaScript: ES6+ features including modules, classes, and async/await
- Modern web browser with JavaScript AND WebGPU enabled
- Local or remote server to host the application
- Optional: Web server that supports the language model API endpoint
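Since WebGPU is a hard requirement, it can help to check for it before loading any models. The following is a minimal sketch, not code from this repository: `hasWebGPU` and `hasUsableAdapter` are hypothetical helper names, and the only browser API assumed is the standard `navigator.gpu.requestAdapter()`.

```javascript
// Hypothetical helper: true when a navigator-like object exposes the WebGPU entry point.
// The object is passed in so the check can also run outside a browser.
function hasWebGPU(nav) {
  return Boolean(nav && typeof nav === "object" && "gpu" in nav && nav.gpu);
}

// Hypothetical helper: additionally asks the browser for a GPU adapter.
// requestAdapter() resolves to null when no suitable adapter is available.
async function hasUsableAdapter(nav) {
  if (!hasWebGPU(nav)) return false;
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null;
}

// In a browser, pass the global navigator (guarded so the snippet also loads elsewhere).
const webgpuPresent = hasWebGPU(typeof navigator !== "undefined" ? navigator : undefined);
console.log(webgpuPresent ? "WebGPU entry point found" : "WebGPU is not available here");
```

If the check fails, the page could show a message asking the user to enable WebGPU instead of failing later during model loading.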
- Clone the repository
- Host the files on a web server
- Open the application in a web browser
In the Settings tab:
- Set the Chat Inference Server URL to your language model endpoint
- Configure the System Prompt to control the AI assistant's behavior
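To illustrate how the two settings above might be combined into a request, here is a hedged sketch. It assumes the server exposes an OpenAI-style `/v1/chat/completions` route (llama-server does), but whether this application uses that exact route is an assumption. `buildChatRequest` is a hypothetical helper, not a function from this repository, and `http://localhost:8080` is only an example URL.

```javascript
// Hypothetical helper: builds the URL and fetch() options for an
// OpenAI-style chat completions endpoint (assumed, not confirmed by this project).
function buildChatRequest(serverUrl, systemPrompt, userText) {
  const base = serverUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/v1/chat/completions`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt }, // the System Prompt setting
          { role: "user", content: userText },       // the transcribed speech
        ],
      }),
    },
  };
}

// Example usage (the URL is an assumption about a local setup):
const { url, options } = buildChatRequest(
  "http://localhost:8080/",
  "You are a helpful voice assistant.",
  "What is the weather like?"
);
console.log(url);
// In the browser, the reply text could then be read with:
// fetch(url, options).then((r) => r.json()).then((d) => d.choices[0].message.content);
```

Keeping request construction separate from the `fetch` call makes it easy to inspect what would be sent before pointing the app at a real server.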
- Navigate to the Conversation tab
- Click "Start Recording" to begin speaking
- Click "Stop Recording" when finished to process the audio
- Wait for the AI to generate a response
- The response will be spoken aloud using the selected voice
- View the conversation history in the Transcription section
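The steps above amount to a three-stage pipeline per turn: transcribe, ask the model, speak the reply. The sketch below shows one way such a turn could be orchestrated. It is an illustration under stated assumptions, not the project's actual `conversation.js`: `runTurn` and the injected `transcribe`, `chat`, and `speak` functions are hypothetical names standing in for the Moonshine, chat server, and Kokoro stages.

```javascript
// Hypothetical orchestration of a single conversation turn.
// The three stages are injected so each can be swapped or stubbed.
async function runTurn(audio, { transcribe, chat, speak }, history = []) {
  const userText = (await transcribe(audio)).trim();
  if (!userText) return history; // nothing recognised, skip the model call

  const reply = await chat(userText, history);
  await speak(reply);

  // Return a new history array rather than mutating the caller's copy.
  return [
    ...history,
    { role: "user", content: userText },
    { role: "assistant", content: reply },
  ];
}

// Example with stub stages standing in for the real models:
const stubStages = {
  transcribe: async () => " hello there ",
  chat: async (text) => `You said: ${text}`,
  speak: async (text) => console.log("speaking:", text),
};
runTurn(new Float32Array(0), stubStages).then((h) => console.log(h.length, "messages"));
```

Returning the updated history from each turn gives the UI a single value to render in the Transcription section.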
- /css: Stylesheet files for the UI
- /js: JavaScript modules for application logic
  - AudioPlayer.js: Handles audio playback
  - conversation.js: Manages the conversation flow
  - kokoro.js: Text-to-speech implementation
  - stt.js: Speech-to-text functionality
  - ui.js: User interface interactions
- /index.html: Main application page
This project uses the following open source technologies:
- Moonshine - Speech recognition model by Useful Sensors
- Kokoro - Text-to-speech synthesis engine
- Hugging Face Transformers.js - Machine learning models in the browser
Apache 2.0