This project provides a flexible text-to-speech (TTS) solution that accepts input from a file or as direct text, with CLI, web, and MCP server interfaces. It optionally translates the input to a target language, converts it to speech using the Microsoft Edge TTS engine, and plays the audio while displaying synchronized text output. Text is processed in chunks, which allows smoother playback of long texts.
- Multiple Interfaces: CLI, web, and MCP server interfaces for maximum flexibility
- Input Options: Accepts input from a file or direct text input
- Translation Support: Optional translation to target language using Google Translate
- High-Quality TTS: Uses Microsoft Edge TTS for natural-sounding speech
- Streaming Playback: Processes text in chunks for smoother playback
- Session Management: Unified file management with session-based organization
- MCP Integration: Model Context Protocol server for AI assistant integration
- Docker Support: Easy deployment with Docker and docker-compose
- Python 3.12+
- See dependencies in pyproject.toml
- Ensure you have Python 3.12 or later installed on your system.
- Clone or download the project to your local machine.
- From the project root, install the package and dependencies:
pip install .
This will install all required dependencies and make the tts CLI available globally.
tts/
├── docs/ # Documentation
│ ├── QUICK_START.md # Quick start guide
│ ├── FILE_MANAGEMENT.md # File management documentation
│ └── ... # Other documentation
├── src/ # Source code
│ ├── audio_player.py # Audio playback management
│ ├── file_manager.py # Unified file management system
│ ├── main.py # CLI entry point
│ ├── text_processor.py # Text processing
│ ├── translator.py # Translation functionality
│ ├── tts_generator.py # TTS generation
│ ├── tts_service.py # Unified TTS service layer
│ └── web_server.py # Web interface
├── tests/ # Test files
│ ├── test_unified_file_management.py
│ └── test_unified_system.py
├── examples/ # Example files
│ └── data/ # Sample data files
├── scripts/ # Utility scripts
│ └── convert_pdfs_to_txt.py
├── static/ # Static web files
├── temp/ # Generated audio and text files
│ └── session_*/ # Session-based directories for artifacts
├── config.yaml # Configuration file
├── Dockerfile # Docker configuration
├── docker-compose.yml # Docker Compose configuration
├── pyproject.toml # Python project configuration
└── Makefile # Build automation
After installation, you can use the CLI from anywhere:
- Default mode (uses input_file from config):
tts
- Process a specific file:
tts -f examples/data/input.txt
- Process direct text input:
tts -t "Hello, how are you?"
- Process input from stdin (pipe):
echo "Hello world" | tts -t -
cat input.txt | tts -t -
date | tts -t -
By default, the tool uses the original text with no translation.
To translate to a specific language, use the --language (or -l) flag with the target language code (e.g., en for English, fr for French):
tts -t "Bonjour tout le monde" --language en
tts -f examples/data/input.txt --language fr
If --language is not provided, the original text is used without translation.
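The flags shown above could be parsed along the following lines. This is only a sketch using the standard argparse module, not the project's actual parser: the function names build_parser and resolve_text are hypothetical, and returning None to mean "fall back to input_file from the config" is an assumption.

```python
import argparse
import sys
from typing import Optional, TextIO


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="tts")
    parser.add_argument("-f", dest="file", default=None,
                        help="path to an input text file")
    parser.add_argument("-t", dest="text", default=None,
                        help='text to speak, or "-" to read it from stdin')
    parser.add_argument("-l", "--language", default=None,
                        help="target language code; omit to skip translation")
    return parser


def resolve_text(args: argparse.Namespace,
                 stdin: Optional[TextIO] = None) -> Optional[str]:
    """Return the text to speak, or None when no input flag was given.

    None is assumed to mean "use input_file from the configuration".
    """
    if args.text == "-":
        # A lone "-" means the text is piped in, e.g. echo "Hello" | tts -t -
        stream = stdin if stdin is not None else sys.stdin
        return stream.read().strip()
    if args.text is not None:
        return args.text
    if args.file is not None:
        with open(args.file, encoding="utf-8") as handle:
            return handle.read()
    return None
```

With this sketch, the language value stays None unless --language or -l is passed, which matches the no-translation default described above.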
You can start the web server in two ways:
# Start the web server with default settings
python -m src.main --server
# Start with custom host and port
python -m src.main --server --host 0.0.0.0 --port 8080
# Development mode with auto-reload
python -m src.main --server --reload --verbose
# Or use make shortcuts
make run-web # Development mode
make run-web-prod # Production mode
# Start the web server directly
python -m src.web_server
# Or use make
make run-web-direct
# Access the web interface at http://localhost:8000
The TTS system includes a Model Context Protocol (MCP) server that allows AI assistants to use TTS functionality directly.
# Install and configure MCP server
make install-mcp
# Test MCP server functionality
make test-mcp
# Run examples
python examples/mcp_usage_examples.py
Available MCP Tools:
- synthesize_speech: Convert text to speech
- translate_text: Translate text to another language
- get_available_voices: List available TTS voices
- stream_tts_synthesis: Stream TTS for long texts
- get_system_status: Check system health
- cleanup_files: Clean up old audio files
MCP Resources:
- tts://config: View system configuration
- tts://voices: List all available voices
- tts://status: Check system status
See MCP_README.md for detailed MCP integration documentation.
# Build and run with docker-compose
docker-compose up -d
You can customize the tool's behavior by editing the config.yaml file in the project root:
- input_file: Default input text file path (relative to project root)
- translated_file: Path for the translated text file (relative to project root)
- output_directory: Directory to store generated audio files
- special_characters: Characters used to split the text into chunks
- delimiter: Delimiter used in word boundary files
- tts_voice: Default TTS voice (see edge-tts --list-voices for options)
You can also override any setting using environment variables (e.g., TTS_INPUT_FILE, TTS_OUTPUT_DIRECTORY).
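One way such an override could work is sketched below. It assumes the variable name is the setting name in upper case with a TTS_ prefix, which is inferred from the two examples above; how the project names variables for other keys (tts_voice, for instance) is not stated here. The function name apply_env_overrides and the sample values are hypothetical.

```python
import os
from typing import Mapping, Optional

# Prefix inferred from TTS_INPUT_FILE and TTS_OUTPUT_DIRECTORY (an assumption).
ENV_PREFIX = "TTS_"


def apply_env_overrides(config: dict,
                        environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of config in which TTS_<KEY> variables replace matching keys."""
    env = os.environ if environ is None else environ
    merged = dict(config)  # leave the caller's dictionary untouched
    for key in config:
        env_name = ENV_PREFIX + key.upper()
        if env_name in env:
            merged[key] = env[env_name]
    return merged


# Sample values standing in for settings loaded from config.yaml.
defaults = {"input_file": "examples/data/input.txt", "output_directory": "temp"}
settings = apply_env_overrides(defaults, {"TTS_OUTPUT_DIRECTORY": "/tmp/audio"})
```

Only keys already present in the configuration are considered, so unrelated environment variables cannot introduce new settings.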
The system uses a unified file management approach that organizes all artifacts in session-based directories:
temp/
├── session_YYYYMMDD_HHMMSS_uuid/ # Session-specific directory
│ ├── audio/ # Audio files (.wav)
│ ├── text/ # Text files with word boundaries (.txt)
│ ├── translation/ # Translation files
│ └── cache/ # Cache files (voices, etc.)
└── session_*/ # Other sessions...
For more details, see docs/FILE_MANAGEMENT.md.
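The layout above could be created with a few lines of standard-library code. This is a sketch, not the project's file manager: the function name create_session_dir is hypothetical, and the 8-character hexadecimal suffix is an assumption, since the length of the uuid part is not specified here.

```python
import uuid
from datetime import datetime
from pathlib import Path

# Subdirectories taken from the layout shown above.
SUBDIRS = ("audio", "text", "translation", "cache")


def create_session_dir(base: Path) -> Path:
    """Create base/session_YYYYMMDD_HHMMSS_<id>/ with its subdirectories."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # assumed length; keeps names short but unique
    session = base / f"session_{stamp}_{suffix}"
    for name in SUBDIRS:
        (session / name).mkdir(parents=True, exist_ok=True)
    return session
```

Calling create_session_dir(Path("temp")) once per run keeps the artifacts of different runs separate, so a whole session can be cleaned up by deleting a single directory.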
- src/main.py: Contains the main application class that orchestrates the entire TTS process.
- src/tts_service.py: Unified service layer that bridges CLI and web functionality.
- src/translator.py: Handles text translation using Google Translate.
- src/tts_generator.py: Generates TTS audio using the Microsoft Edge TTS engine.
- src/audio_player.py: Manages audio playback and synchronized text display.
- src/text_processor.py: Processes and splits text into chunks.
- src/file_manager.py: Handles file operations, organization, and cleanup with session-based management.
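To illustrate the chunking step performed by the text-processing component, here is a minimal sketch that splits text after any of a set of delimiter characters, in the spirit of the special_characters setting. The function name split_into_chunks and the default delimiter set are assumptions; the project's real splitting rules may differ.

```python
def split_into_chunks(text: str, special_characters: str = ".!?;:\n") -> list[str]:
    """Split text into chunks, ending a chunk after each delimiter character."""
    chunks: list[str] = []
    current: list[str] = []
    for ch in text:
        current.append(ch)
        if ch in special_characters:
            chunk = "".join(current).strip()
            if chunk:  # skip chunks that contain only whitespace
                chunks.append(chunk)
            current = []
    # Keep any trailing text that did not end with a delimiter.
    tail = "".join(current).strip()
    if tail:
        chunks.append(tail)
    return chunks


print(split_into_chunks("Hello, how are you? I am fine. Thanks"))
# → ['Hello, how are you?', 'I am fine.', 'Thanks']
```

Because each chunk can be synthesized and played independently, playback of the first chunk can begin while later chunks are still being generated.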
# Install development dependencies
make install-dev
# Run tests
make test
# Format code
make format
# Run linting
make lint
# Clean up temporary files
make clean-output
- Quick Start Guide: Get started quickly with the TTS system
- File Management: Learn about the unified file management system
- Component Comparison: Understand the different components
The tool includes error handling for common issues such as missing input files or TTS generation errors. Error messages will be displayed in the console if any issues occur during execution.
- Audio playback relies on the pygame library, which may have platform-specific limitations.
- Translation quality depends on the Google Translate service.
Feel free to fork this project, submit issues, or provide pull requests to improve the tool.
This project is licensed under the MIT License. See the LICENSE file for details.