Stars
Production-First and Production-Ready End-to-End Text-to-Speech Toolkit
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Speech-to-text server framework with next-gen Kaldi
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
sw005320 / espnet-1
Forked from espnet/espnet
End-to-End Speech Processing Toolkit
A lightweight speech processing toolkit based on PyTorch and (Py)Kaldi
Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.
Repo for SeedVR2 & SeedVR (CVPR2025 Highlight)
A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It's written in Swift and optimized for Apple silicon.
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Faster Whisper transcription with CTranslate2
Port of OpenAI's Whisper model in C/C++
vits2 backbone with multilingual-bert
Text Normalization & Inverse Text Normalization