Starred repositories
0x5446 / async_cosyvoice
Forked from qi-hua/async_cosyvoice
Accelerates CosyVoice2 inference using vLLM
A TTS model capable of generating ultra-realistic dialogue in one pass.
hexisyztem / CosyVoice
Forked from FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Utilizes ONNX Runtime for speech activity detection.
A Conversational Speech Generation Model
Memory for AI Agents; SOTA in AI Agent Memory; Announcing OpenMemory MCP - local and secure memory management.
Pseudo Streaming SenseVoice with Hotwords
API and websocket server for sensevoice. It has inherited some enhanced features, such as VAD detection, real-time streaming recognition, and speaker verification.
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
X-T-E-R / GPT-SoVITS-Inference
Forked from RVC-Boss/GPT-SoVITS
Inference Specialization
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Python interface to the WebRTC Voice Activity Detector
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
A generative speech model for daily dialogue.
🔊 Text-Prompted Generative Audio Model
Example projects built with the Hume AI APIs
Whisper realtime streaming for long speech-to-text transcription and translation