Starred repositories
An attempt to replicate the architecture of MiniMaxTTS described in its technical report
Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers
A TTS model capable of generating ultra-realistic dialogue in one pass.
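Welcome to LLM-Dojo, an open-source place to learn about large models. It uses concise, easy-to-read code to build a model training framework (supporting mainstream models such as Qwen, Llama, and GLM), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓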
Zero-shot voice conversion & singing voice conversion, with real-time support
The open-source code for the SimpleSpeech series
An open-source multimodal large language model that can hear and talk while thinking, with real-time end-to-end speech input and streaming audio output for conversation.
LlamaVoice is a Llama-based large voice generation model that supports both inference and training.
A generative speech model for daily dialogue.
A ggml (C++) re-implementation of tortoise-tts
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
A timeline of the latest AI models for audio generation, starting in 2023!
DLAS - A configuration-driven trainer for generative models
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Versatile audio super resolution (any -> 48kHz) with AudioSR.
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching