huukim136 (Kim Nguyen) / Starred · GitHub

Kim Nguyen huukim136

Speech Signal Processing, Speech Synthesis, Voice Conversion, Deep Learning

12 followers · 12 following

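Stars

Each entry lists owner / repository, the description (where given), then the primary language (where given), star count, fork count, and last update date.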

Shubhamsaboo / awesome-llm-apps

Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

Python 44,294 4,980 Updated Jun 18, 2025

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 14,644 1,538 Updated Jun 12, 2025

resemble-ai / chatterbox

SoTA open-source TTS

Python 8,461 874 Updated Jun 13, 2025

VITA-MLLM / VITA-Audio

✨✨VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Python 582 47 Updated May 24, 2025

daily-co / nimble-pipecat

Voice Agent Framework for Conversational AI

Jupyter Notebook 53 16 Updated May 5, 2025

livekit / agents

A powerful framework for building realtime voice AI agents 🤖🎙️📹

Python 6,356 978 Updated Jun 19, 2025

MYZY-AI / Muyan-TTS

Python 423 40 Updated May 19, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,813 246 Updated Jun 3, 2025

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 17,008 1,374 Updated May 28, 2025

VITA-MLLM / LUCY

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Python 42 3 Updated Apr 14, 2025

OpenBMB / MiniCPM-o

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,653 1,428 Updated Jun 17, 2025

pipecat-ai / pipecat

Open Source framework for voice and multimodal conversational AI

Python 6,511 953 Updated Jun 19, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 9,832 1,042 Updated Apr 9, 2025

stepfun-ai / Step-Audio

Python 4,355 354 Updated Jun 12, 2025

canopyai / Orpheus-TTS

Towards Human-Sounding Speech

Python 5,045 414 Updated May 6, 2025

SesameAILabs / csm

A Conversational Speech Generation Model

Python 13,552 1,307 Updated May 27, 2025

hexgrad / kokoro

https://hf.co/hexgrad/Kokoro-82M

JavaScript 3,273 357 Updated May 3, 2025

facebookresearch / audiobox-aesthetics

Unified automatic quality assessment for speech, music, and sound.

Python 512 34 Updated Jun 5, 2025

AK391 / ai-gradio

A Python package that makes it easy for developers to create AI apps powered by various AI providers.

Python 1,620 197 Updated Apr 8, 2025

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,443 714 Updated Jun 18, 2025

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 1,608 180 Updated May 8, 2025

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 36,856 4,002 Updated May 23, 2025

e2b-dev / awesome-ai-agents

A list of AI autonomous agents

18,756 1,436 Updated Feb 26, 2025

ZhangXInFD / SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 572 53 Updated Jun 9, 2024

Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 176,257 45,825 Updated Jun 19, 2025

spotify-research / llark

Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.

Jupyter Notebook 352 28 Updated May 30, 2024

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

1,043 63 Updated Jun 14, 2025

huggingface / dataspeech

Python 365 59 Updated Sep 3, 2024

afadil / wealthfolio

A Beautiful Private and Secure Desktop Investment Tracking Application

TypeScript 5,177 272 Updated Jun 17, 2025

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 12,356 1,775 Updated Jun 11, 2025
