huukim136 (Kim Nguyen) / Starred · GitHub

Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini and opensource models.

Python 44,294 4,980 Updated Jun 18, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 14,644 1,538 Updated Jun 12, 2025

SoTA open-source TTS

Python 8,461 874 Updated Jun 13, 2025

✨✨VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Python 582 47 Updated May 24, 2025

Voice Agent Framework for Conversational AI

Jupyter Notebook 53 16 Updated May 5, 2025

A powerful framework for building realtime voice AI agents 🤖🎙️📹

Python 6,356 978 Updated Jun 19, 2025
Python 423 40 Updated May 19, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,813 246 Updated Jun 3, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 17,008 1,374 Updated May 28, 2025

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

Python 42 3 Updated Apr 14, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,653 1,428 Updated Jun 17, 2025

Open Source framework for voice and multimodal conversational AI

Python 6,511 953 Updated Jun 19, 2025

Spark-TTS Inference Code

Python 9,832 1,042 Updated Apr 9, 2025
Python 4,355 354 Updated Jun 12, 2025

Towards Human-Sounding Speech

Python 5,045 414 Updated May 6, 2025

A Conversational Speech Generation Model

Python 13,552 1,307 Updated May 27, 2025

https://hf.co/hexgrad/Kokoro-82M

JavaScript 3,273 357 Updated May 3, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 512 34 Updated Jun 5, 2025

A Python package that makes it easy for developers to create AI apps powered by various AI providers.

Python 1,620 197 Updated Apr 8, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,443 714 Updated Jun 18, 2025

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 1,608 180 Updated May 8, 2025

A generative speech model for daily dialogue.

Python 36,856 4,002 Updated May 23, 2025

A list of AI autonomous agents

18,756 1,436 Updated Feb 26, 2025

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 572 53 Updated Jun 9, 2024

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 176,257 45,825 Updated Jun 19, 2025

Code for the paper "LLark: A Multimodal Instruction-Following Language Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.

Jupyter Notebook 352 28 Updated May 30, 2024

Awesome speech/audio LLMs, representation learning, and codec models

1,043 63 Updated Jun 14, 2025
Python 365 59 Updated Sep 3, 2024

A Beautiful Private and Secure Desktop Investment Tracking Application

TypeScript 5,177 272 Updated Jun 17, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 12,356 1,775 Updated Jun 11, 2025