Lists (32)
alg
architecture
audio
awesome
backend
conditioning
diffusion
disentangle
fast_inference
flow
frontend
gan
infra
language
llm
lora
manifold
ml_materials
mlops
MoE
music
nas
neural_ode
optimization
personalization
quantization
Scala
style_transfer
svc
video
vision
web
Starred repositories
An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Open-Sora: Democratizing Efficient Video Production for All
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
anan235 / dia-multilingual
Forked from nari-labs/dia. A TTS model capable of generating ultra-realistic dialogue in one pass.
stlohrey / dia-finetuning
Forked from nari-labs/dia. A TTS model capable of generating ultra-realistic dialogue in one pass.
An extremely fast Python package and project manager, written in Rust.
Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), support for SafeTensors/BF16, voice cloning, dialogue generation, …
High-performance Text-to-Speech server with OpenAI-compatible API, 8 voices, emotion tags, and modern web UI. Optimized for RTX GPUs.
Fine-tune the LLM part of the Spark-TTS model
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Ideas and demonstrations of named tuples to the max
Documentation that simply works
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
A SOTA open-source image editing model that aims to deliver performance comparable to closed-source models like GPT-4o and Gemini 2 Flash.
ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning"
FULL v0, Cursor, Manus, Same.dev, Lovable, Devin, Replit Agent, Windsurf Agent, VSCode Agent, Dia Browser & Trae AI (And other Open Sourced) System Prompts, Tools & AI Models.
A TTS model capable of generating ultra-realistic dialogue in one pass.
Typesafe HTML templates and static site generator in pure Scala
[ICML 2025] Gaussian Mixture Flow Matching Models (GMFlow)
VietTTS: An Open-Source Vietnamese Text to Speech