Stars
💬 MaxKB is an open-source AI assistant for enterprise. It seamlessly integrates RAG pipelines, supports robust workflows, and provides MCP tool-use capabilities.
Real-time interactive streaming digital human
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…
Avatars for Zoom, Skype and other video-conferencing apps.
ICASSP 2022: "Text2Video: text-driven talking-head video synthesis with phonetic dictionary".
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
Real-time voice interactive digital human, supporting an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3s.
Multilingual Voice Understanding Model
ModelScope: bring the notion of Model-as-a-Service to life.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
A generative speech model for daily dialogue.
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, try Sync Labs.
Industry-leading face manipulation platform
😍 FeHelper: Web front-end assistant (Awesome! Chrome & Firefox & MS-Edge extension, all-in-one toolbox!)
The UI design language and React library for Conversational UI
Community interface for generative AI