juntengzhang

isaac juntengzhang

1 follower · 0 following

Stars

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

1,058 63 Updated Jun 27, 2025

maitrix-org / Voila

Python 419 40 Updated May 6, 2025

THUDM / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,967 252 Updated Dec 5, 2024

jzq2000 / MoonCast

Python 265 31 Updated Apr 11, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 9,938 1,052 Updated Apr 9, 2025

deepseek-ai / DeepSeek-R1

90,368 11,659 Updated Jun 27, 2025

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 22,941 2,536 Updated Aug 12, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

15,719 1,021 Updated Jul 1, 2025

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 14,929 1,577 Updated Jun 29, 2025

gemelo-ai / vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 947 112 Updated Aug 7, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 37,021 4,012 Updated May 23, 2025

webdataset / webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python 2,676 213 Updated Jun 19, 2025

huggingface / parler-tts

Inference and training library for high-quality TTS models.

Python 5,331 569 Updated Dec 10, 2024

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 48,351 5,316 Updated Jul 2, 2025

LuChengTHU / dpm-solver

Official code for "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps" (NeurIPS 2022 Oral)

Python 1,723 129 Updated Feb 6, 2024

declare-lab / MELD

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation

Python 921 220 Updated Mar 10, 2024

thuhcsi / SECap

Python 167 13 Updated Jul 9, 2024

baaivision / Emu

Emu Series: Generative Multimodal Models from BAAI

Python 1,731 85 Updated Sep 27, 2024

fishaudio / fish-speech

SOTA Open Source TTS

Python 22,176 1,818 Updated Jul 2, 2025

myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 32,846 3,450 Updated Apr 19, 2025

TreB1eN / InsightFace_Pytorch

PyTorch 0.4.1 code for InsightFace

Jupyter Notebook 1,838 426 Updated Nov 22, 2022

TaoRuijie / TalkNet-ASD

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Python 396 82 Updated Oct 23, 2023

yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 5,825 581 Updated Aug 10, 2024

sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++

Python 1,223 151 Updated Feb 20, 2024

haoheliu / AudioLDM2

Text-to-Audio/Music Generation

Python 2,457 200 Updated Sep 29, 2024

chq1155 / A-Survey-on-Generative-Diffusion-Model

960 59 Updated Oct 18, 2023

YangLing0818 / Diffusion-Models-Papers-Survey-Taxonomy

Diffusion model papers, survey, and taxonomy

3,206 266 Updated Jun 13, 2025

wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 4,652 1,142 Updated Jun 11, 2025

hojonathanho / diffusion

Denoising Diffusion Probabilistic Models

Python 4,516 426 Updated Aug 29, 2023

reworkd / AgentGPT

🤖 Assemble, configure, and deploy autonomous AI Agents in your browser.

TypeScript 34,462 9,449 Updated Apr 29, 2025
