juntengzhang (isaac) / Starred · GitHub

Awesome speech/audio LLMs, representation learning, and codec models

1,058 63 Updated Jun 27, 2025
Python 419 40 Updated May 6, 2025

GLM-4-Voice | End-to-end Chinese-English speech dialogue model

Python 2,967 252 Updated Dec 5, 2024
Python 265 31 Updated Apr 11, 2025

Spark-TTS Inference Code

Python 9,938 1,052 Updated Apr 9, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 22,941 2,536 Updated Aug 12, 2024

✨✨Latest Advances on Multimodal Large Language Models

15,719 1,021 Updated Jul 1, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 14,929 1,577 Updated Jun 29, 2025

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 947 112 Updated Aug 7, 2024

A generative speech model for daily dialogue.

Python 37,021 4,012 Updated May 23, 2025

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python 2,676 213 Updated Jun 19, 2025

Inference and training library for high-quality TTS models.

Python 5,331 569 Updated Dec 10, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 48,351 5,316 Updated Jul 2, 2025

Official code for "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps" (NeurIPS 2022 Oral)

Python 1,723 129 Updated Feb 6, 2024

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation

Python 921 220 Updated Mar 10, 2024
Python 167 13 Updated Jul 9, 2024

Emu Series: Generative Multimodal Models from BAAI

Python 1,731 85 Updated Sep 27, 2024

SOTA Open Source TTS

Python 22,176 1,818 Updated Jul 2, 2025

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 32,846 3,450 Updated Apr 19, 2025

PyTorch 0.4.1 code for InsightFace

Jupyter Notebook 1,838 426 Updated Nov 22, 2022

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Python 396 82 Updated Oct 23, 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 5,825 581 Updated Aug 10, 2024

The official implementation of HierSpeech++

Python 1,223 151 Updated Feb 20, 2024

Text-to-Audio/Music Generation

Python 2,457 200 Updated Sep 29, 2024

Diffusion model papers, survey, and taxonomy

3,206 266 Updated Jun 13, 2025

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 4,652 1,142 Updated Jun 11, 2025

Denoising Diffusion Probabilistic Models

Python 4,516 426 Updated Aug 29, 2023

🤖 Assemble, configure, and deploy autonomous AI Agents in your browser.

TypeScript 34,462 9,449 Updated Apr 29, 2025