open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,318 283 Updated Nov 5, 2024

hubertsiuzdak / snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 593 32 Updated Nov 19, 2024

AbrahamSanders / codec-bpe

Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs

Python 58 7 Updated Apr 11, 2025

svc-develop-team / so-vits-svc

SoftVC VITS Singing Voice Conversion

Python 27,061 4,981 Updated Nov 11, 2023

JeongHun0716 / lmd-vsr

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge (ICCV 2023)

Python 7 Updated Sep 3, 2024

GalaxyCong / HPMDubbing

[CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models.

Python 106 8 Updated Jun 21, 2024

GalaxyCong / StyleDubber

[ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"

Python 83 3 Updated Nov 14, 2024

Plachtaa / VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/

Python 7,867 787 Updated Feb 11, 2024

joonson / syncnet_python

Out of time: automated lip sync in the wild

Python 762 170 Updated Jan 23, 2024

v-iashin / Synchformer

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Python 61 6 Updated Feb 6, 2025

lucidrains / autoregressive-diffusion-pytorch

Implementation of Autoregressive Diffusion in Pytorch

Python 381 11 Updated Nov 3, 2024

mhamilton723 / DenseAV

Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jupyter Notebook 77 12 Updated Jun 12, 2024

roudimit / whisper-flamingo

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 160 10 Updated May 7, 2025

ms-dot-k / AVSR

PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR2023) and "Visual Context-driven Audio Feature Enhan…

Python 17 Updated Apr 3, 2024

Jeongsoo Choi choijeongsoo

Stars

CodeGoat24 / UnifiedReward

naver-ai / usdm

facebookresearch / blt

LqNoob / Neural-Codec-and-Speech-Language-Models

BytedanceSpeech / seed-tts-eval

VITA-MLLM / Freeze-Omni

facebookresearch / MovieGenBench

facebookresearch / spiritlm

gpt-omni / mini-omni2

WWWWxp / Speech-Tokenizer-Papers

SWivid / F5-TTS

baaivision / Emu3

antgroup / echomimic

FireRedTeam / FireRedTTS

yangdongchao / RSTnet

Plachtaa / seed-vc

gpt-omni / mini-omni