choijeongsoo (Jeongsoo Choi) / Starred · GitHub

Official implementation of UnifiedReward & UnifiedReward-Think

Python 360 9 Updated May 15, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 87 3 Updated Dec 3, 2024

Code for BLT research paper

Python 1,642 132 Updated May 15, 2025

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 159 11 Updated May 16, 2025

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 316 20 Updated Jan 2, 2025

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

383 22 Updated Mar 8, 2025

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 902 59 Updated Oct 28, 2024

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,747 194 Updated Jan 16, 2025

This repository collects papers related to Speech Tokenizer.

16 1 Updated Oct 16, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 11,897 1,681 Updated May 11, 2025

Next-Token Prediction is All You Need

Python 2,120 80 Updated Mar 17, 2025

[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Python 3,853 426 Updated Dec 10, 2024

An Open-Sourced LLM-empowered Foundation TTS System

Python 699 56 Updated Apr 15, 2025

Real-time Speech-Text Foundation Model Toolkit (wip)

Python 228 20 Updated Mar 26, 2025

zero-shot voice conversion & singing voice conversion, with real-time support

Python 2,459 280 Updated Apr 20, 2025

open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,318 283 Updated Nov 5, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 593 32 Updated Nov 19, 2024

Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs

Python 58 7 Updated Apr 11, 2025

SoftVC VITS Singing Voice Conversion

Python 27,061 4,981 Updated Nov 11, 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge (ICCV 2023)

Python 7 Updated Sep 3, 2024

[CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models.

Python 106 8 Updated Jun 21, 2024

[ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"

Python 83 3 Updated Nov 14, 2024

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/

Python 7,867 787 Updated Feb 11, 2024

Out of time: automated lip sync in the wild

Python 762 170 Updated Jan 23, 2024

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Python 61 6 Updated Feb 6, 2025

Implementation of Autoregressive Diffusion in Pytorch

Python 381 11 Updated Nov 3, 2024

Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jupyter Notebook 77 12 Updated Jun 12, 2024

Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation

Jupyter Notebook 160 10 Updated May 7, 2025

PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR2023) and "Visual Context-driven Audio Feature Enhan…

Python 17 Updated Apr 3, 2024