- Hong Kong University of Science and Technology
- Hong Kong
Stars
Official Implementation for the paper: A Variational Framework for Improving Naturalness in Generative Spoken Language Models
GitHub repository for the ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.
Generative models for conditional audio generation
moiseshorta / music2latent
Forked from SonyCSLParis/music2latent
Encode and decode audio samples to/from compressed latent representations!
LLM4MA: Large Language Models for Music & Audio (ISMIR 2025 Satellite Workshop)
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
PiCoGen (Piano Cover Generation) is an academic project aimed at developing an automatic piano cover generation system.
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
Curated list of papers, code, and resources related to Text-to-Audio (TTA) Generation
Benchmark for evaluating TTS models on complex prosodic, expressive, and linguistic challenges.
Towards Fine-grained Audio Captioning with Multimodal Contextual Cues
Collection of scripts from mHuBERT-147.
Official Repository for "Music Source Restoration"
A Neural Audio Codec (NAC) for Universal Audio
Modifies the training set format and the training process on top of the original Apollo code.
woct0rdho / ACE-Step
Forked from ace-step/ACE-Step
Fork of ACE-Step for LoRA training with < 10 GB VRAM
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
Simple PyTorch reimplementation of the paper Flow Matching for Generative Modeling (https://arxiv.org/abs/2210.02747)