fengshi-cherish

13 followers · 78 following

Hong Kong University of Science and Technology
Hong Kong

Stars

TransDiff / TransDiff

Jupyter Notebook 54 3 Updated Jun 20, 2025

b04901014 / vae-gslm

Official Implementation for the paper: A Variational Framework for Improving Naturalness in Generative Spoken Language Models

Python 14 3 Updated Jun 18, 2025

dreamtheater123 / Awesome-SpeechLM-Survey

GitHub repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.

44 Updated Jun 17, 2025

tencent-ailab / SongGeneration

Python 245 13 Updated Jun 21, 2025

FunAudioLLM / CV3-Eval

Python 44 1 Updated Jun 13, 2025

sakemin / stable-audio-cvae

Forked from Stability-AI/stable-audio-tools

Generative models for conditional audio generation

Jupyter Notebook 3 Updated Jun 18, 2025

moiseshorta / music2latent

Forked from SonyCSLParis/music2latent

Encode and decode audio samples to/from compressed latent representations!

Python 1 Updated Jun 18, 2025

lrf23 / MeanFlow-for-Image-Restoration

Python 4 Updated Jun 17, 2025

multimodal-art-projection / LLM4Music

LLM4MA: Large Language Models for Music & Audio (ISMIR 2025 Satellite Workshop)

HTML 1 Updated Jun 9, 2025

hustvl / LightningDiT

[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Python 947 28 Updated Jun 12, 2025

lifeiteng / NotebookTTS

Text-To-Speech for NotebookLM

32 Updated Dec 21, 2024

MTG / discogs-vi-dataset

Discogs-VI dataset and code

Python 12 Updated Dec 13, 2024

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 335 44 Updated Jun 16, 2025

SparkAudio / SparkVox

Python 11 Updated Jun 9, 2025

tanchihpin0517 / PiCoGen

PiCoGen (Piano Cover Generation) is an academic project aimed at developing an automatic piano cover generation system.

32 2 Updated May 31, 2025

JishengBai / AudioSetCaps

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 155 3 Updated Dec 13, 2024

xiquan-li / Awesome-Audio-Generation

Curated list for papers, codes and resources related to Text-to-Audio (TTA) Generation

51 1 Updated Jun 2, 2025

boson-ai / EmergentTTS-Eval-public

Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.

Python 45 4 Updated Jun 5, 2025

FreedomIntelligence / FusionAudio

Towards Fine-grained Audio Captioning with Multimodal Contextual Cues

Python 67 3 Updated Jun 8, 2025

mdeff / fma

FMA: A Dataset For Music Analysis

Jupyter Notebook 2,418 452 Updated Jan 5, 2023

utter-project / mHuBERT-147-scripts

Collection of scripts from mHuBERT-147.

Python 27 1 Updated Nov 19, 2024

WangHelin1997 / CapSpeech-demo

JavaScript 3 1 Updated Jun 5, 2025

yongyizang / music-source-restoration

Official Repository for "Music Source Restoration"

Python 25 1 Updated Jun 1, 2025

IDEA-Emdoor-Lab / DistilCodec

A Neural Audio Codec (NAC) for Universal Audio

Python 37 2 Updated May 30, 2025

resemble-ai / chatterbox

SoTA open-source TTS

Python 8,558 899 Updated Jun 13, 2025

SUC-DriverOld / Apollo-Training

Modifies the training set format and the training process on top of the original Apollo code. Improves the training set production process and the training process.

Python 9 1 Updated May 30, 2025

zhushiyun88 / teaching-boyfriend-llm

398 23 Updated May 8, 2025

woct0rdho / ACE-Step

Forked from ace-step/ACE-Step

Fork of ACE-Step for LoRA training with < 10 GB VRAM

Python 20 4 Updated Jun 13, 2025

WangHelin1997 / SoloSpeech

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

Python 213 27 Updated Jun 12, 2025

aleksandrinvictor / flow-matching

Simple reimplementation of Flow Matching for Generative Modeling (https://arxiv.org/abs/2210.02747) paper in PyTorch

Python 12 2 Updated Aug 10, 2024
