Stars
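A repo that builds text-to-music datasets from scratch
E-textbook download tool for the National Smart Education Platform for Primary and Secondary Schools (国家中小学智慧教育平台). It helps you obtain the PDF URLs of e-textbooks on the platform and download them, making textbook content easier to access.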
Python implementation for audio time-frequency automatic gain control
Deep Learning for Person Re-identification: A Survey and Outlook
The official repository for ICLR2025 paper "HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts"
ACE-Step: A Step Towards Music Generation Foundation Model
PyTorch implementation of SoundCTM-DiT
This repository aims to collect Transformer-based sound event detection (SED) algorithms.
Source code for Consistent ensemble distillation for audio tagging
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
Unified automatic quality assessment for speech, music, and sound.
🎧 Hybrid music recommendation with graph neural networks.
(WWW'24 + LinkedIn) The first recommender system (RS) that tightly combines an LLM with an ID-based RS
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Utilizes ONNX Runtime for audio denoising.
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
Automatic audio labelling with laion-clap
A collection of papers in the speech and audio field. Updated so far: ICLR2025-2023, ICML2025-2023, NeurIPS2024-2023, ACMMM2024, AAAI2025-2024, ACL2024, EMNLP2024, NAACL2025, IJCAI2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.
Code for the paper "Songs Across Borders: Singable and Controllable Neural Lyric Translation"
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion