Stars
[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
zero-shot voice conversion & singing voice conversion, with real-time support
liujing04/Retrieval-based-Voice-Conversion-WebUI reconstruction project
ouor / so-vits-svc-5.0
Forked from PlayVoice/whisper-vits-svc
Core Engine of Singing Voice Conversion & Singing Voice Clone
[IPMI'23] Diffusion Model based Semi-supervised Learning on Brain Hemorrhage Images for Efficient Midline Shift Quantification
Full code for the paper "Incorporating Task-Specific Structural Knowledge into CNNs for Brain Midline Shift Detection"
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & RFT & Dynamic Sampling & Async Agent RL)
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Robust Speech Recognition via Large-Scale Weak Supervision
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
ASVtorch Toolkit: Speaker Verification with Deep-Neural Networks. To cite this software publication: https://www.sciencedirect.com/science/article/pii/S235271102100042X
So-VITS-SVC local deployment help documentation, with a Colab notebook provided
SoftVC VITS Singing Voice Conversion
[CSUR] A Survey on Video Diffusion Models
A curated list of recent diffusion models for video generation, editing, and various other applications.
DeepMind's Tacotron-2 TensorFlow implementation
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Foundational model for human-like, expressive TTS
Zero-Shot Speech Editing and Text-to-Speech in the Wild
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)