Stars
The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
Distillation of Self-Supervised Representation-Based Speech Quality Assessment
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
PengChengStarling is specifically designed for developing multilingual ASR models based on the icefall project, supporting a complete ASR pipeline that includes data processing, model training, inf…
first base model for full-duplex conversational audio
Production First and Production Ready End-to-End Speech Recognition Toolkit
Repository for training models for music source separation.
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Tips for best practices with filterbanks
Fast and differentiable time domain all-pole filter in PyTorch.
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Chinese text normalization for speech processing
Keyword spotting and forced alignment in any language
Foundational Models for State-of-the-Art Speech and Text Translation
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
🔊 Text-Prompted Generative Audio Model
Pytorch implementation of LSTM/BERT-CRF for named entity recognition
a lightweight network for monaural speech enhancement