Stars
[ICLR 2025 Spotlight] Official Implementation for ToST (Token Statistics Transformer)
In-car multi-channel speech transcription system of AISHELL-5.
This is the official implementation of LiSenNet.
This project uses a variety of advanced voiceprint recognition models, such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++. More models may be supported in the future. At the …
Official PyTorch Implementation of CleanUNet (ICASSP 2022)
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
For Assignment #3, End-to-End Automatic Speech Recognition in 11-737: Multilingual Natural Language Processing, S22 @ CMU.
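百聆 is a GPT-4o-like voice conversation bot built on an ASR+LLM+TTS pipeline. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.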
A Deep-Learning-Based Chinese Speech Recognition System
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.