Stars
Pseudo Streaming SenseVoice with Hotwords
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
Deep Neural Network for Speaker Count Estimation
Whisper based Japanese subtitle generator
Unofficial PyTorch implementation of Google AI's VoiceFilter system
Implementation of "X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion", accepted by Information Fusion.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
The official implementation of GTCRN, an ultra-lightweight SE model.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & RFT & Dynamic Sampling & Async Agent RL)
verl: Volcano Engine Reinforcement Learning for LLMs
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Train transformer language models with reinforcement learning.
Chinese NLP solutions (large models, data, models, training, inference)
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Minimal reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…
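百聆 is a GPT-4o-like voice conversation bot built on ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, reaches latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption.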
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Real-time interactive streaming digital human
Sky-T1: Train your own O1 preview model within $450
Scalable RL solution for advanced reasoning of language models
Deep Reasoning Translation via Reinforcement Learning (arXiv preprint 2025); DRT: Deep Reasoning Translation via Long Chain-of-Thought (arXiv preprint 2024)