Stars
A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions (Interspeech 2025)
A fast and lightweight framework for creating decentralized agents with ease.
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
✨✨Latest Advances on Multimodal Large Language Models
Voice Activity Detector (VAD) from TEN: low-latency, high-performance, and lightweight
A lightweight Python package for managing multi-agent orchestration. Easily define agents with custom instructions, tools, containers, and models, and orchestrate their interactions seamlessly. Per…
Awesome multilingual OCR and Document Parsing toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools,…
Like Manus, Computer Use Agent (CUA), and Omniparser, this is a computer-using agent: an AI-driven local automation assistant that uses natural language to make computers work by themselves
Utilizes ONNX Runtime for audio denoising.
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
A WebUI app for Music-Source-Separation-Training, with UVR bundled in!
Adaptive acoustic feedback cancellation, howling suppression, AI noise reduction, low latency
Python library for extracting chords from multiple sound file formats
A simple screen parsing tool towards pure vision based GUI agent
🚀 Efficient implementations of state-of-the-art linear attention models
Implementation of the proposed minGRU in PyTorch
Official inference framework for 1-bit LLMs
This is the official implementation of LiSenNet
Applies score-based diffusion to improve speech signals recorded under various adverse conditions and distortions, including noise, reverberation, clipping, equalization (EQ) distortion, packet loss, codec…
Port of FunASR's SenseVoice model in C/C++
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
This code accompanies the Bilibili video at https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7.
This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamics"