Starred repositories
AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models
Speech, Language, Audio, Music Processing with Large Language Model
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
The simplest, fastest repository for training/finetuning small-sized VLMs.
AI Manus is a general-purpose AI Agent system that supports running various tools and operations in a sandbox environment.
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
The official implementation of "Dynamic Diffusion Transformer" (ICLR 2025) and "DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation" (arXiv 2025).
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
A TTS model capable of generating ultra-realistic dialogue in one pass.
Janus-Series: Unified Multimodal Understanding and Generation Models
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS…
SkyReels-A2: Compose anything in video diffusion transformers
SkyReels-V2: Infinite-length Film Generative model
MediaTek's TFLite delegate
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Qwen1.5-SFT (Alibaba): fine-tuning of Qwen_Qwen1.5-2B-Chat/Qwen_Qwen1.5-7B-Chat (transformers), LoRA (peft), and inference
cogmhear / avse_challenge
Forked from claritychallenge/clarity
COG-MHEAR Audio-Visual Speech Enhancement Challenge
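🚀 Reverse-engineered API for Alibaba's Tongyi Qianwen 2.5 large model [Specialty: all-rounder]. Supports high-speed streaming output, watermark-free AI image generation, long-document interpretation, image analysis, web search, and multi-turn conversation, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces. For testing only; for commercial use, please go to the official open platform.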