Stars
A tool for extracting plain text from Wikipedia dumps
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
A Conversational Speech Generation Model
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
A locally deployable AI voice toolbox | A user-friendly audio toolkit for voice recognition, voice transcription, voice conversion, etc.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
A Python package to build AI-powered real-time audio applications
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A simple screen parsing tool for building a purely vision-based GUI agent
Enhanced ChatGPT Clone: Features Agents, DeepSeek, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message se…
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Share a single keyboard and mouse between multiple computers.
Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our discord for the largest Prompt Engineering learning community
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Whisper real-time streaming for long speech-to-text transcription and translation
Text Normalization & Inverse Text Normalization
Noise suppression using deep filtering
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
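A beginner's tutorial for PyTorch Lightning in Chinese; please credit the source when reposting. (Originally written for fun; working through the MNIST example first is recommended before diving in.)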