Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.

Jupyter Notebook 3,099 240 Updated May 28, 2025

juanmc2005 / diart

A Python package to build AI-powered real-time audio applications

Python 1,320 103 Updated Feb 12, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,415 614 Updated May 27, 2025

YUANZHUO-BNU / metahuman_overview

A curated collection of digital human resources

879 104 Updated Jan 8, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,998 895 Updated May 21, 2025

microsoft / OmniParser

A simple screen parsing tool towards a pure-vision-based GUI agent

Jupyter Notebook 22,368 1,880 Updated Mar 26, 2025

danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Agents, DeepSeek, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message se…

TypeScript 26,315 4,619 Updated Jun 8, 2025

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,136 721 Updated May 27, 2025

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 6,266 716 Updated Jun 6, 2025

deepseek-ai / DeepSeek-V3

Python 97,485 15,840 Updated Apr 9, 2025

deskflow / deskflow

Share a single keyboard and mouse between multiple computers.

C++ 17,925 4,038 Updated Jun 8, 2025

trigaten / Learn_Prompting

Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our discord for the largest Prompt Engineering learning community

MDX 4,485 650 Updated Jan 14, 2025

triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 9,307 1,591 Updated Jun 8, 2025

comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

Python 79,183 8,745 Updated Jun 8, 2025

Wikidepia / indonesian_datasets

NLP Datasets for Indonesian

Python 116 13 Updated Feb 11, 2023

ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

Python 2,956 364 Updated Jan 7, 2025

wenet-e2e / WeTextProcessing

Text Normalization & Inverse Text Normalization

Python 592 81 Updated Nov 11, 2024

Rikorose / DeepFilterNet

Noise suppression using deep filtering

Python 3,106 290 Updated Oct 17, 2024

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 2,897 229 Updated May 23, 2025

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 21,345 2,630 Updated Jun 3, 2025

3017218062 / Pytorch-Lightning-Learning

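An introductory Chinese-language tutorial for PyTorch Lightning. Please credit the source when reposting. (Originally written for fun; it is recommended to work through the MNIST example before getting started.)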

Jupyter Notebook 217 19 Updated Dec 6, 2020

bookong22

Stars

attardi / wikiextractor

microsoft / NeuralSpeech

jishengpeng / WavTokenizer

mozillazg / python-pinyin

fxsjy / jieba

SesameAILabs / csm

EvolvingLMMs-Lab / lmms-eval

MoonshotAI / Kimi-Audio

Spr-Aachen / Easy-Voice-Toolkit

QwenLM / Qwen2.5-Omni