Stars
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…
A lightweight data processing framework built on DuckDB and 3FS.
OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.
Minimal reproduction of DeepSeek R1-Zero
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
Instruction Tuning with GPT-4
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
A high-throughput and memory-efficient inference and serving engine for LLMs
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Awesome speech/audio LLMs, representation learning, and codec models
Speech, Language, Audio, Music Processing with Large Language Model
Multilingual Voice Understanding Model
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
A generative speech model for daily dialogue.
A very simple and easy to understand RISC-V core.
Modeling, training, eval, and inference code for OLMo
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Official implementation of Multi-Stage Multi-Codebook (MSMC) TTS
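MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.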
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.