Binbin Zhang robin1001

Speech Engineer

388 followers · 34 following

https://robin1001.github.io/

Lists (3)

🔮 Future ideas

✨ Inspiration

🚀 My stack

Stars

deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…

Python 20,625 2,160 Updated May 9, 2025

deepseek-ai / smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,613 410 Updated Mar 5, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 9,188 957 Updated Apr 9, 2025

ASLP-lab / OSUM

OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.

Python 362 24 Updated Apr 16, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python 11,721 1,483 Updated Apr 24, 2025

Unakar / Logic-RL

Reproduce R1 Zero on Logic Puzzle

Python 2,331 154 Updated Mar 20, 2025

FireRedTeam / FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 952 73 Updated Mar 27, 2025

Instruction-Tuning-with-GPT-4 / GPT-4-LLM

Instruction Tuning with GPT-4

HTML 4,301 306 Updated Jun 11, 2023

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 292 37 Updated Jan 15, 2025

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

294 16 Updated Nov 28, 2024

Tencent / ncnn

ncnn is a high-performance neural network inference framework optimized for the mobile platform

C++ 21,432 4,248 Updated May 9, 2025

pengzhendong / g2p-mix

Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.

Python 96 12 Updated Mar 20, 2025

mozillazg / python-pinyin

Convert Chinese characters to pinyin (pypinyin)

Python 5,058 624 Updated Mar 30, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 46,991 7,328 Updated May 10, 2025

wenet-e2e / wesep

Target Speaker Extraction Toolkit

Python 165 16 Updated Apr 7, 2025

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 5,742 549 Updated Mar 24, 2025

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

983 59 Updated Apr 25, 2025

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 802 76 Updated Apr 24, 2025

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 5,573 496 Updated Mar 23, 2025

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 13,681 1,391 Updated May 6, 2025

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 36,125 3,915 Updated May 6, 2025

Tele-AI / TeleSpeech-ASR

Python 692 63 Updated Jun 7, 2024

AdolfVonKleist / Phonetisaurus

Phonetisaurus G2P

Shell 473 123 Updated Jun 1, 2024

liangkangnan / tinyriscv

A very simple and easy to understand RISC-V core.

C 1,227 212 Updated Nov 9, 2023

allenai / OLMo

Modeling, training, eval, and inference code for OLMo

Python 5,583 604 Updated May 6, 2025

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 46,075 5,070 Updated Apr 25, 2025

hhguo / MSMC-TTS

Official Implementation of Multi-Stage Multi-Codebook (MSMC) TTS

Python 163 17 Updated Apr 10, 2024

fishaudio / Bert-VITS2

vits2 backbone with multilingual-bert

Python 8,407 1,191 Updated May 5, 2025

esbatmop / MNBVC

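MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.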

3,846 272 Updated Apr 13, 2025

modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 10,344 1,039 Updated May 8, 2025