Stars
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
A Low-Latency, Lightweight and High-Performance Streaming VAD
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Self-supervised Generative LM-based Voice Conversion
Have a natural, spoken conversation with AI!
A benchmark to evaluate full-duplex spoken dialogue models on pause handling, backchanneling, turn-taking, and user interruptions.
All generative models in one for a better TTS model
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
A TTS model capable of generating ultra-realistic dialogue in one pass.
Fine-tuning Moshi/J-Moshi on your own spoken dialogue data
PyTorch implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.
OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.
ZhikangNiu / LLaSA_training
Forked from zhenye234/LLaSA_training
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Finetune the LLM part of the Spark-TTS model
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and performing real-time speech generation.
PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.
Use any LLM (Large Language Model) for Deep Research. Supports SSE API and MCP server.
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'