charlesliucn (Qian Liu) / Starred · GitHub
  • Tsinghua University
  • Beijing, China
  • 22:19 (UTC +08:00)


Streamable Text-to-Speech model using a language modeling approach, without vector quantization

Python 65 5 Updated May 20, 2025

A Low-Latency, Lightweight and High-Performance Streaming VAD

C 327 25 Updated May 20, 2025

Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥

Python 38,983 3,059 Updated May 20, 2025

PDF textbooks for all primary, middle, and high school levels, and for university.

Roff 26,971 5,699 Updated May 18, 2025

Self-supervised Generative LM-based Voice Conversion

Python 36 6 Updated Apr 24, 2025

Have a natural, spoken conversation with AI!

Python 2,263 180 Updated May 17, 2025

A benchmark to evaluate full-duplex spoken dialogue models on pause handling, backchanneling, turn-taking, and user interruptions.

Python 30 Updated May 15, 2025
Python 815 86 Updated Apr 30, 2025
Python 355 31 Updated May 6, 2025
Python 165 19 Updated May 19, 2025

All generative models in one for a better TTS model

Python 71 8 Updated Sep 8, 2024
Python 379 35 Updated May 19, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,627 226 Updated May 19, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 16,068 1,268 Updated May 19, 2025
Python 5,333 375 Updated May 11, 2025

Fine-tuning Moshi/J-Moshi on your own spoken dialogue data

Python 56 3 Updated Apr 8, 2025

PyTorch implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.

Python 1,112 155 Updated May 19, 2025

OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.

Python 365 25 Updated May 13, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 2 Updated Mar 12, 2025

Fine-tune the LLM part of the Spark-TTS model

Python 70 7 Updated Mar 25, 2025
Python 5 1 Updated Jan 10, 2025

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch

Python 642 55 Updated Dec 27, 2024

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 2,971 228 Updated May 19, 2025

PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.

Python 483 27 Updated Apr 29, 2025

Kyutai with an "eye"

Python 192 25 Updated Mar 26, 2025

Towards Human-Sounding Speech

Python 4,801 384 Updated May 6, 2025

Use any LLMs (Large Language Models) for Deep Research. Support SSE API and MCP server.

JavaScript 2,338 625 Updated May 20, 2025

Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'

Python 120 4 Updated Mar 24, 2025