liusongxiang (Songxiang Liu) / Starred · GitHub



🎯

Focusing

Songxiang Liu liusongxiang


Works on spoken language processing: general audio synthesis, TTS, VC, SVS, SVC, etc.

369 followers · 103 following

http://liusongxiang.github.io

Achievements


Highlights

Pro

Lists (1)


🚀 My stack

Starred repositories

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 10,062 1,656 Updated Jun 27, 2025

Visual-Agent / DeepEyes

Python 555 23 Updated Jun 23, 2025

HITsz-TMG / Awesome-Large-Multimodal-Reasoning-Models

The development and future prospects of multimodal reasoning models.

399 16 Updated Jun 13, 2025

SynthLabsAI / big-math

A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Python 57 3 Updated Feb 25, 2025

Junhua-Liao / Light-ASD

The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)

Python 143 16 Updated Mar 23, 2025

MoonshotAI / Kimi-Audio-Evalkit

Python 120 5 Updated Apr 29, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,878 260 Updated Jun 21, 2025

lllyasviel / FramePack

Let's make video diffusion practical!

Python 14,710 1,322 Updated Jun 27, 2025

Anduin2017 / HowToCook

A programmer's guide to cooking at home (Simplified Chinese only).

Dockerfile 90,268 10,295 Updated Jun 25, 2025

F3Set / F3Set

Python 6 1 Updated Oct 2, 2024

vision-x-nyu / thinking-in-space

Official repo and evaluation implementation of VSI-Bench

Python 524 28 Updated Feb 28, 2025

FoundationVision / Infinity

[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 1,345 72 Updated Jun 24, 2025

Xnhyacinth / Awesome-LLM-Long-Context-Modeling

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,547 60 Updated Jun 26, 2025

manycore-research / SpatialLM

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python 3,421 256 Updated Jun 24, 2025

SilentView / GigaTok

[ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"

Python 162 2 Updated Jun 26, 2025

yangdongchao / ALMTokenizer

The demo page for ALMTokenizer

Python 51 3 Updated Apr 14, 2025

tulerfeng / Video-R1

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 585 29 Updated May 28, 2025

JoeLeelyf / OVO-Bench

[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Python 66 3 Updated Apr 3, 2025

OpenGVLab / VideoChat-R1

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Python 153 4 Updated Jun 9, 2025

EvolvingLMMs-Lab / lmms-eval

One for All Modalities Evaluation Toolkit - including text, image, video, audio tasks.

Python 2,682 318 Updated Jun 27, 2025

yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Python 116,665 9,223 Updated Jun 26, 2025

Victorwz / Open-Qwen2VL

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Python 229 8 Updated May 17, 2025

lcpmgh / colors

Color scheme recommender for academic journals

R 423 29 Updated Jan 27, 2025

yaotingwangofficial / Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

668 20 Updated Jun 21, 2025

mbzuai-oryx / Awesome-LLM-Post-training

Awesome Reasoning LLM Tutorial/Survey/Guide

Python 1,799 128 Updated Jun 16, 2025

AGENDD / RWKV-SpeechChat

RWKV-SpeechChat is a real-time dialogue script based on a frozen 3B RWKV model with trained adapters and initial states. Various trained weights can be applied to perform a range of audio tasks, in…

Python 27 1 Updated Jan 1, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 24,903 2,312 Updated Jun 26, 2025

facebookresearch / MovieGenBench

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

403 22 Updated Mar 8, 2025

B05901022 / VOCANO

VOCANO: A note transcription framework for singing voice in polyphonic music

Python 68 6 Updated Aug 9, 2021

GeWu-Lab / awesome-audiovisual-learning

A curated list of audio-visual learning methods and datasets.

263 18 Updated Dec 3, 2024

Starred topics

singing-voice

text-to-speech
