Jiaxin-Ye (Jiaxin Ye) / Starred · GitHub

Jiaxin-Ye

💭

Keep Improving

A third-year Ph.D. student at Fudan University.

46 followers · 26 following

Shanghai, China
https://jiaxin-ye.github.io/

Lists (8)

Affective Computing 🤓

14 repositories

AIGC 🫨

27 repositories

Diffusion-based Method 🫡

18 repositories

FaceTTS 😊🎙️

10 repositories

Mamba🐍

Speech Generation 🎤

Talking Head Generation 🤖️

40 repositories

Toolkit 👍

30 repositories

Stars

huanranchen / DiffusionClassifier

Official code implementation of Robust Classification via a Single Diffusion Model

Python 83 3 Updated Mar 7, 2025

huanranchen / NoisedDiffusionClassifiers

Official code implementation of "Your Diffusion Model is Secretly a Certifiably Robust Classifier"

Python 15 1 Updated Feb 2, 2024

01Zhangbw / Speech-and-audio-papers-Top-Conference

63 1 Updated May 25, 2025

Xiaohao-Liu / Awesome-Vison2Audio

A curated list of Video to Audio Generation

44 2 Updated Apr 15, 2025

nguyenvulebinh / AV-HuBERT-S2S

Huggingface Implementation of AV-HuBERT on the MuAViC Dataset

Python 8 Updated Mar 6, 2025

maitrix-org / Voila

Python 381 35 Updated May 6, 2025

gokayfem / awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

Markdown 849 43 Updated Feb 24, 2025

QwenLM / Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,779 1,449 Updated May 29, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,707 234 Updated May 29, 2025

SandAI-org / MAGI-1

MAGI-1: Autoregressive Video Generation at Scale

Python 3,204 181 Updated May 30, 2025

DragonLiu1995 / Vision-to-Audio-and-Beyond

ICML 2024 "From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation"

6 Updated Oct 13, 2024

Google-Health / hear

Python 16 3 Updated May 19, 2025

jacklishufan / Reflect-DiT

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Python 28 Updated Apr 7, 2025

Xnhyacinth / Awesome-LLM-Long-Context-Modeling

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,502 49 Updated May 27, 2025

ZiyuGuo99 / Image-Generation-CoT

[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation

Python 699 20 Updated May 23, 2025

facebookresearch / audiobox-aesthetics

Unified automatic quality assessment for speech, music, and sound.

Python 494 33 Updated May 1, 2025

thuhcsi / VoxInstruct

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Python 78 4 Updated Nov 9, 2024

thuhcsi / SpeechCraft

The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.

Python 133 2 Updated Apr 14, 2025

EGO4D / audio-visual

C 65 10 Updated Sep 13, 2022

triton99 / MDSGen

Official PyTorch Implementation of MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation (ICLR 2025)

Python 4 Updated Feb 11, 2025

facebookresearch / Ego4d

Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset

Jupyter Notebook 432 52 Updated Jan 10, 2025

DragonLiu1995 / multimodal-llm-for-audio-gen

Code, Dataset, Samples for the NeurIPS paper “Tell What You Hear From What You See -- Video to Audio Generation Through Text”

Python 8 Updated May 29, 2025

blairstar / NaturalDiffusion

Official Code for "Rethinking Diffusion Model in High Dimension"

HTML 14 Updated May 20, 2025

lsdefine / simple_GRPO

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,082 86 Updated Apr 3, 2025

pbashivan / EEGLearn

A set of functions for supervised feature learning/classification of mental states from EEG based on "EEG images" idea.

Python 732 222 Updated Jul 2, 2020

openhuman-ai / awesome-gesture_generation

Awesome Gesture Generation

197 7 Updated Jan 25, 2025

shashikg / WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook 420 54 Updated Aug 27, 2024

yaotingwangofficial / Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

617 16 Updated May 20, 2025

harritaylor / torchvggish

PyTorch port of Google Research's VGGish model used for extracting audio features.

Python 390 71 Updated Nov 3, 2021

line / LibriTTS-P

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

137 2 Updated Jun 13, 2024
