Jiaxin-Ye (Jiaxin Ye) / Starred · GitHub
💭
Keep Improving

Official code implementation of "Robust Classification via a Single Diffusion Model"

Python 83 3 Updated Mar 7, 2025

Official code implementation of "Your Diffusion Model is Secretly a Certifiably Robust Classifier"

Python 15 1 Updated Feb 2, 2024

A curated list of Video to Audio Generation

44 2 Updated Apr 15, 2025

Hugging Face implementation of AV-HuBERT on the MuAViC dataset

Python 8 Updated Mar 6, 2025
Python 381 35 Updated May 6, 2025

Famous Vision Language Models and Their Architectures

Markdown 849 43 Updated Feb 24, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,779 1,449 Updated May 29, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 3,707 234 Updated May 29, 2025

MAGI-1: Autoregressive Video Generation at Scale

Python 3,204 181 Updated May 30, 2025

ICML 2024 "From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation"

6 Updated Oct 13, 2024
Python 16 3 Updated May 19, 2025

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

Python 28 Updated Apr 7, 2025

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,502 49 Updated May 27, 2025

[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation

Python 699 20 Updated May 23, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 494 33 Updated May 1, 2025

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Python 78 4 Updated Nov 9, 2024

The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.

Python 133 2 Updated Apr 14, 2025
C 65 10 Updated Sep 13, 2022

Official PyTorch Implementation of MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation (ICLR 2025)

Python 4 Updated Feb 11, 2025

Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset

Jupyter Notebook 432 52 Updated Jan 10, 2025

Code, Dataset, Samples for the NeurIPS paper “Tell What You Hear From What You See -- Video to Audio Generation Through Text”

Python 8 Updated May 29, 2025

Official Code for "Rethinking Diffusion Model in High Dimension"

HTML 14 Updated May 20, 2025

A very simple GRPO implementation for reproducing r1-like LLM thinking.

Python 1,082 86 Updated Apr 3, 2025

A set of functions for supervised feature learning/classification of mental states from EEG based on the "EEG images" idea.

Python 732 222 Updated Jul 2, 2020

Awesome Gesture Generation

197 7 Updated Jan 25, 2025

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook 420 54 Updated Aug 27, 2024

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

617 16 Updated May 20, 2025

PyTorch port of Google Research's VGGish model used for extracting audio features.

Python 390 71 Updated Nov 3, 2021

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

137 2 Updated Jun 13, 2024