South China University of Technology
Stars
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Toward Generalizing Visual Brain Decoding to Unseen Subjects
Self-supervised learning techniques for neuroimaging data inspired by prominent learning frameworks in natural language processing + One of the broadest neuroimaging datasets used for pre-training …
Implementation of Domain Specific Denoising Diffusion Probabilistic Models for Brain Dynamics/EEG Signals
High-performance Image Tokenizers for VAR and AR
[ICLR 2025] Rectified Diffusion: Straightness Is Not Your Need
Exploration of a diffusion-based generative model for synchronizing brain dynamics from semantic language input.
PyTorch implementation of "Brain Decodes Deep Nets"
A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models (NeurIPS 2023 Oral)
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
✨✨Latest Advances on Multimodal Large Language Models
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Language Generation from Brain Recordings
THINGS-data: A multimodal collection of large-scale datasets for investigating object representations in brain and behavior
A collection of awesome research in brain decoding, including interaction with multi-modalities, theories, and foundation models.
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
[Embodied-AI-Survey-2025] Paper List and Resource Repository for Embodied AI
A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy
Official implementation of SEED-LLaMA (ICLR 2024).
Official repo for VGen: a holistic video generation ecosystem built on diffusion models