Stars
Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
Latest Advances on System-2 Reasoning
TRO 2022 - QPEP: A C++/MATLAB library for solving generalized quadratic pose estimation problems and related uncertainty description
Official implementation of the Law of Vision Representation in MLLMs
TorchCFM: a Conditional Flow Matching library
Generative models for conditional audio generation
[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Use API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.
LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]
[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Community list of startups working with AI in audio and music technology
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, Comfy…
🔊 Text-Prompted Generative Audio Model
Some Conferences' accepted paper lists (including AI, ML, Robotics)
Implementation of MusicLM, a text to music model published by Google Research, with a few modifications.
Implementation of Denoising Diffusion Probabilistic Model in Pytorch
[AAAI 2024] Follow-Your-Pose: This repo is the official implementation of "Follow-Your-Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos"