Starred repositories
A comprehensive list of work on physical cognition in video generation, including papers, code, and related websites.
Paper List of Inference/Test Time Scaling/Computing
A minimal and universal controller for FLUX.1.
🚀🚀🚀A curated list of papers on controllable video generation.
Collection of Summer 2025 tech internships!
GenStereo: Towards Open-World Generation of Stereo Images and Unsupervised Matching
FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting (IJCAI-2024)
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
Machine Learning and Computer Vision Engineer - Technical Interview Questions
Official repo and evaluation implementation of VSI-Bench
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
A curated list of recent diffusion models for video generation, editing, and various other applications.
[ICLR'25] SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
Awesome Instruction Editing. Image and Media Editing with Human Instructions. Instruction-Guided Image and Media Editing.
The official implementation of VLPL: Vision Language Pseudo Label for Multi-label Learning with Single Positive Labels
An expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.
[NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
Collection of diffusion model papers categorized by their subareas
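《代码随想录》 LeetCode practice guide: a recommended order for working through 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the difficult points, more than 50 mind maps, and solutions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer confusing! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀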
The official implementation of the paper: Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs (ICCV 2023)
Official PyTorch Implementation of DenseDiffusion (ICCV 2023)
Directed Diffusion: Direct Control of Object Placement through Attention Guidance (AAAI2024)
G2LP-Net: Global to Local Progressive Video Inpainting Network Dataset
[ICCV 2023] VPD is a framework that applies the high-level and low-level knowledge of a pre-trained text-to-image diffusion model to downstream visual perception tasks.