Stars
🚀 [ICLR 2025] Pytorch implementation of 'Fast Feedforward 3D Gaussian Splatting Compression'
The code for the paper "Reducing the Memory Footprint of 3D Gaussian Splatting"
Official inference repo for FLUX.1 models
Personalize Anything for Free with Diffusion Transformer
[ICCV 2025] OminiControl: Minimal and Universal Control for Diffusion Transformer
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
DreamO: A Unified Framework for Image Customization
Official implementation of "XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation".
Official Implementation of "Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function" [NeurIPS 2024]
[NeurIPS 2024] Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Solve Visual Understanding with Reinforced VLMs
📈 The largest collection to date of industrial defect detection datasets and papers. A continually updated summary of open-source datasets and key papers in the field of surface defect research.
Two neural network models, built on ConvNeXT and DenseNet respectively, for the BIRADS six-class classification and feature recognition tasks, along with the data processing and training code
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Matplotlib tutorial in Chinese; read online at https://datawhalechina.github.io/fantastic-matplotlib/
Code release for SLIP: Self-supervision meets Language-Image Pre-training
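"PyTorch Practical Tutorial" (2nd edition). Whether you are starting from zero, applying PyTorch to CV, NLP, or LLM projects, or moving on to production engineering and deployment, it is all covered here. With this book's help, readers should be able to master PyTorch with ease and become capable deep learning engineers.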
AlignCLIP: Improving Cross-Modal Alignment in CLIP (ICLR 2025)
An open source implementation of CLIP.
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
Janus-Series: Unified Multimodal Understanding and Generation Models
[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.