Stars
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Real-time face swap and one-click video deepfake with only a single image
Unofficial implementation of: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics
Native BM25 Ranking Index in PostgreSQL
Segment Anything in High Quality [NeurIPS 2023]
An arbitrary face-swapping framework on images and videos with one single trained model!
[ICLR 2025] Codebase for "CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation"
Python library for analysing faces using PyTorch
[ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M parameters trainable) …
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
Object detection and tracking algorithm implemented for real-time video streams and static images.
Collection of publicly available person re-identification datasets
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Swin-UNet is a variant of the widely used U-Net architecture that combines the windowed attention mechanism of the Swin Transformer with the U-Net framework.
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
Unofficial Implementation of Animate Anyone
Character Animation (AnimateAnyone, Face Reenactment)
DiffusionFastForward: a free course and experimental framework for diffusion-based generative models
[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacch…
[Information Fusion (Vol.103, Mar. '24)] Boosting Image Matting with Pretrained Plain Vision Transformers
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery, ISPRS. Also, including other vision transformers and CNNs for satellite, aerial image …
OpenMMLab Pose Estimation Toolbox and Benchmark.
[NeurIPS 2023] Official Code for "SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation"
[CVPR 2023] Official implementation of the paper "One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer"
ModaNet: A large-scale street fashion dataset with polygon annotations
Polish dictionary files for PostgreSQL Text Search with correct UTF8 encoding and names