Stars
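A digital media source-tracing application based on visible watermark detection and recognition. It is my course project and includes the system itself plus an open-source Large-scale Common Watermark Dataset (LCWD). Given an image or video containing a visible watermark, the system detects and localizes the watermark region, extracts it, and then uses the Baidu AI Open Platform's OCR and logo recognition together with the Bing search engine to trace the image or video back to its source.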
Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Intrinsic Image Diffusion for Single-view Material Estimation
Code for the SIGGRAPH Asia 2023 paper "Intrinsic Harmonization for Illumination-Aware Compositing"
[SIGGRAPH Asia 2024 (Journal Track)] StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
A state-of-the-art-level open visual language model | multimodal pretrained model
[NeurIPS 2024] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
A curated list of papers, code, and resources pertaining to object shadow generation.
A curated list of papers, code, and resources pertaining to image composition/compositing or object insertion/addition/compositing, which aims to generate realistic composite images.
freeCodeCamp.org's open-source codebase and curriculum. Learn math, programming, and computer science for free.
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Era of Deep Learning (Awesome & Benchmark)
A set of nodes for ComfyUI that can composite layers and masks to achieve Photoshop-like functionality.
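PyTorch model-training code covering single precision, half precision, mixed precision, single GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, with a comparison of training speed and GPU memory usage across the different methods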
Official implementations for paper: Zero-shot Image Editing with Reference Imitation
A curated list of papers, code and resources pertaining to image harmonization.
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Lumina-T2X is a unified framework for Text to Any Modality Generation
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Implementation of Z. Farbman, R. Fattal, D. Lischinski, and R. Szeliski, 'Edge-Preserving Decompositions for Multi-Scale Tone and Detail Manipulation' (2008)