GigaGAN - Pytorch (wip)

Implementation of GigaGAN (project page), new SOTA GAN out of Adobe.

I will also add a few findings from lightweight gan, for faster convergence (skip layer excitation), better stability (reconstruction auxiliary loss in discriminator), as well as improved results (GLU in generator).
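
As a rough illustration of the GLU (gated linear unit) mentioned above: the input is split into two halves, and the first half is multiplied by the sigmoid of the second. The sketch below is written in plain Python on a flat list so it runs without PyTorch. The function names are hypothetical, and this is not necessarily how the generator in this repository applies the gate.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(values):
    """Gated linear unit over a flat list of features.

    The first half of `values` carries the signal and the second half
    carries the gates, so the output has half as many elements.
    """
    if len(values) % 2 != 0:
        raise ValueError("GLU needs an even number of features")
    half = len(values) // 2
    signal, gates = values[:half], values[half:]
    return [s * sigmoid(g) for s, g in zip(signal, gates)]


# A gate of 0 lets half of the signal through,
# since sigmoid(0) is 0.5.
halved = glu([1.0, 2.0, 0.0, 0.0])
```

In PyTorch the same operation on a tensor is available as `torch.nn.functional.glu`, which splits along a chosen dimension.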

It will also contain the code for the 1k - 4k upsamplers, which I find to be the highlight of this paper.

Please join us on Discord if you are interested in helping out with the replication with the LAION community.

Appreciation

  • StabilityAI for the sponsorship, as well as my other sponsors, for affording me the independence to open source artificial intelligence.

  • 🤗 Huggingface for their accelerate library

  • All the maintainers at OpenClip, for their SOTA open sourced contrastive learning text-image models

  • Xavier for reviewing the discriminator code and pointing out that the scale invariance was not correctly built!

Usage

Simple unconditional GAN, for starters

import torch

from gigagan_pytorch import (
    GigaGAN,
    ImageDataset
)

gan = GigaGAN(
    generator = dict(
        dim = 64,
        style_network = dict(
            dim = 64,
            depth = 4
        ),
        image_size = 256,
        dim_max = 512,
        use_glu = True,
        num_skip_layers_excite = 4,
        unconditional = True
    ),
    discriminator = dict(
        dim = 64,
        dim_max = 512,
        image_size = 256,
        use_glu = True,
        num_skip_layers_excite = 4,
        unconditional = True
    )
).cuda()

# dataset

dataset = ImageDataset(
    folder = '/path/to/your/data',
    image_size = 256
)

dataloader = dataset.get_dataloader(batch_size = 1)

# training the discriminator and generator alternating
# for 100 steps in this example, batch size 1, gradient accumulated 8 times

gan(
    dataloader = dataloader,
    steps = 100,
    grad_accum_every = 8
)
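
For intuition on what `grad_accum_every = 8` buys you with `batch_size = 1`: gradients from 8 micro-batches are summed before each optimizer step, so the update behaves like one computed on a batch of 8. Below is a minimal sketch of that equivalence on a toy one-parameter model in plain Python. The helper names are hypothetical, and the assumption that each micro-batch loss is scaled by `1 / grad_accum_every` is mine, not something confirmed from this repository's training loop.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, batch_size, grad_accum_every):
    """Sum micro-batch gradients, each scaled by 1 / grad_accum_every."""
    if len(xs) != batch_size * grad_accum_every:
        raise ValueError("data must fill exactly one accumulation cycle")
    total = 0.0
    for step in range(grad_accum_every):
        lo = step * batch_size
        hi = lo + batch_size
        # Assumed scaling: divide so the sum matches a full-batch mean.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / grad_accum_every
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

full_batch = grad_mse(w, xs, ys)                    # one batch of 8
accumulated = accumulated_grad(w, xs, ys, 1, 8)     # 8 micro-batches of 1
effective_batch_size = 1 * 8
```

The two gradients agree up to floating point error, which is why accumulation is a common way to trade memory for wall-clock time.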

For unconditional Unet Upsampler

import torch
from gigagan_pytorch import GigaGAN, ImageDataset

gan = GigaGAN(
    upsampler_generator = True,     # set this to True
    generator = dict(
        style_network = dict(
            dim = 64,
            depth = 4
        ),
        dim = 64,
        image_size = 256,
        input_image_size = 128,
        unconditional = True
    ),
    discriminator = dict(
        dim = 64,
        dim_max = 512,
        image_size = 256,
        use_glu = True,
        num_skip_layers_excite = 4,
        unconditional = True
    )
).cuda()

dataset = ImageDataset(
    folder = '/home/phil/dl/data/flowers',
    image_size = 256
)

dataloader = dataset.get_dataloader(batch_size = 1)

# training the discriminator and generator alternating
# for 100 steps in this example, batch size 1, gradient accumulated 8 times

gan(
    dataloader = dataloader,
    steps = 100,
    grad_accum_every = 8
)

Todo

  • make sure it can be trained unconditionally
  • read the relevant papers and knock out all 3 auxiliary losses
    • matching aware loss
    • clip loss
    • vision-aided discriminator loss
    • add reconstruction losses on arbitrary stages in the discriminator (lightweight gan)
    • figure out how the random projections are used from projected-gan
    • vision aided discriminator needs to extract N layers from the vision model in CLIP
    • figure out whether to discard CLS token and reshape into image dimensions for convolution, or stick with attention and condition with adaptive layernorm - also turn off vision aided gan in unconditional case
  • unet upsampler
    • add adaptive conv
    • modify latter stage of unet to also output rgb residuals, and pass the rgb into discriminator. make discriminator agnostic to rgb being passed in
    • do pixel shuffle upsamples for unet
  • get a code review for the multi-scale inputs and outputs, as the paper was a bit vague
  • add upsampling network architecture
  • make unconditional work for both base generator and upsampler
  • add accelerate
  • do a review of the auxiliary losses
  • port over CLI from lightweight|stylegan2-pytorch
  • hook up laion dataset for text-image
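
For the pixel shuffle item in the list above, here is a small sketch of the rearrangement itself, in plain Python on nested lists. It follows the channel ordering used by `torch.nn.PixelShuffle`: an input of shape `(C * r * r, H, W)` becomes `(C, H * r, W * r)`. The function name is hypothetical and this only illustrates the operation, not the Unet code in this repository.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C * r * r, H, W) nested list into (C, H * r, W * r)."""
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    height, width = len(x[0]), len(x[0][0])
    channels_out = channels_in // (r * r)

    out = [
        [[0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]
    for c in range(channels_out):
        for h in range(height):
            for w in range(width):
                for i in range(r):
                    for j in range(r):
                        # Each group of r * r input channels fills one
                        # r x r patch of the upsampled output channel.
                        src = x[c * r * r + i * r + j][h][w]
                        out[c][h * r + i][w * r + j] = src
    return out


# Four 1x1 channels become one 2x2 channel.
upsampled = pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2)
```

The appeal for an upsampler is that the spatial increase comes from a convolution producing `r * r` times more channels, followed by this parameter-free reshuffle.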

Citations

@misc{https://doi.org/10.48550/arxiv.2303.05511,
    url     = {https://arxiv.org/abs/2303.05511},
    author  = {Kang, Minguk and Zhu, Jun-Yan and Zhang, Richard and Park, Jaesik and Shechtman, Eli and Paris, Sylvain and Park, Taesung},  
    title   = {Scaling up GANs for Text-to-Image Synthesis},
    publisher = {arXiv},
    year    = {2023},
    copyright = {arXiv.org perpetual, non-exclusive license}
}
@article{Liu2021TowardsFA,
    title   = {Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis},
    author  = {Bingchen Liu and Yizhe Zhu and Kunpeng Song and A. Elgammal},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2101.04775}
}
@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}
