yuanxiaoming8899/IF: a novel state-of-the-art open-source text-to-image model

We introduce DeepFloyd IF, a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding. DeepFloyd IF is a modular system composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates 64x64 px images from a text prompt and two super-resolution models, each designed to generate images of increasing resolution: 256x256 px and 1024x1024 px. All stages of the model use a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and points to a promising future for text-to-image synthesis.

Inspired by Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Minimal requirements to use all IF models:

  • IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module) with 16GB vRAM
  • IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module) & Stable x4 (to 1024x1024 upscaler) with 24GB vRAM
  • xformers and the environment variable FORCE_MEM_EFFICIENT_ATTN=1 set (a Python sketch for setting it follows this list)
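
If you prefer to set the flag from Python rather than in your shell, here is a minimal sketch. It assumes deepfloyd_if reads the variable at import time, so it must be set before the first import:

```python
import os

# Assumption: FORCE_MEM_EFFICIENT_ATTN is read when deepfloyd_if is imported,
# so the variable has to be set before the import below.
os.environ["FORCE_MEM_EFFICIENT_ATTN"] = "1"

from deepfloyd_if.modules import IFStageI, IFStageII  # imported after setting the flag
```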

Quick Start

Open In Colab · Hugging Face Space

```shell
pip install deepfloyd_if==1.0.2rc0
pip install xformers==0.0.16
pip install git+https://github.com/openai/CLIP.git --no-deps
```

Local notebook

Jupyter Notebook · Kaggle

The Dream, Style Transfer, Super Resolution and Inpainting modes are available in a Jupyter Notebook here.

Integration with 🤗 Diffusers

IF is also integrated with the 🤗 Hugging Face Diffusers library.

Diffusers runs each stage individually, allowing the user to customize the image generation process and to inspect intermediate results easily.

Example

Before you can use IF, you need to accept its usage conditions. To do so:

  1. Make sure to have a Hugging Face account and be logged in
  2. Accept the license on the DeepFloyd/IF-I-XL-v1.0 model card
  3. Make sure to log in locally. Install huggingface_hub:

```shell
pip install huggingface_hub --upgrade
```

and run the login function in a Python shell

```python
from huggingface_hub import login

login()
```


and enter your Hugging Face Hub access token.
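
If you run IF from scripts or CI where an interactive prompt is inconvenient, the token can also be passed to login() directly; a minimal sketch, assuming the token has been exported beforehand in a (hypothetical) HF_TOKEN environment variable:

```python
import os

from huggingface_hub import login

# Assumption: the access token was exported beforehand, e.g. `export HF_TOKEN=hf_...`.
# Passing it explicitly skips the interactive prompt.
login(token=os.environ["HF_TOKEN"])
```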

Next we install diffusers and its dependencies:

```shell
pip install diffusers accelerate transformers safetensors
```

And we can now run the model locally.

By default, diffusers makes use of model CPU offloading to run the whole IF pipeline with as little as 14 GB of VRAM.

If you are using torch>=2.0.0, make sure to remove all enable_xformers_memory_efficient_attention() calls.

```python
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {"feature_extractor": stage_1.feature_extractor, "safety_checker": stage_1.safety_checker, "watermarker": stage_1.watermarker}
stage_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16)
stage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

generator = torch.manual_seed(0)

# stage 1
image = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt").images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")
```


There are multiple ways to speed up inference and lower memory consumption even further with diffusers. To do so, please have a look at the Diffusers documentation:
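
As one example (a sketch of a single option, not an exhaustive list): instead of the model CPU offloading used above, diffusers pipelines also support sequential CPU offloading, which trades inference speed for an even smaller VRAM footprint:

```python
import torch
from diffusers import DiffusionPipeline

stage_1 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)

# Submodules are moved to the GPU one at a time and offloaded again after use:
# slower than enable_model_cpu_offload(), but needs less VRAM.
stage_1.enable_sequential_cpu_offload()
```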

For more details on how to use IF, please have a look at the IF blog post and the documentation 📖.

The Diffusers dreambooth scripts also support fine-tuning 🎨 IF. With parameter-efficient fine-tuning, you can add new concepts to IF with a single GPU and ~28 GB of VRAM.

Run the code locally

Loading the models into VRAM

```python
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

device = 'cuda:0'
if_I = IFStageI('IF-I-XL-v1.0', device=device)
if_II = IFStageII('IF-II-L-v1.0', device=device)
if_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)
t5 = T5Embedder(device="cpu")
```


I. Dream

Dream is the text-to-image mode of the IF model

```python
from deepfloyd_if.pipelines import dream

prompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'
count = 4

result = dream(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    prompt=[prompt]*count,
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "smart100",
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": "smart50",
    },
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)

if_III.show(result['III'], size=14)
```
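
To keep the generated images rather than only display them, the entries of result['III'] can be written to disk; a minimal sketch, assuming (as the show() helper above suggests) that they are PIL images:

```python
# Assumption: result['III'] is a list of PIL.Image objects, one per prompt in the batch.
for i, img in enumerate(result['III']):
    img.save(f"./dream_{i}.png")
```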


II. Zero-shot Image-to-Image Translation

In Style Transfer mode, the output of your prompt comes out in the style of the support_pil_img:

```python
from deepfloyd_if.pipelines import style_transfer

result = style_transfer(
    t5=t5, if_I=if_I, if_II=if_II,
    support_pil_img=raw_pil_image,
    style_prompt=[
        'in style of professional origami',
        'in style of oil art, Tate modern',
        'in style of plastic building bricks',
        'in style of classic anime from 1990',
    ],
    seed=42,
    if_I_kwargs={
        "guidance_scale": 10.0,
        "sample_timestep_respacing": "10,10,10,10,10,10,10,10,0,0",
        'support_noise_less_qsample_steps': 5,
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        "sample_timestep_respacing": 'smart50',
        "support_noise_less_qsample_steps": 5,
    },
)
if_I.show(result['II'], 1, 20)
```
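
The style transfer, super resolution and inpainting examples assume raw_pil_image is an already-loaded PIL image; a minimal sketch of preparing it from an arbitrary (hypothetical) file path:

```python
from PIL import Image

# Hypothetical input file; any RGB image can serve as the support image.
raw_pil_image = Image.open("./example.jpg").convert("RGB")
```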


![deep_floyd_if_image_2_image.gif](https://github.com/deep-floyd/IF/raw/develop/pics/deep_floyd_if_image_2_image.gif)

III. Super Resolution

For super-resolution, users can run IF-II and IF-III or 'Stable x4' on an image that was not necessarily generated by IF (two cascades):

```python
from deepfloyd_if.pipelines import super_resolution

middle_res = super_resolution(
    t5,
    if_III=if_II,
    prompt=['woman with a blue headscarf and a blue sweaterp, detailed picture, 4k dslr, best quality'],
    support_pil_img=raw_pil_image,
    img_scale=4.,
    img_size=64,
    if_III_kwargs={
        'sample_timestep_respacing': 'smart100',
        'aug_level': 0.5,
        'guidance_scale': 6.0,
    },
)
high_res = super_resolution(
    t5,
    if_III=if_III,
    prompt=[''],
    support_pil_img=middle_res['III'][0],
    img_scale=4.,
    img_size=256,
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)
show_superres(raw_pil_image, high_res['III'][0])
```


IV. Zero-shot Inpainting

```python
from deepfloyd_if.pipelines import inpainting

result = inpainting(
    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
    support_pil_img=raw_pil_image,
    inpainting_mask=inpainting_mask,
    prompt=[
        'oil art, a man in a hat',
    ],
    seed=42,
    if_I_kwargs={
        "guidance_scale": 7.0,
        "sample_timestep_respacing": "10,10,10,10,10,0,0,0,0,0",
        'support_noise_less_qsample_steps': 0,
    },
    if_II_kwargs={
        "guidance_scale": 4.0,
        'aug_level': 0.0,
        "sample_timestep_respacing": '100',
    },
    if_III_kwargs={
        "guidance_scale": 9.0,
        "noise_level": 20,
        "sample_timestep_respacing": "75",
    },
)
if_I.show(result['I'], 2, 3)
if_I.show(result['II'], 2, 6)
if_I.show(result['III'], 2, 14)
```


![deep_floyd_if_inpainting.gif](https://github.com/deep-floyd/IF/raw/develop/pics/deep_floyd_if_inpainting.gif)

🤗 Model Zoo 🤗

The link to download the weights, as well as the model cards, will be available soon on each model of the model zoo.

Original

| Name | Cascade | Params | FID | Batch size | Steps |
| --- | --- | --- | --- | --- | --- |
| IF-I-M | I | 400M | 8.86 | 3072 | 2.5M |
| IF-I-L | I | 900M | 8.06 | 3200 | 3.0M |
| IF-I-XL* | I | 4.3B | 6.66 | 3072 | 2.42M |
| IF-II-M | II | 450M | - | 1536 | 2.5M |
| IF-II-L* | II | 1.2B | - | 1536 | 2.5M |
| IF-III-L* (coming soon) | III | 700M | - | 3072 | 1.25M |

*best modules

Quantitative Evaluation

FID = 6.66

License

The code in this repository is released under the bespoke license (see added point two).

The weights will be available soon via the DeepFloyd organization at Hugging Face and have their own license.

Disclaimer: The initial release of the IF model is temporarily under a restrictive research-purposes-only license to gather feedback, after which we intend to release a fully open-source model in line with other Stability AI models.

Limitations and Biases

The models available in this codebase have known limitations and biases. Please refer to the model card for more information.

🎓 DeepFloyd IF creators:

📄 Research paper (coming soon)

Acknowledgements

Special thanks to StabilityAI and its CEO Emad Mostaque for the invaluable support, providing GPU compute and infrastructure to train the models (we are grateful to Richard Vencu); special thanks to LAION and Christoph Schuhmann for the contribution to the project and the well-prepared datasets; thanks to the Hugging Face team for optimizing the model's speed and memory consumption during inference, creating demos and giving great advice!

🚀 External Contributors 🚀

  • The biggest thanks to @Apolinário, for their ideas, consultations, help and support at every stage to make IF available in open source; for writing a lot of documentation and instructions; for creating a friendly atmosphere in difficult moments 🦉;
  • Thanks to @patrickvonplaten, for improving the loading time of the unet models by 80%; for integrating Stable-Diffusion-x4 as a native pipeline 💪;
  • Thanks to @williamberman and @patrickvonplaten for the diffusers integration 🙌;
  • Thanks to @hysts and @Apolinário for creating the best gradio demo with IF 🚀;
  • Thanks to @Dango233 for adapting IF to xformers memory-efficient attention 💪;
