8000 Question about latent range in DDIM inversion: X_T or X_{T-1}? · Issue #10 · WindVChen/Diff-Harmonization · GitHub

Question about latent range in DDIM inversion: X_T or X_{T-1}? #10


Open
Donus-S opened this issue Apr 6, 2025 · 1 comment

Donus-S commented Apr 6, 2025

Dear Authors,

Thank you for your great work.

I have a question regarding the following DDIM inversion code:

diff_harmon.py, lines 268–287:

for t in tqdm(timesteps[:-1], desc="DDIM_inverse"):
    # Classifier-free guidance: run the UNet on an unconditional and a text-conditioned copy
    latents_input = torch.cat([latents] * 2)
    noise_pred = model.unet(latents_input, t, encoder_hidden_states=context)["sample"]
    noise_pred_uncond, noise_prediction_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_prediction_text - noise_pred_uncond)

    # Inversion moves forward in time: step from t to the next (larger) training timestep
    next_timestep = t + model.scheduler.config.num_train_timesteps // model.scheduler.num_inference_steps
    alpha_bar_next = model.scheduler.alphas_cumprod[next_timestep] \
        if next_timestep <= model.scheduler.config.num_train_timesteps else torch.tensor(0.0)

    # leverage reversed x0: predict x_0 from the current latent, then re-noise it to the next timestep
    reverse_x0 = (1 / torch.sqrt(model.scheduler.alphas_cumprod[t]) * (
        latents - noise_pred * torch.sqrt(1 - model.scheduler.alphas_cumprod[t])))

    latents = reverse_x0 * torch.sqrt(alpha_bar_next) + torch.sqrt(1 - alpha_bar_next) * noise_pred

    all_latents.append(latents)

# all_latents[N] -> N: DDIM steps (X_{T-1} ~ X_0)
return latents, all_latents

From what I understand, when t = T-1 (961), the next_timestep becomes T (981), meaning the alpha_bar_next is α_T (α_981), so the newly computed latent should correspond to X_T (X_981).

However, according to the comment at the end of the code (all_latents[N] corresponds to X_{T-1} ~ X_0), it seems the stored latents start from X_{T-1}, not X_T.

Could you please clarify this point? Specifically: why is the range of all_latents described as X_{T-1} ~ X_0, instead of X_T ~ X_0?
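For concreteness, here is a small standalone sketch of the timestep arithmetic I am describing. The constants below are my assumptions (1000 training steps, 50 DDIM steps, offset 1), chosen only so that the numbers match the t = 961 / next_timestep = 981 values above; they are not taken from the repository's actual scheduler configuration:

```python
# Assumed configuration mirroring the numbers in the question above.
num_train_timesteps = 1000
num_inference_steps = 50
step = num_train_timesteps // num_inference_steps  # 20

# Ascending timesteps used for inversion (x_0 -> x_T direction):
# [1, 21, 41, ..., 961, 981]
timesteps = [i * step + 1 for i in range(num_inference_steps)]

stored = []
for t in timesteps[:-1]:      # 49 iterations: t = 1, 21, ..., 961
    next_timestep = t + step  # 21, 41, ..., 981
    stored.append(next_timestep)

print(len(stored))   # 49 latents appended by the loop
print(stored[-1])    # 981 -> the last stored latent sits at the final timestep
```

So under this reading, the last stored latent is computed with alpha_bar at timestep 981, which is why I would have expected the range to be described as X_T ~ X_0.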

Thank you in advance for your help!

WindVChen (Owner) commented

Hello @Donus-S,

Thank you for your recognition of our work.

As the code was written quite some time ago, I tried to recall the exact reason behind that comment but unfortunately couldn't remember it precisely.

My guess is that the comment refers to the fact that, to transform x_0 to x_T (or vice versa), there should be T steps, where T = len(timesteps). Since we're only using timesteps[:-1] here, we go from x_0 to x_{T−1}, not all the way to x_T.

Hope this helps!
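A minimal way to sanity-check that counting argument (the 50-step ascending schedule here is an assumed example configuration, not necessarily what the repository uses):

```python
# Counting sketch: with T = len(timesteps) timesteps, iterating over
# timesteps[:-1] performs only T - 1 update steps starting from x_0, so the
# chain of stored latents is one transition short of a full x_0 -> x_T pass.
timesteps = list(range(1, 1000, 20))   # [1, 21, ..., 981], an assumed schedule
T = len(timesteps)                     # T = 50
transitions = len(timesteps[:-1])      # 49 == T - 1 loop iterations
print(T, transitions)                  # 50 49
```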
