8000 Added explicit memory management during the VAE decode process. by KBRASK · Pull Request #7587 · comfyanonymous/ComfyUI · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Added explicit memory management during the VAE decode process. < 8000 span class="f1-light color-fg-muted">#7587

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

KBRASK
Copy link
@KBRASK KBRASK commented Apr 13, 2025

Hello, I noticed that during the VAE decode process, the feature maps as intermediate variables were not explicitly deleted, which led to unnecessary CUDA out of memory errors. I have fixed this by expli 8000 citly deleting the intermediate variables.

@KBRASK KBRASK requested a review from comfyanonymous as a code owner April 13, 2025 04:05
@comfyanonymous
Copy link
Owner

I can't merge this as is because it would break on non cuda devices but does it actually improve things?

h = h_new does "del" the h variable. del doesn't actually delete anything, it only removes that reference to the object.

@KBRASK
Copy link
Author
KBRASK commented Apr 20, 2025

@comfyanonymous Hi, thank you for your review.
In this section, during the iterative upsampling process, the feature map h continuously accumulates, which is a significant reason for the CUDA out of memory errors.
I removed the reference to the old h and then used torch.cuda.empty_cache(). This allows the memory for the unused h tensor to be freed. This approach can reduce the peak VRAM usage during the VAE decoding process in ComfyUI by 40%.

@KBRASK
Copy link
Author
KBRASK commented Apr 20, 2025

You are correct that non-CUDA devices cannot use torch.cuda.empty_cache() to release VRAM. However, this step is quite important for CUDA devices. I can add logic here to check if CUDA is available and apply this optimization specifically for CUDA devices.

@KBRASK
Copy link
Author
KBRASK commented Apr 20, 2025

I can also show you the difference in VRAM usage during the VAE decoding stage, comparing the results with and without explicit memory management.
Previously (in the original logic), all feature maps generated during upsampling were kept in memory until the entire decoding phase was finished, at which point they were all released together.
Now (in the new logic), we clear the previous feature map immediately after each upsampling step. This avoids it taking up VRAM unnecessarily.

@chaObserv
Copy link
Contributor

You are correct that non-CUDA devices cannot use torch.cuda.empty_cache() to release VRAM. However, this step is quite important for CUDA devices. I can add logic here to check if CUDA is available and apply this optimization specifically for CUDA devices.

If empty_cache is needed, comfy.model_management.soft_empty_cache() might be better for compatibility.

@KBRASK
Copy link
Author
KBRASK commented Apr 20, 2025

@chaObserv Thank you for your valuable suggestion! I have already replaced torch.cuda.empty_cache() with model_management.soft_empty_cache() to support non-CUDA devices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0