Add fast_sampler.py with optimized sampling and VAE decoding, enhance PreviewImage #8136
This update introduces `fast_sampler.py`, a new module designed to improve the performance of sampling and VAE decoding in ComfyUI. It replaces or augments functionality previously handled in `model_management.py`, providing better VRAM management, FP16 support, and tiled decoding for low-memory scenarios. It also improves the `PreviewImage` node in `nodes.py` for faster, more efficient preview generation. Together these changes improve efficiency, stability, and usability, particularly for GPU-based workflows.

**Key Changes:**
- Added `fast_ksampler` for optimized sampling, with improved memory management, FP16 support via `torch.amp.autocast`, and the `channels_last` memory format for better GPU performance.
- Added `fast_vae_decode` for efficient VAE decoding, incorporating FP16 support, `channels_last`, and selective VRAM clearing to prevent out-of-memory errors.
- Added `fast_vae_tiled_decode` for tiled VAE decoding, enabling large latents to be processed on GPUs with limited VRAM via configurable tile sizes and overlaps.
- Added profiling helpers (`profile_section`, `profile_cuda_sync`) to track execution times and VRAM usage when the `--profile` or `--debug` flags are enabled.
- Added `clear_vram` to ensure sufficient free memory before loading models or the VAE, with configurable thresholds and minimum-free-memory requirements.
- Added `is_fp16_safe` to check GPU compatibility for FP16 operations, disabling them on unsupported hardware (e.g., GTX 1660/Turing).
- Added `optimized_transfer` and `optimized_conditioning` for synchronous device placement and dtype casting.
- Added `preload_model`, which unloads the VAE before loading the U-Net to conserve VRAM and skips the transfer when the VAE is already loaded.
- Added an optional `cudnn.benchmark` toggle for testing, disabled by default.
- Enhanced the `PreviewImage` node in `nodes.py` to adaptively resize previews to a maximum dimension of ~512 pixels while preserving aspect ratio, using `Image.LANCZOS` for quality. Increased the PNG `compress_level` from 1 to 4 to optimize preview generation.

**Impact:**
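For intuition, the overlap tiling used by `fast_vae_tiled_decode` can be sketched in isolation. The helper below is hypothetical (the real logic lives in `fast_sampler.py` and operates on latent tensors), but it shows how a configurable tile size and overlap cover one dimension:

```python
def tile_coords(length: int, tile_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) spans covering [0, length), each at most tile_size
    long, with consecutive spans sharing at least `overlap` elements (the
    shared region is typically blended to hide tile seams).

    Hypothetical sketch of the tiling arithmetic, not the PR's actual code.
    """
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    if tile_size >= length:
        return [(0, length)]  # a single tile suffices; nothing to blend
    stride = tile_size - overlap
    coords, start = [], 0
    while True:
        end = min(start + tile_size, length)
        coords.append((start, end))
        if end == length:
            break
        start += stride
        if start + tile_size > length:
            start = length - tile_size  # clamp the final tile to the edge
    return coords
```

Applying such spans to the latent's height and width yields overlapping patches that are decoded independently and blended where they overlap, which is what keeps peak VRAM bounded by the tile size rather than the full latent.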
**Dependencies:**

- `nodes.py`, for integration with the `KSampler`, `VAEDecode`, `VAEDecodeTiled`, and `PreviewImage` nodes.
- `ModelPatcher` functionality for model patching (e.g., in `LoraLoader`).

**Notes:**
- Run with the `--profile` or `--debug` flags to get detailed performance logs.
- Tiling parameters (`tile_size`, `overlap`, etc.) may need tuning for specific workflows.
- Raise `max_size` in `PreviewImage` if higher-resolution previews are needed.

This is a foundational change to improve ComfyUI's performance and scalability, particularly in resource-constrained environments.
Thanks to Grok @ xAI for help.