LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

[Paper] | [Project Page] | [Demo]

LoRA-Edit Teaser
Our method achieves high-quality first-frame-guided video editing given a reference image (top row), while remaining flexible enough to incorporate additional reference conditions (bottom row).

📰 News

  • [2025.06.15] Support for using additional edited frames as guidance is now available! 🎉
  • [2025.06.07] The LoRA-Edit first-frame-guided editing code is now available! 🎉

Important Notes

  1. Unlike video editing techniques such as VACE, our method does not itself provide visual editing capabilities. Instead, we leverage powerful image editing models to edit the first frame (or additional frames), thereby transferring image editing capabilities to video editing. (See the sketch after these notes for one way to extract the first frame.)

  2. Our project currently runs at a moderate speed, taking 30-50 minutes to edit 49 frames on an RTX 4090. We are actively working on optimizations (a faster version will be available this month).

  3. The ComfyUI version is currently blocked, as we have not yet found a suitable node for training a Wan I2V LoRA in ComfyUI.

  4. We welcome your issues and contributions to our project!
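
As a concrete example for note 1, here is a minimal sketch of extracting the first frame from a source clip so that it can be edited with any image editing model or tool. It assumes ffmpeg is installed; input.mp4 and first_frame.png are illustrative names, not files in this repository:

# Extract only the first frame of the clip as a PNG
ffmpeg -i input.mp4 -frames:v 1 first_frame.png

# Edit first_frame.png with your preferred image editing model,
# then use the result as the edited first frame in Step 3 below.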

TODO List

  • Upload first-frame-guided video editing code (completed)
  • Upload additional reference code (expected before 2025.06.15 UTC; completed)
  • Upload detailed memory and speed report on RTX 4090 (expected before 2025.06.15 UTC; completed)
  • Optimize memory usage and speed

πŸ› οΈ Environment Setup

Prerequisites

  • CUDA-compatible GPU with sufficient VRAM (we use a single GeForce RTX 4090, 24 GB)
  • Python 3.12 (recommended)
  • Git
  • Miniconda or Anaconda

1. Clone Repository and Setup Environment

# Clone the repository with submodules
git clone --recurse-submodules https://github.com/cjeen/LoRAEdit.git
cd LoRAEdit

# If you already cloned without submodules, run:
# git submodule init
# git submodule update
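
To confirm that the submodules were actually fetched, git itself can report their state (an optional sanity check, not a required step):

# Each submodule should show a commit hash; a leading "-" means it is uninitialized
git submodule status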

2. Install PyTorch

Install a PyTorch build compatible with your CUDA version. Check your CUDA version with nvcc -V and choose the appropriate installation command from the official PyTorch website.

Examples for common CUDA versions:

# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1  
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# For CUDA 12.4
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
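
After installation, a quick one-liner can confirm that PyTorch was built with CUDA support and can see your GPU (an optional sanity check):

# Should print the installed version and True if CUDA is usable
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"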

3. Install Dependencies

# Install Python dependencies
pip install -r requirements.txt

4. Download Models

Download Wan2.1-I2V Model

# Install huggingface_hub if not already installed
pip install huggingface_hub

# Download the Wan2.1-I2V model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./Wan2.1-I2V-14B-480P
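
The 14B model is large (on the order of tens of gigabytes), so check your free disk space first. Once the download finishes, you can verify its footprint with a standard shell command:

# Report the total size of the downloaded model directory
du -sh ./Wan2.1-I2V-14B-480P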

Download SAM2 Model Checkpoint

# Create models directory
mkdir -p models_sam

# Download SAM2 large model (recommended)
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -O models_sam/sam2_hiera_large.pt

# Alternative: Download other SAM2 models if needed
# SAM2 Base+: wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -O models_sam/sam2_hiera_base_plus.pt
# SAM2 Small: wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -O models_sam/sam2_hiera_small.pt
# SAM2 Tiny: wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -O models_sam/sam2_hiera_tiny.pt

🚀 Usage

Tutorial Video

Watch this quick tutorial to see how to use the data preprocessing interface:

tutorial_compressed.mp4

Note: A new tutorial video covering additional reference frames will be available soon.

Step 1: Data Preprocessing

Launch the data preprocessing interface:

python predata_app.py --port 8890 --checkpoint_dir models_sam/sam2_hiera_large.pt
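
Assuming the preprocessing app serves its interface over HTTP on the port given above (which the --port flag suggests), you can then open it in a browser on the same machine:

# Open the preprocessing interface (Linux; on macOS use `open` instead)
xdg-open http://localhost:8890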

Step 2: LoRA Training

After preprocessing, use the generated training command (example):

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config ./processed_data/your_sequence/configs/training.toml
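
Since training can take a while (see the cost table below), you may want to run it in the background and keep a log; this is plain shell usage, not a feature of train.py:

# Run training detached and capture all output to train.log
NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" nohup deepspeed --num_gpus=1 train.py --deepspeed \
    --config ./processed_data/your_sequence/configs/training.toml > train.log 2>&1 &

# Follow progress
tail -f train.log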

🏁 Training Cost

The table below summarizes training speed and memory usage for different numbers of frames at 480P (832×480) resolution on an RTX 4090, to help you estimate the resource requirements for your own experiments. For example, at the 49-frame setting, 100 steps at 31.88 s per iteration comes to roughly 53 minutes of training.
All results on our project page were obtained by training for 100 steps under the 49-frame setting at 480P (832×480).

| Number of Frames | Time per Iteration (s) | Memory Usage (MB) |
|------------------|------------------------|-------------------|
| 5                | 7.55                   | 11,086            |
| 13               | 10.81                  | 12,496            |
| 21               | 14.79                  | 14,456            |
| 49               | 31.88                  | 21,522            |
| 65 †             | 45.71                  | 20,416            |

† For 65 frames, blocks_to_swap was set to 38 instead of the default 32.

Step 3: Video Generation

After training completes, run inference:

# Save your edited first frame as edited_image.png (or .jpg) in the data directory
# Then run inference
python inference.py --model_root_dir ./Wan2.1-I2V-14B-480P --data_dir ./processed_data/your_sequence
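
If you are unsure where the edited first frame should come from, one simple workflow is to start from the first source frame. The filename 00000.png follows the convention mentioned in Step 4, and frame_to_edit.png is just an illustrative name; the editing model or tool itself is up to you:

# Copy the first source frame out for editing
cp ./processed_data/your_sequence/source_frames/00000.png ./frame_to_edit.png

# ... edit frame_to_edit.png with your preferred image editing model/tool ...

# Place the result where inference.py expects it
cp ./frame_to_edit.png ./processed_data/your_sequence/edited_image.png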

Step 4: Additional Edited Frames as Reference (Optional)

For more precise control using multiple edited frames as reference:

# 1. Edit frames from source_frames and save the results to the additional_edited_frames directory
# Source frames: ./processed_data/your_sequence/source_frames/
# Save edited frames to: ./processed_data/your_sequence/additional_edited_frames/
# Important: keep the same filenames (e.g., 00000.png, 00001.png, etc.)

# 2. Preprocess additional data
python predata_additional.py --data_dir ./processed_data/your_sequence

# 3. Train the additional LoRA (much faster than the previous LoRA training step)
NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config ./processed_data/your_sequence/configs/training_additional.toml

# 4. Run inference with additional frames guidance
python inference.py --model_root_dir ./Wan2.1-I2V-14B-480P --data_dir ./processed_data/your_sequence --additional
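
Because the additional frames must keep their source filenames, a quick shell loop can catch mismatches before training; this is an optional sanity check, not part of the pipeline:

# Flag any edited frame whose filename has no counterpart in source_frames
seq_dir=./processed_data/your_sequence
for f in "$seq_dir"/additional_edited_frames/*; do
    base=$(basename "$f")
    [ -e "$seq_dir/source_frames/$base" ] || echo "No source match for: $base"
done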

πŸ“ Directory Structure

project_root/
├── predata_app.py          # Data preprocessing interface
├── train.py                # LoRA training script
├── inference.py            # Video generation inference
├── models_sam/             # SAM2 model checkpoints
│   └── sam2_hiera_large.pt
├── Wan2.1-I2V-14B-480P/    # Wan2.1 model directory
├── processed_data/         # Processed training data
│   └── your_sequence/
│       ├── source_frames/  # Original frames for editing
│       ├── additional_edited_frames/  # Your edited frames for additional reference
│       ├── traindata/      # Training videos and captions
│       ├── configs/        # Training configuration files
│       ├── lora/           # Trained LoRA checkpoints
│       ├── inference_rgb.mp4    # Preprocessed RGB video
│       ├── inference_mask.mp4   # Mask video
│       └── edited_image.png     # Your edited first frame
└── requirements.txt

πŸ™ Acknowledgments

We would like to express our sincere gratitude to the Wan2.1 team for open-sourcing their powerful image-to-video model, which serves as the foundation for our work.

This project is built upon diffusion-pipe by tdrussell. We gratefully acknowledge their excellent work in providing a solid foundation for memory-efficient training of diffusion models.

The SAM2 GUI in this project references code from SAM2-GUI by YunxuanMao. We thank them for contributing an intuitive interface to the SAM2 community.
