[Paper] | [Project Page] | [Demo]
We achieve high-quality first-frame-guided video editing given a reference image (top row), while maintaining flexibility for incorporating additional reference conditions (bottom row).

- [2025.06.15] Using additional edited frames as guidance is now available!
- [2025.06.07] The LoRA-Edit first-frame-guided editing code is now available!
- Unlike video editing techniques such as VACE, our method itself does not provide visual editing capabilities. Instead, we leverage powerful image editing models to edit the first frame (or more frames), thereby transferring image editing capabilities to video editing.
- Our project currently runs at a moderate speed, taking 30-50 minutes to edit 49 frames on an RTX 4090. We are actively working on optimizations (a faster version will be available this month).
- The ComfyUI version is currently facing challenges, as we have not yet found a suitable node for training Wan I2V LoRA in ComfyUI.
- We welcome your issues and contributions to our project!
- Upload first-frame-guided video editing code (Completed)
- Upload additional reference code (Expected before UTC 2025.06.15, Completed)
- Upload detailed memory and speed report on RTX 4090 (Expected before UTC 2025.06.15, Completed)
- Optimize memory usage and speed
- CUDA-compatible GPU with sufficient VRAM (We use a single GeForce RTX 4090 (24GB))
- Python 3.12 (recommended)
- Git
- Miniconda or Anaconda
# Clone the repository with submodules
git clone --recurse-submodules https://github.com/cjeen/LoRAEdit.git
cd LoRAEdit
# If you already cloned without submodules, run:
# git submodule init
# git submodule update
Install a PyTorch build compatible with your CUDA version. Check your CUDA version with `nvcc -V` and choose the appropriate installation command from the official PyTorch website.
Examples for common CUDA versions:
# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 12.4
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
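After installation, a quick check (optional; not part of the pipeline) confirms that PyTorch was built with CUDA support and can see your GPU:

```python
# Optional sanity check: confirm PyTorch detects the GPU before training
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 4090"
```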
# Install Python dependencies
pip install -r requirements.txt
# Install huggingface_hub if not already installed
pip install huggingface_hub
# Download the Wan2.1-I2V model
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./Wan2.1-I2V-14B-480P
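If you prefer downloading from Python rather than the CLI, `huggingface_hub.snapshot_download` fetches the same repository into the same local directory:

```python
# Python alternative to the huggingface-cli command above
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Wan-AI/Wan2.1-I2V-14B-480P",
    local_dir="./Wan2.1-I2V-14B-480P",
)
```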
# Create models directory
mkdir -p models_sam
# Download SAM2 large model (recommended)
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -O models_sam/sam2_hiera_large.pt
# Alternative: Download other SAM2 models if needed
# SAM2 Base+: wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -O models_sam/sam2_hiera_base_plus.pt
# SAM2 Small: wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -O models_sam/sam2_hiera_small.pt
# SAM2 Tiny: wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -O models_sam/sam2_hiera_tiny.pt
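As a quick sanity check (a convenience sketch, not part of the repository), you can confirm the checkpoint landed where the preprocessing app expects it and is not a truncated download:

```python
# Optional: verify the SAM2 checkpoint exists and looks complete
import os

ckpt = "models_sam/sam2_hiera_large.pt"
assert os.path.isfile(ckpt), f"Missing checkpoint: {ckpt}"
size_mb = os.path.getsize(ckpt) / (1024 * 1024)
print(f"{ckpt}: {size_mb:.0f} MB")  # a truncated download shows up as an unusually small file
```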
Watch this quick tutorial to see how to use the data preprocessing interface:
tutorial_compressed.mp4
Note: A new tutorial video covering additional reference frames will be available soon.
Launch the data preprocessing interface:
python predata_app.py --port 8890 --checkpoint_dir models_sam/sam2_hiera_large.pt
After preprocessing, use the generated training command (example):
NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config ./processed_data/your_sequence/configs/training.toml
The table below summarizes the training speed and memory usage for different numbers of frames at 480P (832×480) resolution on an RTX 4090, helping you estimate the resource requirements for your own experiments.
All results on our project page are obtained by training for 100 steps under the 49-frame setting at 480P (832×480).
| Number of Frames | Time per Iteration (sec) | Memory Usage (MB) |
|---|---|---|
| 5 | 7.55 | 11,086 |
| 13 | 10.81 | 12,496 |
| 21 | 14.79 | 14,456 |
| 49 | 31.88 | 21,522 |
| 65† | 45.71 | 20,416 |
† For 65 frames, `blocks_to_swap` was set to 38 instead of the default 32.
After training completes, run inference:
# Save your edited first frame as edited_image.png (or .jpg) in the data directory
# Then run inference
python inference.py --model_root_dir ./Wan2.1-I2V-14B-480P --data_dir ./processed_data/your_sequence
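If your image editor exports the edited frame at a different size, a small script like the sketch below can resize it and place it where `inference.py` looks for it. The 832×480 target follows the 480P setting used in this README, and `my_edited_first_frame.png` is a hypothetical input path; adjust both to your setup.

```python
# Sketch: resize an externally edited first frame and save it as edited_image.png.
# "my_edited_first_frame.png" is a placeholder; 832x480 matches the 480P setting in this README.
from PIL import Image

edited = Image.open("my_edited_first_frame.png").convert("RGB")
edited = edited.resize((832, 480), Image.LANCZOS)
edited.save("./processed_data/your_sequence/edited_image.png")
```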
For more precise control using multiple edited frames as reference:
# 1. Edit frames from ./processed_data/your_sequence/source_frames/
#    and save the edited frames to ./processed_data/your_sequence/additional_edited_frames/
#    Important: keep the same filenames (e.g., 00000.png, 00001.png, etc.)
# 2. Preprocess additional data
python predata_additional.py --data_dir ./processed_data/your_sequence
# 3. Train additional LoRA (much faster than previous LoRA training)
NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config ./processed_data/your_sequence/configs/training_additional.toml
# 4. Run inference with additional frames guidance
python inference.py --model_root_dir ./Wan2.1-I2V-14B-480P --data_dir ./processed_data/your_sequence --additional
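Because the additional-reference step matches edited frames to source frames by filename, a quick check like the sketch below (a convenience script, not part of the repository) can catch naming mismatches before preprocessing:

```python
# Sketch: confirm every edited frame has a source frame with the same filename
import os

data_dir = "./processed_data/your_sequence"
source = set(os.listdir(os.path.join(data_dir, "source_frames")))
edited = set(os.listdir(os.path.join(data_dir, "additional_edited_frames")))

missing = edited - source
if missing:
    print("Edited frames with no matching source filename:", sorted(missing))
else:
    print(f"All {len(edited)} edited frames match source frame filenames.")
```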
project_root/
├── predata_app.py                # Data preprocessing interface
├── train.py                      # LoRA training script
├── inference.py                  # Video generation inference
├── models_sam/                   # SAM2 model checkpoints
│   └── sam2_hiera_large.pt
├── Wan2.1-I2V-14B-480P/          # Wan2.1 model directory
├── processed_data/               # Processed training data
│   └── your_sequence/
│       ├── source_frames/              # Original frames for editing
│       ├── additional_edited_frames/   # Your edited frames for additional reference
│       ├── traindata/                  # Training videos and captions
│       ├── configs/                    # Training configuration files
│       ├── lora/                       # Trained LoRA checkpoints
│       ├── inference_rgb.mp4           # Preprocessed RGB video
│       ├── inference_mask.mp4          # Mask video
│       └── edited_image.png            # Your edited first frame
└── requirements.txt
We would like to express our sincere gratitude to Wan2.1 for open-sourcing their powerful Image-to-Video model, which serves as the foundation for our work.
This project is built upon diffusion-pipe by tdrussell. We gratefully acknowledge their excellent work in providing a solid foundation for memory-efficient training of diffusion models.
The SAM2 GUI interface in this project references code from SAM2-GUI by YunxuanMao. We thank them for their contribution to the SAM2 community with their intuitive interface design.