8000 GitHub - richservo/MatAnyone: [CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

richservo/MatAnyone

ย 
ย 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

76 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

MatAnyone Logo

Stable Video Matting with Consistent Memory Propagation

1S-Lab, Nanyang Technological Universityโ€ƒ 2SenseTime Research, Singaporeโ€ƒ

MatAnyone is a practical human video matting framework supporting target assignment, with stable performance in both semantics of core regions and fine-grained boundary details.

๐ŸŽฅ For more visual results, go checkout our project page


๐Ÿ“ฎ Update

  • [2025.05] Revolutionary Smart Chunking System: Added intelligent heat map-based chunk placement with face detection for optimal video quality
  • [2025.05] Enhanced GUI Interface: Completely redesigned interface with simplified controls and advanced processing options
  • [2025.05] Advanced Video Processing: Added enhanced chunk processing, parallel processing, and high-quality video encoding
  • [2025.03] Release our evaluation benchmark - YouTubeMatte.
  • [2025.03] Integrate MatAnyone with Hugging Face ๐Ÿค—
  • [2025.02] Release inference codes and gradio demo.
  • [2025.02] This repo is created.

๐Ÿ”Ž Overview

overall_structure

๐Ÿ”ง Installation

  1. Clone Repo

    git clone https://github.com/richservo/MatAnyone
    cd MatAnyone
  2. Create Conda Environment and Install Dependencies

    # create new conda env
    conda create -n matanyone python=3.8 -y
    conda activate matanyone
    
    # install python dependencies
    pip install -e .
    # [optional] install python dependencies for gradio demo
    pip3 install -r hugging_face/requirements.txt
  3. Install Segment Anything Models (SAM)

    # Install original SAM (required)
    pip install git+https://github.com/facebookresearch/segment-anything.git
  4. [Optional] Install SAM2 for Better Mask Quality

    Easy Installation (Python 3.8):

    # Simply run the installer
    python install_sam2.py

    This will automatically:

    • Download and patch SAM2 for Python 3.8 compatibility
    • Install SAM2 in your environment
    • Download the model weights (856MB)
    • Verify the installation

    Note: SAM2 requires Python 3.10+ officially, but our installer makes it work with Python 3.8. If SAM2 is not available, MatAnyone will automatically fall back to SAM1.

๐Ÿ”ง Plugin Architecture

MatAnyone now features a clean, modular plugin architecture that organizes the codebase for maximum maintainability and extensibility:

Project Structure

MatAnyone/
โ”œโ”€โ”€ core/                    # Core processing engine
โ”œโ”€โ”€ plugins/
โ”‚   โ””โ”€โ”€ MatAnyone/          # MatAnyone model plugin
โ”‚       โ”œโ”€โ”€ adapter.py      # Model adapter implementation
โ”‚       โ”œโ”€โ”€ matanyone/      # Original MatAnyone code
โ”‚       โ””โ”€โ”€ hugging_face/   # HuggingFace integration
โ”œโ”€โ”€ ui/                     # GUI components
โ”œโ”€โ”€ chunking/              # Smart chunking system
โ”œโ”€โ”€ mask/                  # Mask processing
โ””โ”€โ”€ utils/                 # Shared utilities

Key Benefits

  • ๐Ÿ”„ Easy Updates: MatAnyone plugin can be updated independently via git
  • ๐Ÿงฉ Modular Design: Clean separation between core engine and model implementation
  • โšก Automatic Updates: Built-in update checking for seamless improvements
  • ๐Ÿ”ง Extensible: Plugin architecture allows for future model integrations

For Developers: Adapting Other Models

While MatAnyone provides the best-tested and supported video matting experience, the plugin architecture makes it possible to experiment with other video processing models. If you're interested in adapting other models to work with MatAnyone's enhanced chunking system and GUI:

๐Ÿ“š See the Plugin Architecture Guide for detailed documentation on:

  • Understanding the adapter interface
  • Creating custom model plugins
  • Integration with enhanced chunking
  • Best practices and examples

Note: This is intended for advanced users and experimental purposes. The core MatAnyone model remains the primary, production-ready solution.

๐Ÿค— Load from Hugging Face

Alternatively, models can be directly loaded from Hugging Face to make inference.

pip install -q git+https://github.com/pq-yang/MatAnyone

To extract the foreground and the alpha video you can directly run the following lines. Please refer to inference_hf.py for more arguments.

from matanyone import InferenceCore
processor = InferenceCore("PeiqingYang/MatAnyone")

foreground_path, alpha_path = processor.process_video(
    input_path = "inputs/video/test-sample1.mp4",
    mask_path = "inputs/mask/test-sample1.png",
    output_path = "outputs"
)

๐Ÿ”ฅ Inference

Download Model

Download our pretrained model from MatAnyone v1.0.0 to the pretrained_models folder (pretrained model can also be automatically downloaded during the first inference).

The directory structure will be arranged as:

pretrained_models
   |- matanyone.pth

Quick Test

We provide some examples in the inputs folder. For each run, we take a video and a segmentation mask as input. The mask can be from any frame in the video - you don't need your subject to be in frame 0! The segmentation mask can be obtained from interactive segmentation models such as SAM2 demo or generated directly in the GUI. For example, the directory structure can be arranged as:

inputs
   |- video
      |- test-sample0          # folder containing all frames
      |- test-sample1.mp4      # .mp4, .mov, .avi
   |- mask
      |- test-sample0_1.png    # mask for person 1
      |- test-sample0_2.png    # mask for person 2
      |- test-sample1.png    

Run the following command to try it out:

## single target
# short video; 720p
python inference_matanyone.py -i inputs/video/test-sample1.mp4 -m inputs/mask/test-sample1.png
# short video; 1080p
python inference_matanyone.py -i inputs/video/test-sample2.mp4 -m inputs/mask/test-sample2.png
# long video; 1080p
python inference_matanyone.py -i inputs/video/test-sample3.mp4 -m inputs/mask/test-sample3.png

## multiple targets (control by mask)
# obtain matte for target 1
python inference_matanyone.py -i inputs/video/test-sample0 -m inputs/mask/test-sample0_1.png --suffix target1
# obtain matte for target 2
python inference_matanyone.py -i inputs/video/test-sample0 -m inputs/mask/test-sample0_2.png --suffix target2

The results will be saved in the results folder, including the foreground output video and the alpha output video.

  • If you want to save the results as per-frame images, you can set --save_image.
  • If you want to set a limit for the maximum input resolution, you can set --max_size, and the video will be downsampled if min(w, h) exceeds. By default, we don't set the limit.

๐ŸŽช Interactive Demo

To get rid of the preparation for first-frame segmentation mask, we prepare a gradio demo on hugging face and could also launch locally. Just drop your video/image, assign the target masks with a few clicks, and get the the matting results!

cd hugging_face

# install python dependencies
pip3 install -r requirements.txt # FFmpeg required

# launch the demo
python app.py

By launching, an interactive interface will appear as follow:

overall_teaser

๐Ÿ–ฅ๏ธ Enhanced GUI Application

We've developed a comprehensive desktop GUI application that provides advanced video processing capabilities with revolutionary Smart Chunking Technology and an intuitive interface for professional video matting workflows.

Running the GUI

Launch the MatAnyone GUI with:

python matanyone_gui.py

You can also provide initial paths as command-line arguments:

python matanyone_gui.py --input INPUT_VIDEO --mask MASK_PATH --output OUTPUT_DIRECTORY

MatAnyone GUI

๐Ÿง  Smart Chunking System

Revolutionary Content-Aware Processing

The MatAnyone GUI features an groundbreaking Smart Chunking System that analyzes your video content to intelligently place processing chunks exactly where they're needed, rather than using traditional uniform grid patterns.

Smart Chunking Demo

๐Ÿ“š For developers: See the Smart Chunking Developer Guide for detailed documentation on using this system in your own projects.

How Smart Chunking Works

  1. Heat Map Analysis: The system analyzes the entire mask sequence to create a "heat map" showing where activity occurs throughout the video
  2. Face Detection Integration: Facial regions receive priority weighting (3x boost) to ensure optimal quality for human subjects
  3. Intelligent Placement: Chunks are positioned to center important content rather than just covering areas with activity
  4. Dynamic Orientation: Chunks can be oriented horizontally or vertically based on what works best for the content layout
  5. Complete Coverage: Advanced algorithms ensure every active area is covered while eliminating redundant processing

Smart vs Uniform Chunking

Feature Smart Chunking Traditional Uniform
Content Analysis โœ… Analyzes entire video โŒ Blind grid placement
Face Prioritization โœ… 3x priority weighting โŒ No content awareness
Adaptive Placement โœ… Centers on important content โŒ Fixed grid positions
Processing Efficiency โœ… Only processes active areas โŒ Processes entire grid
Quality Optimization โœ… Key content optimally framed โŒ Content may be at chunk edges

Advanced Reassembly Technology

The reassembly process has been completely redesigned for Smart Chunking:

  1. Black Canvas Approach: Creates a clean output canvas and places only the processed chunks
  2. Intelligent Blending: Uses weighted blending in overlap regions with distance-based falloff
  3. Boundary Optimization: Ensures seamless transitions between chunks with adaptive edge feathering
  4. Quality Preservation: Maintains full resolution and quality in all processed regions

๐ŸŽฎ GUI Features

Simplified Interface Design

The interface has been streamlined for optimal user experience:

Processing Options

  • Basic Controls: Warmup frames, resolution limits, and high-quality video encoding options
  • Mask Controls: Erosion and dilation radius with real-time preview
  • Advanced Controls:
    • Chunking Mode: Choose between Smart Chunking (recommended) or Uniform Chunking
    • Edge Feathering: Fine-tune chunk boundary blending
    • Blend Method: Select optimal blending algorithm for your content

Enhanced Processing Settings

  • Low-res Scale: Control preview resolution for faster analysis (1/8 to 3/4 scale)
  • Low-res Blend Method: Independent blending settings for analysis vs final output
  • Minimum Activity (%): Set threshold for content detection sensitivity
  • Parallel Processing: Multi-core processing support (disable if experiencing stability issues)

Built-in Mask Generator

The application includes a powerful SAM-based mask generator with an intuitive interface:

Mask Generator

Key Features:

  • Any Frame Processing: Generate masks on any frame in your video - no need to have your subject in frame 0
  • Point-and-Click Interface: Simply click to select foreground/background regions
  • Box Selection Mode: Draw rectangles around target objects
  • Real-time Preview: See mask generation results instantly
  • Multiple Selection Methods: Combine points and boxes for precise control
  • Keyframe Metadata: Automatically stores frame information for optimal processing from any starting point

Advanced Mask Editor

Mask Editor

Professional Editing Tools:

  • Brush Tools: Paint and erase with adjustable brush sizes
  • Precision Controls: Fine-tune mask boundaries with pixel-level accuracy
  • Layer Management: Work with multiple mask layers
  • Undo/Redo Support: Non-destructive editing workflow

High-Quality Video Output

Professional Encoding Options:

  • Codec Selection: H.264 (compatibility), H.265 (efficiency), VP9 (modern), or Auto
  • Quality Presets: From fast/small to lossless quality
  • Custom Bitrate: Precise control from 1-50 Mbps
  • Frame Export: Save individual frames for further editing

๐Ÿš€ Performance Optimizations

Memory Management

  • Intelligent Chunking: Automatically prevents out-of-memory errors
  • Adaptive Processing: Scales processing based on available system resources
  • Parallel Processing: Multi-threaded chunk processing for faster results

Processing Speed

  • Smart Analysis: Only processes regions with detected activity
  • Efficient Algorithms: Optimized heat map generation and chunk placement
  • Cached Results: Reuses analysis data when possible

๐Ÿ“Š Evaluation Benchmark

We provide a synthetic benchmark YouTubeMatte to enlarge the commonly-used VideoMatte240K-Test. A comparison between them is summarized in the table below.

Dataset #Foregrounds Source Harmonized
VideoMatte240K-Test 5 Purchased Footage โŒ
YouTubeMatte 32 YouTube Videos โœ…

It is noteworthy that we applied harmonization (using Harmonizer) when compositing the foreground on a background. Such an operation effectively makes YouTubeMatte a more challenging benchmark that is closer to the real distribution. As shown in the figure below, while RVM is confused by the harmonized frame, our method still yields robust performance.

harmonization

๐Ÿ”ฌ Technical Details

Smart Chunking Algorithm

The Smart Chunking system employs several advanced techniques:

  1. Heat Map Generation:

    # Analyzes mask sequence to create activity heat map
    heat_map = analyzer.analyze_mask_sequence(mask_dir, original_frames_dir)
  2. Face Detection Integration:

    # Applies 3x priority weighting to facial regions
    face_regions = detect_faces_in_frame(frame, face_cascade)
    heat_map[face_regions] *= face_priority_weight
  3. Intelligent Placement:

    # Centers chunks on important content rather than just activity
    score = calculate_centered_score(chunk_region, position, dimensions)
  4. Adaptive Reassembly:

    # Uses black canvas approach with intelligent blending
    reassemble_arbitrary_chunks(chunk_outputs, canvas_size, blend_method)

Performance Characteristics

  • Analysis Time: 2-5 seconds for heat map generation (one-time cost)
  • Memory Usage: Significantly reduced compared to uniform chunking
  • Quality Improvement: Up to 40% better results for content with faces
  • Processing Speed: 15-25% faster due to reduced redundant processing

Keyframe Metadata System

MatAnyone includes an advanced keyframe metadata system that allows processing from any frame:

  • Any Frame Start: Generate masks on any frame where your subject is clearly visible
  • Intelligent Processing: System automatically processes forward and backward from the keyframe
  • Perfect Alignment: Ensures frame sequence integrity regardless of starting point
  • Metadata Storage: Keyframe information is embedded in the mask file for seamless processing

This eliminates the traditional limitation of needing subjects in frame 0, making the workflow much more flexible for real-world videos.

๐Ÿ“‘ Citation

If you find our repo useful for your research, please consider citing our paper:

 @inProceedings{yang2025matanyone,
     title     = {{MatAnyone}: Stable Video Matting with Consistent Memory Propagation},
     author    = {Yang, Peiqing and Zhou, Shangchen and Zhao, Jixin and Tao, Qingyi and Loy, Chen Change},
     booktitle = {CVPR},
     year      = {2025}
     }

๐Ÿ“ License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

GUI Components License

This repository contains an independent GUI frontend that uses MatAnyone as a backend inference engine. The GUI and all its components were developed independently and are not affiliated with the original MatAnyone project.

The GUI frontend components are designed to work independently of the model, which only handles inference. All GUI elements, including:

  • The graphical user interface (matanyone_gui.py)
  • Mask generation and editing tools
  • Video processing utilities
  • Chunking and optimization systems

can be used freely in any way you see fit, including commercial applications. These components are modular and can work with any compatible video matting model that provides similar inference capabilities.

For detailed documentation on using the Smart Chunking System independently, see SMART_CHUNKING_GUIDE.md and the examples directory.

๐Ÿ‘ Acknowledgement

This project is built upon Cutie, with the interactive demo adapted from ProPainter, leveraging segmentation capabilities from Segment Anything Model and Segment Anything Model 2. Thanks for their awesome works!


This study is supported under the RIE2020 Industry Alignment Fund โ€“ Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).

๐Ÿ“ง Contact

If you have any questions, please feel free to reach us at peiqingyang99@outlook.com.

About

[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.2%
  • Tcl 0.8%
0