A simple web application for removing backgrounds from images using AI.
- Docker
- Docker Compose
- NVIDIA GPU + NVIDIA Container Toolkit (for GPU version)
- Clone the repository:
git clone https://github.com/WpythonW/rmbg2.00-inerface.git
cd rmbg2.00-inerface
- Create a directory for models:
mkdir -p models
# Production mode
docker compose -f docker-compose.cpu.yml up -d
# Development mode
docker compose -f docker-compose.cpu.yml up -d --build
# Make sure NVIDIA Container Toolkit is installed
nvidia-smi
# Production mode
docker compose -f docker-compose.gpu.yml up -d
# Development mode
docker compose -f docker-compose.gpu.yml up -d --build
For development, you can launch the containers with the source code mounted and open a shell inside:
# CPU Version
docker compose -f docker-compose.cpu.yml up -d --build
docker compose -f docker-compose.cpu.yml exec rmbg-cpu bash
# GPU Version
docker compose -f docker-compose.gpu.yml up -d --build
docker compose -f docker-compose.gpu.yml exec rmbg-gpu bash
After launch, the application will be available at:
http://localhost:8501
# CPU Version
docker compose -f docker-compose.cpu.yml down
# GPU Version
docker compose -f docker-compose.gpu.yml down
.
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── rmbg.py # Main application code
├── docker-compose.cpu.yml # Docker Compose for CPU version
├── docker-compose.gpu.yml # Docker Compose for GPU version
├── Dockerfile.cpu # Dockerfile for CPU version
├── Dockerfile.gpu # Dockerfile for GPU version
└── models/ # Directory for model cache
- The `models/` directory is used for caching Hugging Face models. It is mounted into the container to preserve models between restarts.
- In development mode, you can modify the code in `rmbg.py`; changes are reflected in the container thanks to volume mounting.
- The GPU version requires the NVIDIA Container Toolkit and a compatible GPU.
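Since the cache location determines whether models persist, here is a minimal sketch of pointing the Hugging Face cache at the mounted `models/` directory. This assumes the standard `HF_HOME` environment variable; the actual `rmbg.py` and compose files may configure the cache differently.

```python
import os
from pathlib import Path

# Point the Hugging Face cache at the mounted models/ directory so that
# downloaded weights survive container restarts (sketch only; the real
# rmbg.py / compose files may set this up differently).
cache_dir = Path("models").resolve()
cache_dir.mkdir(exist_ok=True)
os.environ["HF_HOME"] = str(cache_dir)

print(os.environ["HF_HOME"])
```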
- If you experience permission issues with the `models/` directory:
sudo chown -R 1000:1000 models/
- To check GPU in container:
docker compose -f docker-compose.gpu.yml exec rmbg-gpu nvidia-smi
- Checking logs:
# CPU Version
docker compose -f docker-compose.cpu.yml logs -f
# GPU Version
docker compose -f docker-compose.gpu.yml logs -f
RMBG-1.4 is based on the IS-Net architecture, enhanced with BRIA's unique training scheme and proprietary dataset. These enhancements significantly improve the model's accuracy and effectiveness across diverse image-processing scenarios.
RMBG-2.0 utilizes the BiRefNet (Bilateral Reference Network) architecture, which includes localization and restoration modules for precise foreground-background separation. This innovative architecture, combined with a carefully curated dataset, ensures high accuracy and efficiency in background removal tasks.
Model | F-measure↑ | MAE↓ | S-measure↑ | E-measure↑ | HCE↓ |
---|---|---|---|---|---|
IS-Net | 0.761 | 0.083 | 0.791 | 0.835 | 1333 |
BiRefNet | 0.799 | 0.070 | 0.819 | 0.858 | 1016 |
Improvement | +5.0% | -15.7% | +3.5% | +2.8% | -23.8% |
Model | S-measure↑ | F-measure↑ | E-measure↑ | MAE↓ |
---|---|---|---|---|
IS-Net | 0.935 | 0.937 | 0.946 | 0.020 |
BiRefNet | 0.957 | 0.958 | 0.972 | 0.014 |
Improvement | +2.4% | +2.2% | +2.7% | -30.0% |
Model | S-measure↑ | F-measure↑ | E-measure↑ | MAE↓ |
---|---|---|---|---|
IS-Net | 0.871 | 0.806 | 0.935 | 0.023 |
BiRefNet | 0.913 | 0.874 | 0.960 | 0.014 |
Improvement | +4.8% | +8.4% | +2.7% | -39.1% |
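The improvement rows in the tables above follow directly from the raw scores; for example, the DIS5K deltas can be recomputed as:

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# DIS5K scores from the table above (IS-Net, BiRefNet)
dis5k = {
    "F-measure": (0.761, 0.799),
    "MAE":       (0.083, 0.070),
    "S-measure": (0.791, 0.819),
    "E-measure": (0.835, 0.858),
    "HCE":       (1333, 1016),
}

for metric, (isnet, birefnet) in dis5k.items():
    print(f"{metric}: {pct_change(isnet, birefnet):+.1f}%")
```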
- Bilateral Reference Framework
  - Inward reference: maintains original high-resolution image details
  - Outward reference: uses gradient maps to enhance focus on fine details
  - Significant improvement in boundary precision and detail preservation
- Architecture Enhancements
  - Separate localization and reconstruction modules
  - Enhanced high-resolution feature processing
  - More effective feature fusion strategies
- Training Optimizations
  - Multi-stage supervision for accelerated convergence
  - Regional loss fine-tuning for better detail preservation
  - Context feature fusion improvements
- Overall Improvements
  - Consistent performance gains across all benchmarks
  - Largest gains in MAE (15-39% reduction)
  - Notable HCE reduction of 23.8% on DIS5K
- Task-Specific Strengths
  - DIS5K: major improvement in fine detail handling (HCE↓)
  - HRSOD: better high-resolution feature preservation
  - COD: significant boost in camouflaged object detection accuracy
- Practical Impact
  - Better handling of complex structures
  - Improved edge precision
  - More robust across varied object types
  - Reduced need for manual corrections
IS-Net:
- Single-stream architecture with intermediate supervision
- Focus on feature synchronization at different levels
- Relies heavily on a dense supervision strategy
BiRefNet:
- Dual-stream architecture with explicit task decomposition
- Bilateral reference mechanism for feature enhancement
- More sophisticated feature reconstruction approach
IS-Net:
- Traditional encoder-decoder backbone
- GT encoder for intermediate feature supervision
- Single pathway for feature processing
- Limited ability to handle high-resolution details
BiRefNet:
- Separate localization and reconstruction modules
- Transformer-based encoder for better global context
- Multiple pathways for feature processing
- Enhanced high-resolution feature handling
IS-Net:
- Direct feature synchronization
- Single-scale feature processing
- Limited context aggregation
BiRefNet:
- Bilateral reference mechanism
- Inward reference: Original resolution details
- Outward reference: Gradient-aware feature enhancement
- Multi-scale feature reconstruction
- Advanced context feature fusion
IS-Net:
- Dense supervision on intermediate outputs
- Feature-level and mask-level guidance
- Single-stage training process
BiRefNet:
- Multi-stage hierarchical supervision
- Gradient-aware feature guidance
- Regional loss fine-tuning
- Progressive refinement strategy
- BiRef Block Design
  - Maintains original image resolution through adaptive cropping
  - Integrates gradient information for detail enhancement
  - Combines local and global feature contexts
- Reconstruction Module
  - Deformable convolutions with hierarchical receptive fields
  - Better handling of varying object scales
  - Enhanced feature aggregation capabilities
- Localization Module
  - Dedicated module for object positioning
  - Better semantic understanding
  - Improved global context modeling
- IS-Net: Limited by memory constraints for high-res images
- BiRefNet: Better memory efficiency and high-res processing
- IS-Net: Struggles with fine details at higher resolutions
- BiRefNet: Maintains detail fidelity through bilateral reference
- IS-Net: Limited global context integration
- BiRefNet: Enhanced context modeling through separate modules
- Inward Reference
  - Maintains original resolution through adaptive patch cropping
  - Preserves full image details at each decoder stage
  - Eliminates information loss from traditional downsampling
- Outward Reference
  - Introduces gradient-aware feature enhancement
  - Guides model attention to detail-rich areas
  - Improves boundary precision
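To make the outward-reference idea concrete, here is a toy sketch of a gradient map that highlights detail-rich regions. This is plain NumPy, not BiRefNet's actual implementation, which derives gradient references inside the network.

```python
import numpy as np

def gradient_map(gray: np.ndarray) -> np.ndarray:
    """Finite-difference gradient magnitude; detail-rich areas score high."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)

# Toy image: flat background with one sharp vertical edge at column 4
img = np.zeros((8, 8))
img[:, 4:] = 1.0

g = gradient_map(img)
# The map is non-zero only around the edge columns, which is exactly
# where an outward reference would focus the model's attention.
print(np.nonzero(g.sum(axis=0))[0])
```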
- 23.8% reduction in Human Correction Efforts (HCE)
- 15.7% improvement in Mean Absolute Error (MAE)
- Significant enhancement in fine structure preservation
- Better handling of complex object boundaries
- Localization Module (LM)
  - Dedicated to object positioning
  - Enhanced semantic understanding
  - Global context integration through transformer blocks
- Reconstruction Module (RM)
  - Specialized in detail reconstruction
  - Hierarchical feature processing
  - Multi-scale context fusion
- Improved accuracy across different object scales
- Better handling of camouflaged objects (+4.8% S-measure on COD)
- Enhanced performance on high-resolution images (+2.4% S-measure on HRSOD)
- Deformable Convolutions
  - Adaptive receptive field
  - Better feature alignment
  - Enhanced spatial adaptation
- Context Feature Fusion
  - Multi-scale feature integration
  - Improved semantic understanding
  - Better global context modeling
- Better handling of complex shapes
- Improved performance on thin structures
- Enhanced ability to capture long-range dependencies
- +2.4% S-measure on HRSOD
- Better preservation of fine details
- Improved boundary accuracy
- +8.4% F-measure on COD
- Better object-background separation
- Improved handling of subtle contrasts
- +5.0% F-measure on DIS5K
- Better handling of intricate patterns
- Improved segmentation of thin structures
- Cleaner object boundaries
- Better preservation of fine details
- More precise segmentation masks
- Reduced need for manual corrections
- More reliable automated workflows
- Better handling of diverse object types
- Improved reliability for medical imaging
- Better accuracy for industrial inspection
- Enhanced performance in scientific applications
Model | Total Size | Component Breakdown |
---|---|---|
IS-Net | 176.6 MB | Main Net: 148.9 MB; GT Encoder: 27.7 MB |
BiRefNet | 885 MB | Localization Module; Reconstruction Module; BiRef Blocks |
Model | Time (s) | GPU |
---|---|---|
IS-Net | 1.3 | GTX 1070Ti |
BiRefNet | 5.4 | GTX 1070Ti |
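The per-image timings above translate into the following throughput difference:

```python
# Per-image inference times from the table above (GTX 1070Ti, seconds)
isnet_s, birefnet_s = 1.3, 5.4

slowdown = birefnet_s / isnet_s
print(f"BiRefNet is {slowdown:.1f}x slower per image")

# Rough batch-time estimate for 100 images
print(f"IS-Net:   {100 * isnet_s / 60:.1f} min")
print(f"BiRefNet: {100 * birefnet_s / 60:.1f} min")
```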
- Memory Optimization
  - Adaptive patch cropping
  - Efficient feature reuse
  - Gradient checkpointing support
- Speed Optimization
  - Compiled version available (13% faster)
  - Parallel processing of references
  - Efficient feature pyramid handling
- Training Optimization
  - Multi-stage supervision reduces required epochs by 70%
  - Better gradient flow
  - More efficient loss computation
Aspect | IS-Net | BiRefNet |
---|---|---|
Required GPU VRAM | 4.6 GB | 7.7 GB |
- Batch Processing
  - IS-Net: better suited for batch processing
  - BiRefNet: better single-image quality
- Resolution Scaling
  - IS-Net: limited to 1024×1024
  - BiRefNet: supports higher resolutions with adaptive cropping
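A simplified illustration of handling higher resolutions via patch cropping. This sketch uses a fixed grid; BiRefNet's actual scheme is adaptive.

```python
import numpy as np

def crop_patches(img: np.ndarray, patch: int):
    """Split an image into non-overlapping square patches
    (fixed-grid sketch; BiRefNet's real cropping is adaptive)."""
    h, w = img.shape[:2]
    return [
        img[y:y + patch, x:x + patch]
        for y in range(0, h, patch)
        for x in range(0, w, patch)
    ]

# A 2048x2048 image split into four 1024x1024 patches
img = np.zeros((2048, 2048, 3), dtype=np.uint8)
patches = crop_patches(img, 1024)
print(len(patches))
```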
- Memory vs Quality
  - BiRefNet requires roughly 67% more GPU VRAM (7.7 GB vs 4.6 GB, per the table above) for a ~25% quality improvement
- Speed vs Accuracy
  - BiRefNet is roughly 4x slower but provides significantly better results
  - Lightweight variants offer a better speed-quality balance
- Overall Accuracy
  - +5.0% F-measure on DIS5K
  - +2.4% S-measure on HRSOD
  - +8.4% F-measure on COD
- Error Reduction
  - -15.7% MAE on DIS5K
  - -30.0% MAE on HRSOD
  - -39.1% MAE on COD
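For reference, MAE here is the mean absolute difference between the predicted and ground-truth alpha masks; a minimal definition:

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Absolute Error between predicted and ground-truth masks,
    with mask values in [0, 1]; lower is better."""
    return float(np.mean(np.abs(pred - gt)))

gt   = np.array([[0.0, 1.0], [0.0, 1.0]])
pred = np.array([[0.1, 0.9], [0.0, 0.8]])
print(round(mae(pred, gt), 3))  # 0.1
```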
- Quality Metrics
  - -23.8% Human Correction Efforts (HCE)
  - Significant improvement in boundary precision
  - Better handling of complex structures
- Maintains fine structures like hair and thin objects
- Better edge definition and boundary precision
- Improved handling of transparent and translucent objects
- Complex Objects
  - Better handling of intricate patterns
  - Improved segmentation of irregular shapes
  - Superior performance on mesh-like structures
- Challenging Scenarios
  - Better results with camouflaged objects
  - Improved handling of low-contrast areas
  - Better performance with cluttered backgrounds
- Cleaner masks for photo editing
- More precise background removal
- Better preservation of important details
- Higher precision for quality control
- Better reliability for automated inspection
- Improved accuracy for measurement applications
- Better results for video editing
- Improved performance for AR/VR applications
- More accurate 3D modeling support
- Better handling of high-resolution images
- More precise boundary detection
- Reduced artifacts in complex areas
- More consistent performance across different scenarios
- Better handling of edge cases
- Improved stability with varying input qualities
- Reduced need for manual corrections
- Better results with default settings
- More reliable automated processing
- Professional Photo Editing
  - Better hair and fur segmentation
  - Improved preservation of fine details
  - More precise edge detection
- Batch Processing
  - More reliable automated results
  - Fewer manual corrections needed
  - Better consistency across images
- Medical Imaging
  - Better precision for diagnostic applications
  - Improved detail preservation
  - More reliable segmentation results
- Industrial Inspection
  - Higher accuracy for quality control
  - Better detection of defects
  - More reliable measurements
Advantages:
- Significantly improved quality
- Better handling of complex cases
- Reduced need for manual corrections
- More reliable automated processing
Trade-offs:
- Increased computational requirements
- Longer processing time
- Higher memory usage
- More complex deployment requirements