Pix2Next is a novel image-to-image (I2I) translation framework that generates Near-Infrared (NIR) images from RGB inputs. By integrating a Vision Foundation Model (VFM) as a feature extractor and applying cross-attention within an encoder-decoder architecture, Pix2Next delivers high-quality, high-fidelity NIR image synthesis.
- Title: Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation
- Authors: Youngwan Jin, Incheol Park, Hanbin Song, Hyeongjin Ju, Yagiz Nalcakan, Shiho Kim
- Paper: https://arxiv.org/abs/2409.16706
- Project page: https://yonsei-stl.github.io/Pix2Next/
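For intuition, the sketch below shows one way cross-attention can inject VFM features into decoder features, as described above. All class names, parameters, and shapes here are illustrative assumptions, not the repository's actual modules.

```python
# Illustrative sketch only; names and shapes are assumptions, not the repo's API.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Decoder tokens (queries) attend to VFM tokens (keys/values)."""
    def __init__(self, dec_dim: int, vfm_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=dec_dim, num_heads=num_heads,
            kdim=vfm_dim, vdim=vfm_dim, batch_first=True,
        )
        self.norm = nn.LayerNorm(dec_dim)

    def forward(self, dec_feat, vfm_feat):
        # dec_feat: (B, N_dec, dec_dim); vfm_feat: (B, N_vfm, vfm_dim)
        attended, _ = self.attn(query=dec_feat, key=vfm_feat, value=vfm_feat)
        return self.norm(dec_feat + attended)  # residual + layer norm

# Toy shapes: a 64x64 decoder grid fused with a 16x16 VFM grid
fusion = CrossAttentionFusion(dec_dim=256, vfm_dim=768)
out = fusion(torch.randn(2, 64 * 64, 256), torch.randn(2, 16 * 16, 768))
print(out.shape)  # torch.Size([2, 4096, 256])
```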
First, install PyTorch with the CUDA version appropriate for your system, then build the DCNv3 ops:
```bash
cd pix2next/common/ops_dcn3/
python setup.py build install
```
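Since the DCNv3 extension compiles against your local CUDA toolkit, it is worth confirming that PyTorch can see the GPU first. This uses only standard PyTorch calls:

```python
import torch

# Both lines should report sensible values before building the DCNv3 ops.
print(torch.__version__, torch.version.cuda)  # torch version / CUDA it was built with
print(torch.cuda.is_available())              # True if a GPU is visible
```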
Data structure:

```
Pix2Next
├── datasets
│   ├── ranus
│   │   ├── train_A (source: RGB)
│   │   ├── train_B (target: NIR)
│   │   ├── test_A (source: RGB)
│   │   └── test_B (target: NIR)
│   └── idd_aw
│       ...
│
└── ...
```
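A paired layout like this is typically consumed by matching filenames across the `*_A` (RGB) and `*_B` (NIR) folders. The loader below is a minimal sketch under that assumption; it is not the repository's actual dataset class.

```python
# Minimal paired RGB->NIR dataset sketch; not the repo's actual loader.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class PairedRGBNIRDataset(Dataset):
    def __init__(self, root: str, split: str = "train", transform=None):
        self.dir_a = Path(root) / f"{split}_A"  # source RGB
        self.dir_b = Path(root) / f"{split}_B"  # target NIR
        self.names = sorted(p.name for p in self.dir_a.iterdir())
        self.transform = transform

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        rgb = Image.open(self.dir_a / name).convert("RGB")
        nir = Image.open(self.dir_b / name).convert("L")  # assumes 1-channel NIR
        if self.transform is not None:
            rgb, nir = self.transform(rgb), self.transform(nir)
        return rgb, nir

# e.g. PairedRGBNIRDataset("datasets/ranus", split="train")
```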
Train:

```bash
cd ~/pix2next/UNET/trainer/
python train.py
```
Test:

```bash
cd ~/pix2next/UNET/tester/
python test_unet.py
```
Evaluation: copy the generated images from the test output folder into a new evaluation folder.
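For example, a small helper for that copy step (the paths are placeholders to adjust for your setup):

```python
# Copy generated test outputs into a fresh evaluation folder.
# Source/destination paths are placeholders, not fixed repo paths.
import shutil
from pathlib import Path

src = Path("UNET/tester/results")     # wherever test_unet.py wrote its images
dst = Path("UNET/evaluation/images")  # new folder for eval_all.py to consume
dst.mkdir(parents=True, exist_ok=True)
for img in src.glob("*.png"):
    shutil.copy2(img, dst / img.name)
```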
Then run:

```bash
cd ~/pix2next/UNET/evaluation/
python eval_all.py
```
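To sanity-check individual image pairs outside of `eval_all.py`, PSNR and SSIM can be computed with scikit-image. This is a generic sketch (the filenames are placeholders), not the repository's evaluation code:

```python
# Generic per-image PSNR/SSIM check; not eval_all.py itself.
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt = np.array(Image.open("test_B/0001.png").convert("L"))       # ground-truth NIR
pred = np.array(Image.open("generated/0001.png").convert("L"))  # generated NIR

print("PSNR:", peak_signal_noise_ratio(gt, pred, data_range=255))
print("SSIM:", structural_similarity(gt, pred, data_range=255))
```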
Pretrained weights:
- RANUS weight file: download
- IDD-AW weight file: download
References:
- RANUS: RGB and NIR Urban Scene Dataset for Deep Scene Parsing
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
If this work is helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@article{technologies13040154,
  AUTHOR = {Jin, Youngwan and Park, Incheol and Song, Hanbin and Ju, Hyeongjin and Nalcakan, Yagiz and Kim, Shiho},
  TITLE = {Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation},
  JOURNAL = {Technologies},
  VOLUME = {13},
  YEAR = {2025},
  NUMBER = {4},
  ARTICLE-NUMBER = {154},
  URL = {https://www.mdpi.com/2227-7080/13/4/154},
  ISSN = {2227-7080},
  DOI = {10.3390/technologies13040154}
}
```