Pix2Next is a novel image-to-image (I2I) translation framework that generates Near-Infrared (NIR) images from RGB inputs. By integrating a Vision Foundation Model (VFM) as a feature extractor and applying cross-attention within an encoder-decoder architecture, Pix2Next delivers high-quality, high-fidelity NIR image synthesis.
- Title: Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation
- Authors: Youngwan Jin, Incheol Park, Hanbin Song, Hyeongjin Ju, Yagiz Nalcakan, Shiho Kim
- Paper: https://arxiv.org/abs/2409.16706
- Project page: https://yonsei-stl.github.io/Pix2Next/
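For intuition, the sketch below shows one way cross-attention can inject VFM features into decoder features, as described above. All class names, parameters, and shapes here are illustrative assumptions, not the repository's actual modules.

```python
# Illustrative sketch only; names and shapes are assumptions, not the repo's API.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Decoder tokens (queries) attend to VFM tokens (keys/values)."""
    def __init__(self, dec_dim: int, vfm_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=dec_dim, num_heads=num_heads,
            kdim=vfm_dim, vdim=vfm_dim, batch_first=True,
        )
        self.norm = nn.LayerNorm(dec_dim)

    def forward(self, dec_feat, vfm_feat):
        # dec_feat: (B, N_dec, dec_dim); vfm_feat: (B, N_vfm, vfm_dim)
        attended, _ = self.attn(query=dec_feat, key=vfm_feat, value=vfm_feat)
        return self.norm(dec_feat + attended)  # residual + layer norm

# Toy shapes: a 64x64 decoder grid fused with a 16x16 VFM grid
fusion = CrossAttentionFusion(dec_dim=256, vfm_dim=768)
out = fusion(torch.randn(2, 64 * 64, 256), torch.randn(2, 16 * 16, 768))
print(out.shape)  # torch.Size([2, 4096, 256])
```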
First, install PyTorch with the CUDA version appropriate for your system, then build the DCNv3 ops:
```bash
cd pix2next/common/ops_dcn3/
python setup.py build install
```
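Since the DCNv3 extension compiles against your local CUDA toolkit, it is worth confirming that PyTorch can see the GPU first. This uses only standard PyTorch calls:

```python
import torch

# Both lines should report sensible values before building the DCNv3 ops.
print(torch.__version__, torch.version.cuda)  # torch version / CUDA it was built with
print(torch.cuda.is_available())              # True if a GPU is visible
```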
Data structure:

```
Pix2Next
├── datasets
│   ├── ranus
│   │   ├── train_A (source: RGB)
│   │   ├── train_B (target: NIR)
│   │   ├── test_A (source: RGB)
│   │   └── test_B (target: NIR)
│   └── idd_aw
│       ...
│
└── ...
```
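A paired layout like this is typically consumed by matching filenames across the `*_A` (RGB) and `*_B` (NIR) folders. The loader below is a minimal sketch under that assumption; it is not the repository's actual dataset class.

```python
# Minimal paired RGB->NIR dataset sketch; not the repo's actual loader.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class PairedRGBNIRDataset(Dataset):
    def __init__(self, root: str, split: str = "train", transform=None):
        self.dir_a = Path(root) / f"{split}_A"  # source RGB
        self.dir_b = Path(root) / f"{split}_B"  # target NIR
        self.names = sorted(p.name for p in self.dir_a.iterdir())
        self.transform = transform

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        rgb = Image.open(self.dir_a / name).convert("RGB")
        nir = Image.open(self.dir_b / name).convert("L")  # assumes 1-channel NIR
        if self.transform is not None:
            rgb, nir = self.transform(rgb), self.transform(nir)
        return rgb, nir

# e.g. PairedRGBNIRDataset("datasets/ranus", split="train")
```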
Train:

```bash
cd ~/pix2next/UNET/trainer/
python train.py
```
Test:

```bash
cd ~/pix2next/UNET/tester/
python test_unet.py
```
Evaluation: copy the generated images from the test output folder into a new evaluation folder.
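For example, a small helper for that copy step (the paths are placeholders to adjust for your setup):

```python
# Copy generated test outputs into a fresh evaluation folder.
# Source/destination paths are placeholders, not fixed repo paths.
import shutil
from pathlib import Path

src = Path("UNET/tester/results")     # wherever test_unet.py wrote its images
dst = Path("UNET/evaluation/images")  # new folder for eval_all.py to consume
dst.mkdir(parents=True, exist_ok=True)
for img in src.glob("*.png"):
    shutil.copy2(img, dst / img.name)
```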
Then run:

```bash
cd ~/pix2next/UNET/evaluation/
python eval_all.py
```
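To sanity-check individual image pairs outside of `eval_all.py`, PSNR and SSIM can be computed with scikit-image. This is a generic sketch (the filenames are placeholders), not the repository's evaluation code:

```python
# Generic per-image PSNR/SSIM check; not eval_all.py itself.
import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

gt = np.array(Image.open("test_B/0001.png").convert("L"))       # ground-truth NIR
pred = np.array(Image.open("generated/0001.png").convert("L"))  # generated NIR

print("PSNR:", peak_signal_noise_ratio(gt, pred, data_range=255))
print("SSIM:", structural_similarity(gt, pred, data_range=255))
```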
Pretrained weights:
- RANUS weight file: download
- IDD-AW weight file: download
References:
- RANUS: RGB and NIR Urban Scene Dataset for Deep Scene Parsing
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
If this work is helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@article{technologies13040154,
  AUTHOR = {Jin, Youngwan and Park, Incheol and Song, Hanbin and Ju, Hyeongjin and Nalcakan, Yagiz and Kim, Shiho},
  TITLE = {Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation},
  JOURNAL = {Technologies},
  VOLUME = {13},
  YEAR = {2025},
  NUMBER = {4},
  ARTICLE-NUMBER = {154},
  URL = {https://www.mdpi.com/2227-7080/13/4/154},
  ISSN = {2227-7080},
  DOI = {10.3390/technologies13040154}
}
```