Yonsei-STL/pix2next

Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation

Pix2Next is a novel image-to-image (I2I) translation framework that generates Near-Infrared (NIR) images from RGB inputs. By integrating Vision Foundation Models (VFM) as a feature extractor and applying cross-attention within an encoder-decoder architecture, Pix2Next delivers high-quality, high-fidelity NIR image synthesis.
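The core idea above (decoder features querying VFM features via cross-attention) can be sketched in a few lines. This is an illustrative single-head NumPy sketch under our own assumptions, not the repository's implementation; the function and variable names are hypothetical, and the real model uses learned projection layers.

```python
import numpy as np

def cross_attention(decoder_feats, vfm_feats):
    """Illustrative single-head cross-attention: decoder features act as
    queries, VFM features supply keys and values. Hypothetical sketch;
    learned Q/K/V projections are omitted for brevity."""
    d_k = decoder_feats.shape[-1]
    q, k, v = decoder_feats, vfm_feats, vfm_feats
    scores = q @ k.T / np.sqrt(d_k)                 # (N_dec, N_vfm) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # VFM context per decoder token

# Toy example: 4 decoder tokens attend over 6 VFM tokens of dim 8
out = cross_attention(np.random.randn(4, 8), np.random.randn(6, 8))
print(out.shape)  # (4, 8)
```

Each decoder location thus aggregates a weighted mix of VFM features, which is how the foundation-model representation is injected into the encoder-decoder.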

Pix2Next Teaser


Paper


Quick start

Installation

First, install PyTorch with the CUDA version appropriate for your system, then build the DCNv3 custom ops:

cd pix2next/common/ops_dcn3/
python setup.py build install

Datasets

Download the datasets: IDD-AW, RANUS.

Data structure

Pix2Next
├── datasets
│   ├── ranus
│   │   ├── train_A (source: RGB)
│   │   ├── train_B (target: NIR)
│   │   ├── test_A  (source: RGB)
│   │   └── test_B  (target: NIR)
│   └── idd_aw
│       ...
│       
└── ...
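Given the layout above, a loader would typically pair each RGB image in `train_A` with the same-named NIR image in `train_B`. A minimal pure-Python sketch (the helper name is ours, not the repository's; the actual data loader may differ):

```python
from pathlib import Path

def list_pairs(root, split="train"):
    """Pair source RGB images (<split>_A) with target NIR images (<split>_B)
    by filename. Hypothetical helper for illustration only."""
    src_dir = Path(root) / f"{split}_A"
    tgt_dir = Path(root) / f"{split}_B"
    pairs = []
    for src in sorted(src_dir.iterdir()):
        tgt = tgt_dir / src.name  # target must share the source filename
        if tgt.exists():
            pairs.append((src, tgt))
    return pairs
```

For example, `list_pairs("datasets/ranus")` would yield `(train_A/xxx.png, train_B/xxx.png)` tuples for every filename present in both folders.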

Training

cd ~/pix2next/UNET/trainer/
python train.py

Testing

cd ~/pix2next/UNET/tester/
python test_unet.py

Evaluation

Copy the generated images from the test output folder into a new evaluation folder, then run:

cd ~/pix2next/UNET/evaluation/
python eval_all.py
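The copy step above can be scripted. A small sketch using the standard library; the function name and paths are illustrative, so adjust them to your own test-output location:

```python
import shutil
from pathlib import Path

def stage_for_eval(test_out_dir, eval_dir):
    """Copy generated test images into a fresh evaluation folder.
    Illustrative helper; paths and the .png extension are assumptions."""
    eval_dir = Path(eval_dir)
    eval_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for img in sorted(Path(test_out_dir).glob("*.png")):
        shutil.copy2(img, eval_dir / img.name)  # keep timestamps/metadata
        count += 1
    return count
```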

Performance

Ranus

Ranus_qual

weight_file: download

IDDAW

IDDAW_qual

weight_file: download


Visualization

Ranus

Ranus

IDDAW

IDDAW

BDD100K zeroshot translation

bdd100k

References

RANUS: RGB and NIR Urban Scene Dataset for Deep Scene Parsing

IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{technologies13040154,
AUTHOR = {Jin, Youngwan and Park, Incheol and Song, Hanbin and Ju, Hyeongjin and Nalcakan, Yagiz and Kim, Shiho},
TITLE = {Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation},
JOURNAL = {Technologies},
VOLUME = {13},
YEAR = {2025},
NUMBER = {4},
ARTICLE-NUMBER = {154},
URL = {https://www.mdpi.com/2227-7080/13/4/154},
ISSN = {2227-7080},
DOI = {10.3390/technologies13040154}
}

About

Official implementation of "Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation"
