This is the official implementation of our paper "PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation".
Authors: Zhiwei Hao, Zhongyu Xiao, Yong Luo, Jianyuan Guo, Jing Wang, Li Shen, Han Hu
We present a knowledge distillation (KD) based approach that guides multimodal fusion with a specific focus on the primary modality. Unlike existing methods, which often treat modalities equally without considering their differing information content, our findings and proposed method offer insights into effective multimodal processing.
You can download the official NYU Depth V2 data here. After downloading, organize the files according to the directory structure we provide below.
You can download the dataset from the official SUNRGBD website and preprocess it following the instructions there.
For RGB-D semantic segmentation, HHA maps can be generated from depth maps following https://github.com/charlesCXK/Depth2HHA-python.
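A minimal sketch of generating one HHA map with that repo, assuming its `getHHA(camera_matrix, depth, raw_depth)` entry point and its 1/10000 depth scaling (both are taken from that repo's demo; verify them against its README before use):

```python
# Hypothetical usage of Depth2HHA-python: getHHA and getCameraParam are
# defined in that repo (getHHA.py / utils) -- check the names there.
import cv2
from getHHA import getHHA
from utils.getCameraParam import getCameraParam

depth = cv2.imread('depth.png', cv2.IMREAD_UNCHANGED) / 10000.0  # meters (assumed scale)
camera_matrix = getCameraParam('color')       # NYU color-camera intrinsics
hha = getHHA(camera_matrix, depth, depth)     # (intrinsics, filled depth, raw depth)
cv2.imwrite('hha.png', hha)
```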
- Clone this repo.
$ git clone https://github.com/xiaoshideta/PrimKD.git
$ cd PrimKD
- Install all dependencies.
$ conda create -n primkd python=3.8.11
$ conda activate primkd
$ pip install -r requirements.txt
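An optional sanity check that the environment is usable before training (the expected outputs depend on your local CUDA setup):

```python
# Verify PyTorch and CUDA inside the primkd environment.
import torch
print(torch.__version__)           # should match the version in requirements.txt
print(torch.cuda.is_available())   # True if CUDA is set up correctly
print(torch.cuda.device_count())   # >= 4 for the 4-GPU training command below
```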
Your directory tree should look like this:
|-- <config>
|-- <dataloader>
|-- <pretrained>
|   |-- <pre>
|   |-- <segformer>
|-- <datasets>
|   |-- <NYUDepthv2>
|   |   |-- <RGBFolder>
|   |   |-- <HHAFolder>
|   |   |-- <LabelFolder>
|   |   |-- train.txt
|   |   |-- test.txt
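A small script to sanity-check that the dataset is laid out as above (the root path is an assumption based on the tree; adjust it if your data lives elsewhere):

```python
# Hypothetical layout check for the directory tree above.
import os

root = "datasets/NYUDepthv2"  # assumed location, per the tree above
for entry in ["RGBFolder", "HHAFolder", "LabelFolder", "train.txt", "test.txt"]:
    path = os.path.join(root, entry)
    print(f"{path}: {'OK' if os.path.exists(path) else 'MISSING'}")
```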
Download the pretrained SegFormer backbone here.
Download the pretrained teacher model here.
$ bash train.sh
Alternatively, run the training command directly:
$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --port=29516 --distillation_alpha=1.0 --distillation_beta=0.1 --distillation_flag=1 --lambda_mask=0.75 --select="max" --mask_single="hint"
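Judging by the flag names, `distillation_alpha`/`distillation_beta` weight the distillation terms and `lambda_mask` controls the fraction of features masked before matching student to teacher. The sketch below is a generic masked feature-distillation loss for illustration only, not the exact PrimKD objective; all names in it are made up:

```python
import torch
import torch.nn.functional as F

def masked_feature_kd(student_feat, teacher_feat, lambda_mask=0.75):
    """Generic masked feature distillation (illustrative, not PrimKD's loss):
    match the student to the teacher only at randomly masked locations."""
    b, c, h, w = student_feat.shape
    # Hide a lambda_mask fraction of spatial positions.
    mask = (torch.rand(b, 1, h, w, device=student_feat.device) < lambda_mask).float()
    diff = (student_feat - teacher_feat.detach()) ** 2
    return (diff * mask).sum() / (mask.sum() * c + 1e-6)

# Hypothetical combination with the task loss, in the style of the flags above:
# loss = seg_loss + alpha * masked_feature_kd(fs, ft) + beta * other_kd_term
```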
$ bash val.sh
Or run the evaluation command directly:
$ CUDA_VISIBLE_DEVICES="0" python val.py -d="0" -e="your checkpoint path" --save_path="your save path"
If you find this work useful for your research, please cite our paper:
@inproceedings{hao2024primkd,
  title={PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation},
  author={Hao, Zhiwei and Xiao, Zhongyu and Luo, Yong and Guo, Jianyuan and Wang, Jing and Shen, Li and Hu, Han},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={1943--1951},
  year={2024}
}
Part of our code is based on CMX; thanks for their excellent work!