This is the official implementation of our paper "PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation".
Authors: Zhiwei Hao, Zhongyu Xiao, Yong Luo, Jianyuan Guo, Jing Wang, Li Shen, Han Hu
We present a knowledge distillation (KD) based approach that guides multimodal fusion with a specific focus on the primary modality. Unlike existing methods, which often treat modalities equally without considering their differing information content, our findings and proposed method offer insights into effective multimodal processing.
You can download the official NYU Depth V2 data here. After downloading, organize the files according to the directory structure we provide below.
You can download the dataset from the official SUNRGBD website and preprocess it following the instructions there.
For RGB-D semantic segmentation, HHA maps can be generated from depth maps following https://github.com/charlesCXK/Depth2HHA-python.
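A minimal sketch of generating one HHA map with that repo, assuming its `getHHA(camera_matrix, depth, raw_depth)` entry point and its 1/10000 depth scaling (both are taken from that repo's demo; verify them against its README before use):

```python
# Hypothetical usage of Depth2HHA-python: getHHA and getCameraParam are
# defined in that repo (getHHA.py / utils) -- check the names there.
import cv2
from getHHA import getHHA
from utils.getCameraParam import getCameraParam

depth = cv2.imread('depth.png', cv2.IMREAD_UNCHANGED) / 10000.0  # meters (assumed scale)
camera_matrix = getCameraParam('color')       # NYU color-camera intrinsics
hha = getHHA(camera_matrix, depth, depth)     # (intrinsics, filled depth, raw depth)
cv2.imwrite('hha.png', hha)
```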
- Clone this repo.
$ git clone https://github.com/xiaoshideta/PrimKD.git
$ cd PrimKD
- Install all dependencies.
$ conda create -n primkd python=3.8.11
$ conda activate primkd
$ pip install -r requirements.txt
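An optional sanity check that the environment is usable before training (the expected outputs depend on your local CUDA setup):

```python
# Verify PyTorch and CUDA inside the primkd environment.
import torch
print(torch.__version__)           # should match the version in requirements.txt
print(torch.cuda.is_available())   # True if CUDA is set up correctly
print(torch.cuda.device_count())   # >= 4 for the 4-GPU training command below
```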
Your directory tree should look like this:
|-- <config>
|-- <dataloader>
|-- <pretrained>
|   |-- <pre>
|   |-- <segformer>
|-- <datasets>
|   |-- <NYUDepthv2>
|   |   |-- <RGBFolder>
|   |   |-- <HHAFolder>
|   |   |-- <LabelFolder>
|   |   |-- train.txt
|   |   |-- test.txt
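A small script to sanity-check that the dataset is laid out as above (the root path is an assumption based on the tree; adjust it if your data lives elsewhere):

```python
# Hypothetical layout check for the directory tree above.
import os

root = "datasets/NYUDepthv2"  # assumed location, per the tree above
for entry in ["RGBFolder", "HHAFolder", "LabelFolder", "train.txt", "test.txt"]:
    path = os.path.join(root, entry)
    print(f"{path}: {'OK' if os.path.exists(path) else 'MISSING'}")
```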
Download the pretrained SegFormer backbone here.
Download the pretrained teacher model here.
$ bash train.sh
Alternatively, run the training command directly:
$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --port=29516 --distillation_alpha=1.0 --distillation_beta=0.1 --distillation_flag=1 --lambda_mask=0.75 --select="max" --mask_single="hint"
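Judging by the flag names, `distillation_alpha`/`distillation_beta` weight the distillation terms and `lambda_mask` controls the fraction of features masked before matching student to teacher. The sketch below is a generic masked feature-distillation loss for illustration only, not the exact PrimKD objective; all names in it are made up:

```python
import torch
import torch.nn.functional as F

def masked_feature_kd(student_feat, teacher_feat, lambda_mask=0.75):
    """Generic masked feature distillation (illustrative, not PrimKD's loss):
    match the student to the teacher only at randomly masked locations."""
    b, c, h, w = student_feat.shape
    # Hide a lambda_mask fraction of spatial positions.
    mask = (torch.rand(b, 1, h, w, device=student_feat.device) < lambda_mask).float()
    diff = (student_feat - teacher_feat.detach()) ** 2
    return (diff * mask).sum() / (mask.sum() * c + 1e-6)

# Hypothetical combination with the task loss, in the style of the flags above:
# loss = seg_loss + alpha * masked_feature_kd(fs, ft) + beta * other_kd_term
```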
$ bash val.sh
Or run the evaluation command directly:
$ CUDA_VISIBLE_DEVICES="0" python val.py -d="0" -e="your checkpoint path" --save_path="your save path"
If you find this work useful for your research, please cite our paper:
@inproceedings{hao2024primkd,
  title={PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation},
  author={Hao, Zhiwei and Xiao, Zhongyu and Luo, Yong and Guo, Jianyuan and Wang, Jing and Shen, Li and Hu, Han},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  pages={1943--1951},
  year={2024}
}
Part of our code is based on CMX; thanks for their excellent work!