Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation (TPAMI 2024)

Tianfang Sun¹, Zhizhong Zhang¹, Xin Tan¹, Yong Peng³, Yanyun Qu², Yuan Xie¹

¹ECNU, ²XMU, ³CSU
For easy installation, we recommend using conda:
```bash
conda create -n u2mkd python=3.9
conda activate u2mkd
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip3 install numba tensorboard
# to support nuScenes
pip3 install nuscenes-devkit
```
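As a quick sanity check (our suggestion, not part of the original instructions), you can confirm that PyTorch sees the GPU and that the nuScenes devkit imports cleanly:

```bash
# Should print 1.12.1 and True; the second import fails if nuscenes-devkit is missing
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import nuscenes; print('nuscenes-devkit OK')"
```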
Our method is based on torchpack and torchsparse. To install torchpack, we recommend first installing openmpi and mpi4py:
```bash
conda install -c conda-forge mpi4py openmpi
```
Then install torchpack:
```bash
pip install git+https://github.com/zhijian-liu/torchpack.git
```
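If you want to verify the MPI bindings and torchpack before moving on, a minimal check (again, our suggestion) is:

```bash
# Prints the underlying MPI library version and confirms torchpack imports
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
python -c "import torchpack; print('torchpack OK')"
```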
Before installing torchsparse, install the Google Sparse Hash library:
```bash
sudo apt install libsparsehash-dev
```
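To confirm the headers landed, you can query dpkg (an optional check we suggest):

```bash
# Should report "Status: install ok installed"
dpkg -s libsparsehash-dev | grep Status
```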
Then install torchsparse (v1.4.0):
```bash
pip3 install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
```
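torchsparse is compiled against the local PyTorch/CUDA toolchain, so a quick import check (our suggestion) catches broken builds early:

```bash
# Fails with an undefined-symbol error if torchsparse was built against a mismatched PyTorch/CUDA
python -c "import torchsparse, torchsparse.nn; print('torchsparse OK')"
```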
To support SphereFormer, install the following dependencies (for more details, please refer to the SphereFormer repository):
```bash
pip install torch_scatter==2.1.2
pip install torch_geometric==1.7.2
pip install spconv-cu114==2.3.6
pip install torch_sparse==0.6.18 cumm-cu114==0.4.11 torch_cluster==1.6.3
pip install timm termcolor tensorboardX
# Install sptr
cd third_party/SparseTransformer && python setup.py install
```
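These extensions are also sensitive to version mismatches. The following import checks (a sketch we suggest, using only the packages installed above) should all succeed; note that spconv 2.x is imported as `spconv.pytorch`:

```bash
# sptr is the module built from third_party/SparseTransformer
python -c "import torch_scatter, torch_sparse, torch_cluster, torch_geometric; print('PyG extensions OK')"
python -c "import spconv.pytorch; print('spconv OK')"
python -c "import sptr; print('sptr OK')"
```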
Please download the ImageNet-pretrained weights for SwiftNet from Google Drive or BaiduDisk.
Please download the datasets following the official instructions. The official websites of the datasets are listed below: nuScenes_lidarseg, SemanticKITTI, Waymo Open. The color images of the SemanticKITTI dataset can be downloaded from the KITTI odometry dataset.
After downloading, run the corresponding script to build the instance database for each dataset:

```bash
# nuScenes_lidarseg
python3 prepare_nusc_inst_database.py
# SemanticKITTI
python3 prepare_semkitti_inst_database.py
# Waymo Open Dataset
python3 prepare_waymo_inst_database.py
```
1. Run the following command to train the uni-modal teacher model (e.g., SphereFormer):
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 torchpack dist-run -np 4 python3 train_spformer.py configs/nuscenes/train/spformer.yaml --run-dir runs/nusc/spvcnn_spformer_cr2p0_multisweeps4_ep25_seed123
```
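The command above assumes four GPUs; torchpack's `dist-run` spawns one worker per `-np`. As an untested sketch, a single-GPU run would reduce the process count accordingly (expect a smaller effective batch size unless you adjust the config):

```bash
# Hypothetical single-GPU variant of the teacher training command
CUDA_VISIBLE_DEVICES=0 torchpack dist-run -np 1 python3 train_spformer.py \
    configs/nuscenes/train/spformer.yaml \
    --run-dir runs/nusc/spvcnn_spformer_cr2p0_multisweeps4_ep25_seed123
```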
2. Modify the `teacher_pretrain` entry in `configs/nuscenes/train/spformer_tsd_full_ours_star.yaml` to the path of the uni-modal teacher model trained in Step 1, e.g. (replace with your own run directory):

```yaml
teacher_pretrain: /data2/stf/codes/lifusion/runs/nusc_rq/spvcnn_spformer_cr2p0_multisweeps4_ep25_seed123/checkpoints/max-iou-val-vox.pt
```
3. Run the following command to train the cross-modal student model:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 torchpack dist-run -np 4 python3 train_lc_nusc_tsd_full.py configs/nuscenes/train/spformer_tsd_full_ours_star.yaml --run-dir runs/nusc/spformer_swiftnet18_cr2p0_tsd_multisweeps4_ep25_seed123
```
This repo is built upon torchpack, torchsparse, SphereFormer, and SwiftNet.
If you find this repo useful, please consider citing our paper:
```bibtex
@ARTICLE{sun2024u2mkd,
  author={Sun, Tianfang and Zhang, Zhizhong and Tan, Xin and Peng, Yong and Qu, Yanyun and Xie, Yuan},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation},
  year={2024},
  volume={46},
  number={12},
  pages={11059-11072},
  doi={10.1109/TPAMI.2024.3451658}}
```