Authors: Bowen Yin, Jiaolong Cao, Xuying Zhang, Yuming Chen, Ming-Ming Cheng, Qibin Hou*
This paper is still under review. The full code, the complete ImageNeXt dataset, and the model checkpoints will be publicly released upon acceptance.
This is the official repository of 'OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation'. The paper provides a large-scale multi-modal dataset (ImageNeXt) and a general multi-modal pretraining and finetuning framework, which you can use to pretrain more powerful multi-modal encoders and contribute to RGBX research.
Figure 1: Visualizations of our assembled ImageNeXt dataset. Built upon ImageNet [43], a widely used large-scale RGB classification dataset, ImageNeXt provides five popular visual modalities for each sample: RGB, Depth, LiDAR, Thermal, and Event.
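Since each ImageNeXt sample carries all five modalities, a data loader naturally returns one tensor per modality. The dataset is not yet released, so the directory layout, file names, and per-modality encodings in the minimal PyTorch sketch below are purely assumptions for illustration, not the authors' actual format:

```python
# Hypothetical sketch only: ImageNeXt is not yet released, so the layout
# <root>/<modality>/<sample_id>.png and the channel choices below are
# assumptions, not the released format.
import os
from typing import Dict, Sequence

import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

# The five modalities provided for each ImageNeXt sample.
MODALITIES = ["rgb", "depth", "lidar", "thermal", "event"]


class ImageNeXtSample(Dataset):
    """Loads all five modalities for each sample id (assumed layout)."""

    def __init__(self, root: str, sample_ids: Sequence[str]):
        self.root = root
        self.sample_ids = list(sample_ids)
        self.to_tensor = transforms.ToTensor()

    def __len__(self) -> int:
        return len(self.sample_ids)

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        sample_id = self.sample_ids[idx]
        sample = {}
        for m in MODALITIES:
            path = os.path.join(self.root, m, f"{sample_id}.png")
            # Assume non-RGB modalities are stored as single-channel images.
            img = Image.open(path).convert("RGB" if m == "rgb" else "L")
            sample[m] = self.to_tensor(img)
        return sample
```

A multi-modal encoder would then consume the returned dict, picking whichever subset of modalities a given downstream task provides.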