[CVPR 2025] UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Created by Ziyi Wang*, Yanran Zhang*, Jie Zhou, Jiwen Lu (* indicates equal contribution)
This repository is an official implementation of UniPre3D (CVPR 2025).
Paper | arXiv | Project Page
UniPre3D is the first unified pre-training method for 3D point clouds that effectively handles both object- and scene-level data through cross-modal Gaussian splatting.
Our proposed pre-training task is to predict Gaussian parameters from the input point cloud. The 3D backbone network is expected to extract representative features, and 3D Gaussian splatting renders images from the predicted Gaussians for direct supervision. To incorporate additional texture information and to adjust task complexity, we introduce a pre-trained image model and propose a scale-adaptive fusion block that accommodates varying data scales.
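To make the pipeline concrete, the sketch below shows the core idea: a small head maps per-point backbone features to Gaussian parameters, a differentiable splatting renderer turns them into an image, and the loss is computed directly in pixel space. All names here (`GaussianHead`, `pretrain_step`, `render_fn`) are hypothetical placeholders rather than this repository's API, and the plain MSE loss stands in for whatever objective a given config actually uses.

```python
# Minimal sketch of the cross-modal Gaussian-splatting pre-training idea
# (illustration only; every name here is a hypothetical placeholder,
# not this repository's actual API).
import torch.nn as nn
import torch.nn.functional as F

class GaussianHead(nn.Module):
    """Maps per-point features to Gaussian parameters:
    3 (position offset) + 3 (scale) + 4 (rotation) + 1 (opacity) + 3 (color) = 14."""
    def __init__(self, feat_dim=256, param_dim=14):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, param_dim),
        )

    def forward(self, point_feats):      # (N, feat_dim)
        return self.mlp(point_feats)     # (N, 14)

def pretrain_step(points, point_feats, gt_image, camera,
                  head, optimizer, render_fn):
    """One step: features -> Gaussian parameters -> differentiable render -> image loss."""
    params = head(point_feats)                    # predict per-point Gaussians
    rendered = render_fn(points, params, camera)  # e.g. a 3DGS rasterizer call
    loss = F.mse_loss(rendered, gt_image)         # direct image-space supervision
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```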
- [2025-06-12] Our arXiv paper is released.
- [2025-06-11] Our pretraining code is released.
- [2025-02-27] Our paper is accepted by CVPR 2025.
- Release datasets.
- Release object-level pretraining code.
- Release object-level logs and checkpoints.
- Add more details about diverse downstream tasks.
- Release scene-level pretraining code.
- Release scene-level logs and checkpoints.
Below is a visualization of UniPre3D pre-training outputs. The first row presents the input point clouds, followed by the reference view images in the second row. The third row displays the rendered images, which are supervised by the ground truth images shown in the fourth row. In the rightmost column, we illustrate a schematic diagram of the view selection principle for both object- and scene-level samples.
- Environment Setup
- Object-level Pretraining
- Scene-level Pretraining
- Acknowledgements
- Citation
- Python 3.11
- PyTorch 2.2
- CUDA 12.0 or higher
- Linux or Windows operating system
Please follow docs/INSTALLATION.md for detailed installation instructions.
- CUDA-capable GPU with compute capability 6.0 or higher
- Minimum 8GB GPU memory (16GB+ recommended for large-scale experiments)
- 16GB+ RAM
Please follow docs/DATA_PREPARATION.md for detailed data preparation instructions.
Object-level pre-training is a technique where we train a 3D model on a large collection of individual 3D objects before fine-tuning it for specific downstream tasks. This approach helps the model learn fundamental geometric patterns and structural representations that can be transferred to various 3D understanding tasks.
Key Characteristics:
- Focuses on learning from individual objects (e.g., chairs, airplanes, cars)
- Captures fine-grained local geometric structures
- Enables knowledge transfer to tasks like object classification and part segmentation
PointMLP pretraining:

```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pointmlp_pretraining
```

Standard Transformer pretraining:

```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name transformer_pretraining
```

Mamba3D pretraining:

```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name mamba3d_pretraining
```

Point Cloud Mamba pretraining:

```bash
CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pcm_pretraining
```
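In each command above, replace `<GPUs>` with the indices of the GPUs to use, e.g. `CUDA_VISIBLE_DEVICES=0` for a single GPU or `CUDA_VISIBLE_DEVICES=0,1` for two.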
We cache dataset images in memory to accelerate data loading. If you encounter memory constraints, disable this feature by setting `opt.record_img` to `false` in `configs/settings.yaml`.
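The caching pattern is roughly the following (a hypothetical sketch, not the repository's actual dataset class): each decoded image is kept in a dictionary after its first read, trading RAM for faster subsequent epochs.

```python
# Hypothetical sketch of in-memory image caching in a PyTorch dataset
# (illustration only, not the repository's actual implementation).
from PIL import Image
from torch.utils.data import Dataset

class RenderImageDataset(Dataset):
    def __init__(self, image_paths, record_img=True):
        self.image_paths = image_paths
        self.record_img = record_img  # mirrors the role of opt.record_img
        self.cache = {}               # index -> decoded image, trades RAM for speed

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        if self.record_img and idx in self.cache:
            return self.cache[idx]    # fast path: already decoded in memory
        img = Image.open(self.image_paths[idx]).convert("RGB")
        if self.record_img:
            self.cache[idx] = img     # cache for subsequent epochs
        return img                    # a real dataset would also apply transforms
```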
We evaluate the effectiveness of UniPre3D on various object-level downstream tasks, including:
- Object Classification
- Part Segmentation
- Object Detection
We provide pretrained models and checkpoints for object-level tasks in the following table:
| Model | Pretrained Checkpoint | Downstream Task | Performance | Finetuning Logs |
| --- | --- | --- | --- | --- |
| Standard Transformer | Baidu Disk / Google Drive | Classification | 87.93% Acc (+10.69%) | Logs |
| PointMLP | Baidu Disk / Google Drive | Classification | 89.5% Acc (+2.1%) | Logs |
| Point Cloud Mamba | Baidu Disk / Google Drive | Classification | 89.0% Acc (+0.9%) | Logs |
| Mamba3D | Baidu Disk / Google Drive | Classification | 93.4% Acc (+0.8%) | Logs |
| PointMLP | Baidu Disk / Google Drive | Part Segmentation | 85.5% (+0.9%) | Logs |
For more details on the usage of downstream tasks, please refer to the docs/OBJECT_LEVEL_DOWNSTREAM_TASKS.md file.
Scene-level pretraining focuses on learning representations from complex 3D environments containing multiple objects and spatial relationships. This approach helps models understand large-scale geometric structures and spatial contexts that are crucial for scene understanding tasks.
Key Characteristics:
- Processes complete indoor/outdoor scenes rather than individual objects
- Captures long-range spatial relationships and contextual information
- Optimized for tasks like semantic segmentation and instance segmentation
Coming soon...
We evaluate the effectiveness of UniPre3D on various scene-level downstream tasks, including:
- Semantic Segmentation
- Instance Segmentation
- 3D Object Detection
Coming soon...
We would like to express our gratitude to the following open-source projects and datasets:
- Gaussian Splatting
- Openpoints
- Pointcept
- ShapenetRender_more_variation
- Splatter Image
- ShapeNet
- ScanNet
- PointCloudMamba
- Mamba3D
For any questions about data preparation, please feel free to open an issue in our repository or send an email to 1302821779@qq.com.
If you find this work useful in your research, please consider citing:
```bibtex
@inproceedings{wang2025unipre3d,
  title={UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting},
  author={Wang, Ziyi and Zhang, Yanran and Zhou, Jie and Lu, Jiwen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={1319--1329},
  year={2025}
}
```