
[CVPR 2025] UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

Created by Ziyi Wang*, Yanran Zhang*, Jie Zhou, Jiwen Lu (* indicates equal contribution)

This repository is an official implementation of UniPre3D (CVPR 2025).

Paper | arXiv | Project Page

UniPre3D is the first unified pre-training method for 3D point clouds that effectively handles both object- and scene-level data through cross-modal Gaussian splatting.

Our proposed pre-training task involves predicting Gaussian parameters from the input point cloud. The 3D backbone network is expected to extract representative features, and 3D Gaussian splatting is applied to render images for direct supervision. To incorporate additional texture information and to adjust task complexity, we introduce a pre-trained image model and propose a scale-adaptive fusion block that accommodates varying data scales.
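The sketch below illustrates this pipeline as a single PyTorch-style training step. All module and argument names here are hypothetical placeholders chosen for illustration, not the identifiers used in this repository's code.

import torch.nn.functional as F

def pretraining_step(backbone, image_model, fusion_block, gaussian_head,
                     renderer, points, ref_images, gt_images, cameras):
    # One pre-training step: point cloud -> Gaussians -> rendered views.
    point_feats = backbone(points)                   # features from the 3D backbone
    image_feats = image_model(ref_images)            # texture cues from the pre-trained image model
    fused = fusion_block(point_feats, image_feats)   # scale-adaptive cross-modal fusion
    gaussians = gaussian_head(fused)                 # predict per-point Gaussian parameters
    rendered = renderer(gaussians, cameras)          # differentiable Gaussian splatting
    return F.mse_loss(rendered, gt_images)           # direct supervision from ground-truth images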

News 🔥

  • [2025-06-12] Our paper is released on arXiv.
  • [2025-06-11] Our pretraining code is released.
  • [2025-02-27] Our paper is accepted to CVPR 2025.

TODO (In Progress) ⭐

  • Release datasets.
  • Release object-level pretraining code.
  • Release object-level logs and checkpoints.
  • Add more details about diverse downstream tasks.
  • Release scene-level pretraining code.
  • Release scene-level logs and checkpoints.

Visualization Results 📷

Below is a visualization of UniPre3D pre-training outputs. The first row presents the input point clouds, followed by the reference-view images in the second row. The third row displays the rendered images, which are supervised by the ground-truth images shown in the fourth row. The rightmost column illustrates a schematic diagram of the view selection principle for both object- and scene-level samples.

Getting Started 🚀

Table of Contents 📖

  1. Environment Setup 🔧
  2. Object-level Pretraining 🪑
  3. Scene-level Pretraining 🏠
  4. Acknowledgements 🙏
  5. Citation 📚

Environment Setup 🔧

Recommended Environment

  • Python 3.11
  • PyTorch 2.2
  • CUDA 12.0 or higher
  • Linux or Windows operating system

Please follow docs/INSTALLATION.md for detailed installation instructions.

Hardware Requirements

  • CUDA-capable GPU with compute capability 6.0 or higher
  • Minimum 8GB GPU memory (16GB+ recommended for large-scale experiments)
  • 16GB+ RAM
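
As a quick sanity check, the snippet below verifies your GPU against these requirements using standard PyTorch calls (this helper is ours, not part of the repository):

import torch

assert torch.cuda.is_available(), "A CUDA-capable GPU is required."
major, minor = torch.cuda.get_device_capability(0)
mem_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Compute capability: {major}.{minor} (need >= 6.0)")
print(f"GPU memory: {mem_gib:.1f} GiB (8 GiB minimum, 16+ recommended)")
assert (major, minor) >= (6, 0), "Compute capability 6.0 or higher is required."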

Data Preparation

Please follow docs/DATA_PREPARATION.md for detailed data preparation instructions.

Object-level Pretraining 🪑

Object-level pre-training is a technique where we train a 3D model on a large collection of individual 3D objects before fine-tuning it for specific downstream tasks. This approach helps the model learn fundamental geometric patterns and structural representations that can be transferred to various 3D understanding tasks.

Key Characteristics:

  • Focuses on learning from individual objects (e.g., chairs, airplanes, cars)
  • Captures fine-grained local geometric structures
  • Enables knowledge transfer to tasks like object classification and part segmentation
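
For a concrete picture of this transfer, the sketch below loads pre-trained backbone weights into a downstream model before fine-tuning. The checkpoint layout assumed here (a "state_dict" key and a "backbone." prefix) is purely illustrative and may differ from the released checkpoints.

import torch

def load_pretrained_backbone(model, ckpt_path):
    # Load a pre-training checkpoint and keep only the backbone weights;
    # the downstream task head is trained from scratch.
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("state_dict", state)  # assumed nesting; may differ in practice
    backbone_state = {k: v for k, v in state.items() if k.startswith("backbone.")}
    missing, unexpected = model.load_state_dict(backbone_state, strict=False)
    print(f"Loaded {len(backbone_state)} tensors; "
          f"{len(missing)} missing, {len(unexpected)} unexpected keys.")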

Usage

PointMLP pretraining:

CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pointmlp_pretraining

Standard Transformer pretraining:

CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name transformer_pretraining

Mamba3D pretraining:

CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name mamba3d_pretraining

Point Cloud Mamba pretraining:

CUDA_VISIBLE_DEVICES=<GPUs> python train_network.py --config-name pcm_pretraining

We cache dataset images in memory to accelerate data loading. If you encounter memory constraints, disable this feature by setting opt.record_img to false in configs/settings.yaml.
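Since the training script takes a Hydra-style --config-name flag, the configs are presumably OmegaConf YAML files; if so, the flag can also be flipped programmatically, as in the hypothetical snippet below (the opt.record_img key layout is an assumption):

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/settings.yaml")
cfg.opt.record_img = False   # assumed key layout; disables in-memory image caching
OmegaConf.save(cfg, "configs/settings.yaml")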

Finetune on Object-level Downstream Tasks 🎯

We evaluate the effectiveness of UniPre3D on various object-level downstream tasks, including:

  • Object Classification
  • Part Segmentation
  • Object Detection

Model Zoo (Pretrained Checkpoints)

We provide pretrained models and checkpoints for object-level tasks in the following table:

Model | Pretrained Checkpoint | Downstream Task | Performance | Finetuning Logs
Standard Transformer | Baidu Disk / Google Drive | Classification | 87.93% Acc (+10.69%) | Logs
PointMLP | Baidu Disk / Google Drive | Classification | 89.5% Acc (+2.1%) | Logs
Point Cloud Mamba | Baidu Disk / Google Drive | Classification | 89.0% Acc (+0.9%) | Logs
Mamba3D | Baidu Disk / Google Drive | Classification | 93.4% Acc (+0.8%) | Logs
PointMLP | Baidu Disk / Google Drive | Part Segmentation | 85.5% $\text{mIoU}_C$ (+0.9%) | Logs

For more details on the usage of downstream tasks, please refer to the docs/OBJECT_LEVEL_DOWNSTREAM_TASKS.md file.

Scene-level Pretraining 🏠

Scene-level pretraining focuses on learning representations from complex 3D environments containing multiple objects and spatial relationships. This approach helps models understand large-scale geometric structures and spatial contexts that are crucial for scene understanding tasks.

Key Characteristics:

  • Processes complete indoor/outdoor scenes rather than individual objects
  • Captures long-range spatial relationships and contextual information
  • Optimized for tasks like semantic segmentation and instance segmentation

Usage

Coming soon...

Finetune on Scene-level Downstream Tasks 🎯

We evaluate the effectiveness of UniPre3D on various scene-level downstream tasks, including:

  • Semantic Segmentation
  • Instance Segmentation
  • 3D Object Detection

Model Zoo (Pretrained Checkpoints)

Coming soon...

Acknowledgements 🙏

We would like to express our gratitude to

For any questions about data preparation, please feel free to open an issue in our repository or send an email to 1302821779@qq.com.

Citation 📚

If you find this work useful in your research, please consider citing:

@inproceedings{wang2025unipre3d,
  title={UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting},
  author={Wang, Ziyi and Zhang, Yanran and Zhou, Jie and Lu, Jiwen},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={1319--1329},
  year={2025}
}
