Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
Yuqi Wu*, Wenzhao Zheng*†, Jie Zhou, Jiwen Lu

* Equal contribution.
Point3R is an online framework for dense streaming 3D reconstruction with explicit spatial memory. It achieves competitive performance at low training cost.
- [2025/7/3] Training/fine-tuning/evaluation code released.
Given streaming image inputs, our method maintains an explicit spatial pointer memory in which each pointer is assigned a 3D position and points to a changing spatial feature. We conduct a pointer-image interaction to integrate new observations into the global coordinate system and update our spatial pointer memory accordingly. Our method achieves competitive or state-of-the-art performance across various tasks: dense 3D reconstruction, monocular and video depth estimation, and camera pose estimation.
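As a rough illustration of this design, the sketch below keeps a set of pointers, each pairing a fixed 3D position with a feature that changes as new observations arrive. All names (`SpatialPointerMemory`, `update`, the nearest-neighbor matching and averaging fusion rule) are hypothetical simplifications for intuition, not the actual Point3R implementation.

```python
# A minimal sketch of an explicit spatial pointer memory, assuming a simplified
# fusion rule; names, shapes, and thresholds are hypothetical, not Point3R's API.
import torch

class SpatialPointerMemory:
    def __init__(self, feat_dim: int = 256):
        self.positions = torch.empty(0, 3)         # 3D position of each pointer
        self.features = torch.empty(0, feat_dim)   # feature each pointer points to

    def update(self, new_pos: torch.Tensor, new_feat: torch.Tensor,
               radius: float = 0.05) -> None:
        """Integrate observations already expressed in the global coordinate system."""
        if self.positions.numel() == 0:
            self.positions, self.features = new_pos, new_feat
            return
        # Match each new point to its nearest existing pointer.
        dists = torch.cdist(new_pos, self.positions)   # (N_new, N_mem)
        min_d, idx = dists.min(dim=1)
        close = min_d < radius
        # Fuse features of matched pointers (running average as a placeholder).
        self.features[idx[close]] = 0.5 * (self.features[idx[close]] + new_feat[close])
        # Spawn new pointers for unmatched observations.
        self.positions = torch.cat([self.positions, new_pos[~close]])
        self.features = torch.cat([self.features, new_feat[~close]])

memory = SpatialPointerMemory()
memory.update(torch.rand(1024, 3), torch.rand(1024, 256))  # frame 1
memory.update(torch.rand(1024, 3), torch.rand(1024, 256))  # frame 2
```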
Our code is based on the following environment:

```bash
git clone https://github.com/YkiWu/Point3R.git
cd Point3R
conda create -n point3r python=3.11 cmake=3.14.0
conda activate point3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
conda install 'llvm-openmp<16'
```
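A quick way to verify the install before training (a convenience snippet, not part of the repo):

```python
# Sanity check that PyTorch was installed with CUDA support and sees a GPU.
import torch

print(torch.__version__)           # expect a CUDA 12.1 build
print(torch.cuda.is_available())   # should print True on a GPU machine
```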
Please follow CUT3R to prepare the training datasets. Official links to all datasets we use are listed below:
- ARKitScenes
- BlendedMVS
- CO3Dv2
- Hypersim
- MegaDepth
- MVS-Synth
- OmniObject3D
- PointOdyssey
- ScanNet++
- ScanNet
- Spring
- Virtual KITTI 2
- Waymo Open Dataset
- WildRGB-D
We provide the following commands for training from scratch. Please download DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth and place it at a path of your choice.
```bash
cd src/
# stage 1: 224 version, 5-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name 224_stage1
# stage 2: 512 version, 5-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name 512_stage2
# stage 3: freeze the encoder and fine-tune the other parts on 8-frame sequences
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name long_stage3
```
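Stage 3 freezes the encoder while the rest of the model keeps training; in PyTorch this typically amounts to disabling gradients on the encoder parameters. A minimal sketch, assuming a model with an `encoder` submodule (a hypothetical attribute name, not necessarily how the training configs implement it):

```python
# Stage-3-style freezing: stop gradients for the encoder, optimize the rest.
# `model.encoder` is a hypothetical attribute name used for illustration.
import torch

def freeze_encoder(model: torch.nn.Module) -> torch.optim.Optimizer:
    for p in model.encoder.parameters():
        p.requires_grad_(False)        # encoder weights stay fixed
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-5)
```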
If you want to fine-tune our checkpoint, use the following command. Click HERE to download the checkpoint and place it at a path of your choice. You can modify the configuration file to suit your needs.
```bash
cd src/
# fine-tune
NCCL_DEBUG=TRACE TORCH_DISTRIBUTED_DEBUG=DETAIL HYDRA_FULL_ERROR=1 accelerate launch --num_processes=8 train.py --config-name finetune
```
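If you only want to load the checkpoint for inspection rather than fine-tuning, a generic PyTorch loading pattern looks like the following; the `"model"` key and `strict=False` are assumptions about the file layout, not a documented Point3R interface:

```python
# Generic PyTorch checkpoint loading; key layout is an assumption.
import torch

ckpt = torch.load("path/to/point3r_ckpt.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)   # some checkpoints nest weights under 'model'
# model.load_state_dict(state_dict, strict=False)  # with your instantiated model
print(f"{len(state_dict)} tensors in checkpoint")
```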
Please follow MonST3R and Spann3R to prepare the evaluation datasets.
Our evaluation code follows MonST3R and CUT3R.
```bash
bash eval/mv_recon/run.sh
```

Results will be saved in `eval_results/mv_recon/${model_name}_${ckpt_name}/logs_all.txt`.
```bash
bash eval/monodepth/run.sh
```

Results will be saved in `eval_results/monodepth/${data}_${model_name}/metric.json`.
```bash
bash eval/video_depth/run.sh
```

Results will be saved in `eval_results/video_depth/${data}_${model_name}/result_scale.json`.
```bash
bash eval/relpose/run.sh
```

Results will be saved in `eval_results/relpose/${data}_${model_name}/_error_log.txt`.
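To inspect the JSON result files programmatically, something like the snippet below works; the concrete path follows the `${data}_${model_name}` pattern above and the dataset/model names in it are placeholders (no assumption is made about the metric keys inside):

```python
# Pretty-print an evaluation result file; the path components are placeholders
# following the ${data}_${model_name} pattern above.
import json
from pathlib import Path

result = Path("eval_results/monodepth/sintel_point3r/metric.json")  # hypothetical names
metrics = json.loads(result.read_text())
for name, value in metrics.items():
    print(f"{name}: {value}")
```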
Our code is based on the following awesome repositories:

- DUSt3R
- CUT3R
- MonST3R
- Spann3R

Many thanks to these authors!
If you find this project helpful, please consider citing the following paper:
```bibtex
@article{point3r,
    title={Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory},
    author={Yuqi Wu and Wenzhao Zheng and Jie Zhou and Jiwen Lu},
    journal={arXiv preprint arXiv:2507.02863},
    year={2025}
}
```