Inspired by SpatialLM, SpaceLM is a 3D reconstruction system for building 3D models from 2D images and videos. We will continue to improve the code and the pretrained model. Stay tuned for more details and improvements.
- Release training dataset
- Release dataset preprocessing code
- Release training code
- Release the pretrained model (Apr 10)
- Add inference and visualization demo (Apr 15)
We use a single A100 GPU to train the model.
Please follow the steps below to create the environment and install the dependencies.
```bash
git clone https://github.com/sengine-research/SpaceLM.git
cd SpaceLM
conda create -n spacelm python=3.10
conda activate spacelm

# CUDA version 12.4
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install xformers==v0.0.29.post2 --index-url https://download.pytorch.org/whl/cu124  # choose the xformers version from the table below

# CUDA version 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install xformers==v0.0.29.post2 --index-url https://download.pytorch.org/whl/cu118  # choose the xformers version from the table below

pip install -r requirements.txt
pip install git+https://github.com/mit-han-lab/torchsparse.git  # may take a long time

# install spconv for CUDA 12.4
pip install spconv-cu124
# install spconv for CUDA 11.8
pip install spconv-cu118
```
Install the xformers build that matches your PyTorch and CUDA versions, e.g. `pip install xformers==v0.0.29.post2 --index-url https://download.pytorch.org/whl/cu118` (see the table below).
| xformers | PyTorch | CUDA |
| --- | --- | --- |
| v0.0.29.post2 | torch==2.6.0 | cu118, cu124, cu126 |
| 0.0.29.post1, 0.0.29, 0.0.28.post3 | torch==2.5.1 | cu118, cu121, cu124 |
| 0.0.28.post2 | torch==2.5.0 | cu118, cu121, cu124 |
| 0.0.28.post1 | torch==2.4.1 | cu118, cu121, cu124 |
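As an optional sanity check (not part of the original setup steps), the short script below prints the installed torch, CUDA, and xformers versions so you can match them against the table above:

```python
# Optional sanity check: report the installed torch / CUDA / xformers versions
# so they can be compared against the compatibility table above.
import torch
import xformers

print("torch:", torch.__version__)
print("CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("xformers:", xformers.__version__)
```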
```bash
python app.py

# Inference
python inference.py --model_path PRE_TRAINED_MODEL_PATH --point_cloud sample_data/scene0000_00/scene0000_00_pc_result.ply -o test.txt

# Save the result as an .rrd file
python visualize.py --point_cloud sample_data/scene0000_00/scene0000_00_pc_result.ply --layout test.txt --save test.rrd

# Open the .rrd file (a Windows machine works best)
pip install rerun-sdk
rerun test.rrd
```
Note: a GPU with 48 GB of VRAM is preferred, and training takes about 5 hours.
```bash
modelscope download --model qwen/Qwen2.5-0.5B-Instruct --local_dir ./Qwen2.5-0.5B-Instruct
python model/modify_qwen2.5.py --model_path ./Qwen2.5-0.5B-Instruct  # replace the vision tokens with point tokens
```
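For context, the gist of this step is to swap Qwen2.5's vision placeholder tokens for point-cloud tokens. The sketch below only illustrates that idea; the token strings and the embedding-resize step are assumptions, not the exact logic of `model/modify_qwen2.5.py`:

```python
# Illustrative sketch only: register point-cloud placeholder tokens on the
# Qwen2.5 tokenizer and grow the embedding table to match. The token names
# below are hypothetical, not the identifiers used by modify_qwen2.5.py.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|point_start|>", "<|point_pad|>", "<|point_end|>"]}
)
model.resize_token_embeddings(len(tokenizer))

tokenizer.save_pretrained(path)
model.save_pretrained(path)
```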
Download scenescript_model_ase.ckpt and place it in the root directory of the project.
Thanks to VLA-3D, six of the most popular open-source datasets (ScanNet / Matterport / HM3D / Unity / ARKitScenes / 3RScan) are available in a common format for easy training. We use ScanNet as an example to show how to train the model. For the other datasets (Matterport / HM3D / Unity / ARKitScenes / 3RScan), you can refer to the code and train on them yourself.
```bash
git lfs install
git clone https://huggingface.co/datasets/sengine-research/preprocessed-vla-3d
mkdir 3D_dataset
unzip ./preprocessed-vla-3d/Scannet.zip -d ./3D_dataset/
python dataset/preprocess_data_scene_script.py --data_path 3D_dataset --dataset_name Scannet
```
After preprocessing, the data will be stored in the `preprocessed_data_scene_script` folder.
```bash
python train.py --dataset_dir preprocessed_data_scene_script --dataset_name Scannet --model_path ./Qwen2.5-0.5B-Instruct --exp_path YOUR_EXP_PATH --exp_name YOUR_EXP_NAME --stage_1_epochs EPOCH_NUM --stage_2_epochs EPOCH_NUM --batch_size BATCH_SIZE --gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS --learning_rate LEARNING_RATE --save_per_epoch SAVE_PER_EPOCH

# example
python train.py --dataset_dir preprocessed_data_scene_script --dataset_name Scannet --model_path ./Qwen2.5-0.5B-Instruct --exp_path ./exp --exp_name space_lm_model_qwen_llm_lr_1e-6_point_lr_1e-5 --stage_1_epochs 4 --stage_2_epochs 10 --batch_size 1 --gradient_accumulation_steps 16 --learning_rate 5e-6 --save_per_epoch 2
```
We train the model in two stages: the first stage trains the point backbone, and the second stage trains the whole model on the ScanNet dataset.
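Roughly speaking, the two stages differ in which parameter groups receive gradients. The sketch below is only an illustration of that split; the submodule names are assumptions, and the real schedule lives in `train.py`:

```python
import torch.nn as nn

def set_trainable_for_stage(model: nn.Module, stage: int) -> None:
    """Illustrative only: stage 1 trains just the point backbone/projector
    with the LLM frozen; stage 2 unfreezes the whole model. The submodule
    names below are hypothetical, not SpaceLM's actual attributes."""
    for p in model.parameters():
        p.requires_grad = (stage == 2)  # stage 2: everything is trainable
    if stage == 1:
        for name in ("point_backbone", "point_projector"):  # assumed names
            submodule = getattr(model, name, None)
            if submodule is not None:
                for p in submodule.parameters():
                    p.requires_grad = True
```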
```bash
python inference.py --model_path YOUR_EXP_PATH/YOUR_EXP_NAME/EPOCH_NUM --point_cloud PLY_FILE -o OUTPUT_FILE
python visualize.py --point_cloud PLY_FILE --layout OUTPUT_FILE --save OUTPUT_FILE.rrd
rerun OUTPUT_FILE.rrd  # better on Windows

# example
python inference.py --model_path exp/space_lm_model_qwen_llm_lr_1e-5_point_lr_1e-4_no_stage_1_Scannet/stage_2/epoch_0 --point_cloud sample_data/scene0000_00/scene0000_00_pc_result.ply -o test.txt
python visualize.py --point_cloud sample_data/scene0000_00/scene0000_00_pc_result.ply --layout test.txt --save test.rrd
rerun test.rrd  # better on Windows
```
We welcome contributions in the following ways:
- Submit an Issue to report problems
- Create a Pull Request to improve the code
- Help complete the project documentation
- Share your usage examples
This work is inspired by the following projects:
SpatialLM | Qwen2.5 | SceneScript | VLA-3D