Hi, this is the original code for our CVPR 2024 paper: CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs.
We use ControlNet for image-to-image conversion and Grounded-SAM for semantic segmentation.
- Release Usage and Data
- Provide a Notebook version
- Make the code... readable
The dataset is available here. Dataset_V1 and V2 are for different abstraction levels, i.e., numbers of primitives. Since the dataset has not been thoroughly reviewed, we also release our labelling tool here.
This file takes as input an arbitrary .scad file and outputs all the 'to-be-commented' locations by adding placeholders at those locations.
The core function is 'CADTalk_parser'. Specify the file path and the output path, then run:
```
python SPA/0_parse_scad_code.py
```
Known Issue: The parser doesn't support all .scad files, since it was developed only for this project.
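For intuition, here is a much-simplified sketch of the placeholder-insertion idea, not the actual CADTalk_parser: it matches block-opening statements with a regex and inserts a placeholder comment above each one. The placeholder text and file paths are assumptions for illustration:
```python
# A hedged, simplified illustration of placeholder insertion; the real parser
# in SPA/0_parse_scad_code.py handles many more cases.
import re

PLACEHOLDER = "// [TO_BE_COMMENTED]"   # placeholder text is an assumption
BLOCK_RE = re.compile(
    r"^\s*(module\s+\w+|union|difference|intersection|"
    r"translate|rotate|scale|cube|sphere|cylinder)\b")

def insert_placeholders(in_path, out_path):
    out_lines = []
    with open(in_path) as f:
        for line in f:
            if BLOCK_RE.match(line):
                indent = line[:len(line) - len(line.lstrip())]
                out_lines.append(indent + PLACEHOLDER + "\n")
            out_lines.append(line)
    with open(out_path, "w") as f:
        f.writelines(out_lines)

insert_placeholders("input.scad", "input_with_placeholders.scad")
```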
After all 'to-be-commented' blocks are in place, this file renders multi-view images of the given CAD program and, in the meantime, registers the pixel-to-block correspondence. The core is the 'render_multi_view' function, which takes the following inputs:
- working_dir: used as temporary storage during the registration; the final results will also be written here.
- program_dir: a folder that stores all the programs you want to register and render.
- model_dir: a folder that stores the 3D models for all the programs.
The output is 'pixel2block.npy', which stores a correspondence matrix named 'pixel2block', along with the block areas and the number of blocks.
The registration is later used for the voting; we achieve it by rendering each block in a distinct color and recognizing the color of interest in the rendered images.
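To make the color-coding trick concrete, here is a minimal sketch of how a pixel2block map could be recovered from a rendering where every block was painted a distinct color; the palette, image name, and dictionary layout are assumptions, not the exact format used by render_multi_view:
```python
import numpy as np
from PIL import Image

palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])  # one color per block (assumption)
img = np.asarray(Image.open("working_dir/coded_view.png").convert("RGB"))

h, w, _ = img.shape
pixel2block = np.full((h, w), -1, dtype=np.int64)            # -1 marks background
for block_id, color in enumerate(palette):
    pixel2block[np.all(img == color, axis=-1)] = block_id    # pixels showing this block

block_area = np.bincount(pixel2block[pixel2block >= 0], minlength=len(palette))
np.save("working_dir/pixel2block.npy",
        {"pixel2block": pixel2block, "block_area": block_area,
         "num_blocks": len(palette)})
```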
This file produces depth images for the program. The inputs are the working_dir and model_dir specified in the previous stage; change the two paths in 2_get_depth.py.
The script requires Blender to execute. Install Blender 3.2 and run it with:
```
[path_to_blender3.2/blender] ./SPA/2_get_depth.blend --background --python ./SPA/2_get_depth.py
```
Notice: Blender 4.x will produce low-contrast depth maps, because the background depth is also considered by the 'Normalize' node.
You will find the depth image under working_dir with the name depth0001.png.
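If you must stay on Blender 4.x, one hedged workaround is to export the raw depth and normalize it yourself over foreground pixels only; the file names below are assumptions:
```python
import numpy as np
from PIL import Image

depth = np.load("working_dir/depth_raw.npy")   # raw per-pixel depth (assumed export)
fg = depth < depth.max()                       # background sits at the far clip value
lo, hi = depth[fg].min(), depth[fg].max()

out = np.zeros_like(depth)
out[fg] = 1.0 - (depth[fg] - lo) / (hi - lo + 1e-8)   # near = bright
Image.fromarray((out * 255).astype(np.uint8)).save("working_dir/depth0001.png")
```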
**For the 'Unable to open a display' issue**, the solution is:
- Run
```
sudo ./virtualfb/virtualfb.sh
```
- You will see output like 'DISPLAY=:567'; then run
```
export DISPLAY=:567
```
- Retry the Blender script.
- To quit the virtualfb, run
```
sudo ./virtualfb/virtualfb.sh stop
```
You can avoid using Blender by implementing your own depth-rendering tool.
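For example, a Blender-free alternative could use trimesh and pyrender (neither is used by the repo itself; the paths, resolution, and camera placement below are assumptions, and the model is assumed roughly centered at the origin):
```python
import numpy as np
import trimesh
import pyrender
from PIL import Image

mesh = trimesh.load("model_dir/model.obj", force="mesh")
scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(mesh))

camera = pyrender.PerspectiveCamera(yfov=np.pi / 3.0)
cam_pose = np.eye(4)
cam_pose[2, 3] = 2.5 * mesh.scale              # pull the camera back along +z
scene.add(camera, pose=cam_pose)

r = pyrender.OffscreenRenderer(512, 512)
_, depth = r.render(scene)                     # depth == 0 where no surface is hit

fg = depth > 0
out = np.zeros_like(depth)
out[fg] = 1.0 - (depth[fg] - depth[fg].min()) / (depth[fg].max() - depth[fg].min() + 1e-8)
Image.fromarray((out * 255).astype(np.uint8)).save("working_dir/depth0001.png")
```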
This stage converts depth images to realistic images with ControlNet.
- Set up ControlNet with:
```
cd ControlNet
conda env create -f environment.yaml
conda activate control-v11
```
- Download the Stable Diffusion v1.5 checkpoint v1-5-pruned.ckpt and the depth ControlNet, and save them under ControlNet/models.
- Under the ControlNet folder, conduct depth-to-image with:
```
python ./3_controlnet.py
```
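If you prefer not to use the bundled ControlNet code, a hedged diffusers-based equivalent of this step could look as follows; the checkpoint names, prompt, and file paths are assumptions, and this is not what 3_controlnet.py actually runs:
```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

depth = load_image("working_dir/depth0001.png")        # output of the previous stage
image = pipe("a photo of an airplane",                 # prompt is an assumption
             image=depth, num_inference_steps=20).images[0]
image.save("working_dir/rgb0001.png")
```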
This file conducts open-vocabulary image segmentation on the generated images.
- Set up Grounded SAM by:
```
cd Grounded-Segment-Anything
make build-image
```
Notice 1: The dockerfile has been modified.
Notice 2: For issues regarding installing GSA, please refer to their official page.
- Run prediction (DINO + SAM) with:
```
make run   # you will enter a container terminal
cd Grounded-Segment-Anything/
python 4_prediction.py
```
You will find the results under examples/stage3.
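Under the hood, 4_prediction.py follows the usual Grounded-SAM recipe: GroundingDINO proposes boxes from an open-vocabulary text prompt, and SAM turns each box into a mask. A hedged sketch of that flow (checkpoint paths, image path, and the label prompt are assumptions):
```python
import numpy as np
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1) Open-vocabulary boxes from GroundingDINO.
dino = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py",
                  "groundingdino_swint_ogc.pth")
image_source, image = load_image("examples/stage2/rgb0001.png")
boxes, logits, phrases = predict(model=dino, image=image,
                                 caption="wing. body. engine.",
                                 box_threshold=0.35, text_threshold=0.25)

# 2) Masks from SAM, prompted with the detected boxes.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
h, w, _ = image_source.shape
xyxy = box_convert(boxes * torch.tensor([w, h, w, h]), "cxcywh", "xyxy").numpy()
for box, phrase in zip(xyxy, phrases):
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    # masks[0] is a boolean HxW mask labelled with the open-vocabulary phrase.
```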
With numpy installed, run:
```
python SPA/5_voting.py
```
You will find the annotated code under examples/stage4.
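Conceptually, the voting combines the stage-1 pixel2block map with the stage-3 masks: every mask pixel votes for a label on the block it belongs to, and each block takes the majority label. A hedged sketch (mask file names and the label list are assumptions, not the exact format used by SPA/5_voting.py):
```python
import numpy as np

data = np.load("working_dir/pixel2block.npy", allow_pickle=True).item()
pixel2block = data["pixel2block"]      # HxW block indices, -1 = background
num_blocks = data["num_blocks"]
labels = ["wing", "body", "engine"]    # label set is an assumption

votes = np.zeros((num_blocks, len(labels)), dtype=np.int64)
for li, label in enumerate(labels):
    mask = np.load(f"working_dir/mask_{label}.npy")   # boolean HxW mask (assumed)
    ids, counts = np.unique(pixel2block[mask], return_counts=True)
    for i, c in zip(ids, counts):
        if i >= 0:
            votes[i, li] += c

block_labels = [labels[votes[b].argmax()] for b in range(num_blocks)]
```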
Notice: We tried to visualize the code labels in OpenSCAD; however, this only works for abstracted shapes. For instances from dataset_Real, you will see incorrect color display due to their complex program structure.