For stereo matching methods based on deep learning, the network architecture is critical to accuracy, while efficiency is also an important factor in practical applications. This repository proposes a stereo matching method with a sparse cost volume in the disparity dimension. The sparse cost volume is created by shifting right-view features with a wide stride, which greatly reduces the memory and computational cost of the 3D convolution module. The matching cost is sampled nonlinearly in the disparity dimension by means of multi-classification, and the model is trained with a combination of two loss functions, so that accuracy is improved without notably lowering efficiency.
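As a rough illustration of the idea (a sketch, not the repository's actual implementation), a sparse cost volume can be built by concatenating left-view features with right-view features shifted by multiples of a wide stride, so that only `maxdisp / stride` disparity levels enter the 3D convolution module instead of `maxdisp`:

```python
import numpy as np

def sparse_cost_volume(left, right, max_disp, stride):
    """Build a cost volume sampled every `stride` disparities (a sketch).

    left, right: feature maps of shape (C, H, W).
    Returns a volume of shape (2*C, D, H, W) with D = max_disp // stride,
    where level i concatenates the left features with the right features
    shifted right by d = i * stride; out-of-view positions stay zero.
    """
    C, H, W = left.shape
    D = max_disp // stride          # sparse sampling: D levels, not max_disp
    vol = np.zeros((2 * C, D, H, W), dtype=left.dtype)
    for i in range(D):
        d = i * stride
        vol[:C, i, :, d:] = left[:, :, d:]        # left features
        vol[C:, i, :, d:] = right[:, :, :W - d] if d else right  # shifted right features
    return vol
```

With `stride=2` the 3D convolution module processes half as many disparity levels, which is where the memory and compute savings come from; sub-stride disparities are then recovered by the multi-classification sampling described above.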
See `stereo/dataloader/README.md` for usage of the KITTI and SceneFlow datasets.
As an example, use the following command to train a WSMCnet on SceneFlow:

```shell
dir_save="./results"
LOG="${dir_save}/log_`date +%Y-%m-%d_%H-%M-%S`.txt"
mkdir -p "${dir_save}"
python main.py --mode Train --arch WSMCnetEB_S2C3F32 --maxdisp 192 --bn 4 \
               --loadmodel None \
               --datas_train "sf-tr" --dir_datas_train (dir_root_sf) \
               --datas_val "sf-val" --dir_datas_val (dir_root_sf) \
               --crop_width 512 --crop_height 256 \
               --epochs 20 --nloop 1 --freq_print 20 \
               --freq_optim 4 \
               --lr 0.001 --lr_epoch0 16 \
               --lr_stride 10 --lr_delay 0.1 \
               --dir_save "$dir_save" \
               2>&1 | tee -a "$LOG"
```
As another example, use the following command to finetune a WSMCnet on KITTI:

```shell
dir_save="./results"
LOG="${dir_save}/log_`date +%Y-%m-%d_%H-%M-%S`.txt"
mkdir -p "${dir_save}"
python main.py --mode Finetune --arch WSMCnetEB_S2C3F32 --maxdisp 192 --bn 4 \
               --loadmodel (filepath of pretrained weight) \
               --datas_train "k15-tr,k12-tr" --dir_datas_train (dir_root_kitti) \
               --datas_val "k15-val,k12-val" --dir_datas_val (dir_root_kitti) \
               --crop_width 512 --crop_height 256 \
               --epochs 20 --nloop 30 --freq_print 20 \
               --freq_optim 4 \
               --lr 0.005 --lr_epoch0 16 \
               --lr_stride 10 --lr_delay 0.2 \
               --dir_save "$dir_save" \
               2>&1 | tee -a "$LOG"
```
You can also see these examples in `demos/train_*.sh` for details.
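The learning-rate flags above suggest a staircase decay. As a hedged sketch (the exact semantics of `--lr_epoch0`, `--lr_stride`, and `--lr_delay` are an assumption here, not read from `main.py`), the base rate would be held until epoch `lr_epoch0` and then multiplied by `lr_delay` every `lr_stride` epochs:

```python
def lr_at_epoch(epoch, lr=0.005, lr_epoch0=16, lr_stride=10, lr_delay=0.2):
    # Hypothetical staircase decay matching the finetune flags above:
    # hold `lr` until `lr_epoch0`, then multiply by `lr_delay` once
    # every `lr_stride` epochs. (Assumed flag semantics; check main.py
    # for the actual schedule.)
    if epoch < lr_epoch0:
        return lr
    steps = 1 + (epoch - lr_epoch0) // lr_stride
    return lr * lr_delay ** steps
```

Under this reading, the finetune run above would train at 0.005 for the first 16 epochs and at 0.001 afterwards (20 epochs total, so the second decay step is never reached).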
Use the following command to evaluate the trained WSMCnet on KITTI 2015 test data:

```shell
dir_save="./results"
LOG="${dir_save}/log_`date +%Y-%m-%d_%H-%M-%S`.txt"
mkdir -p "${dir_save}"
python main.py --mode Submission --arch WSMCnetEB_S2C3F32 --maxdisp 192 --bn 1 \
               --loadmodel (filepath of pretrained weight) \
               --datas_val "k15-te" --dir_datas_val (dir_root_kitti) \
               --freq_print 1 \
               --dir_save "$dir_save" \
               2>&1 | tee -a "$LOG"
```
You can also see the example in `demos/kitti_submission.sh` for details.
Model | SceneFlow | KITTI |
---|---|---|
WSMCnet-S1C1 | Baidu-pan | Baidu-pan |
WSMCnetEB-S2C3 | Baidu-pan | Baidu-pan |
WSMCnetEB-S3C3 | Baidu-pan | Baidu-pan |
Extraction code: rycn
Results on KITTI 2015 leaderboard
Method | D1-all (All) | D1-all (Noc) | Runtime (s) | Environment |
---|---|---|---|---|
WSMCnetEB-S2C3 | 2.13 % | 1.85 % | 0.39 | Nvidia GTX 1070 (pytorch) |
PSMNet | 2.32 % | 2.14 % | 0.41 | Nvidia GTX Titan Xp (pytorch) |
iResNet-i2 | 2.44 % | 2.19 % | 0.12 | Nvidia GTX Titan X (Pascal) (Caffe) |
GC-Net | 2.87 % | 2.61 % | 0.90 | Nvidia GTX Titan X (TensorFlow) |
MC-CNN | 3.89 % | 3.33 % | 67 | Nvidia GTX Titan X (CUDA, Lua/Torch7) |
Any discussions or concerns are welcome!