Reproduce train_cirkd_segformer.py #32
Open
@GiLonn

Hi, thanks for your excellent work!

I reproduced your code and found that the training time is very long.

The estimated training time shown on my server is about 25 days (a quick arithmetic check follows the script). My script is as follows:

CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch --nproc_per_node=1 ./CIRKD-main/train_cirkd_segformer.py \
    --teacher-model segformer \
    --student-model segformer \
    --teacher-backbone MiT_B4 \
    --student-backbone MiT_B0 \
    --dataset citys \
    --data ./CIRKD-main/dataset/cityscapes/ \
    --optimizer-type adamw \
    --pixel-memory-size 2000 \
    --lr 0.0002 \
    --batch-size 4 \
    --workers 16 \
    --crop-size 512 1024 \
    --max-iterations 320000 \
    --save-dir ./CIRKD-main/work_dirs/mit_b0_OnlyKD \
    --log-dir ./CIRKD-main/work_dirs/mit_b0_OnlyKD \
    --teacher-pretrained ./segformers/mit_b4.pth \
    --student-pretrained ./segformers/mit_b0.pth
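
As a quick sanity check on that 25-day estimate, here is some back-of-the-envelope Python arithmetic using only the numbers above, assuming a roughly constant time per iteration:

# Rough check of the ~25-day ETA reported by the trainer.
# Assumes time per iteration stays roughly constant over the run.
max_iterations = 320_000
estimated_days = 25
seconds_per_iter = estimated_days * 24 * 3600 / max_iterations
print(f"~{seconds_per_iter:.2f} s/iter")  # ~6.75 s/iter

So if the ETA is accurate, the run is spending about 6.75 s per iteration on my single GPU.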

And the original script from the repository is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node=8 --master_addr 127.5.0.4 --master_port 26501 \
    train_cirkd_segformer.py \
    --teacher-model segformer \
    --student-model segformer \
    --teacher-backbone MiT_B4 \
    --student-backbone MiT_B0 \
    --dataset citys \
    --data [your dataset path]/cityscapes/ \
    --batch-size 8 \
    --workers 16 \
    --crop-size 1024 1024 \
    --optimizer-type adamw \
    --pixel-memory-size 2000 \
    --region-contrast-size 1024 \
    --pixel-contrast-size 4096 \
    --kd-temperature 1 \
    --lambda-kd 1.0 \
    --contrast-kd-temperature 1. \
    --lambda-minibatch-pixel 1. \
    --lambda-memory-pixel 0.1 \
    --lambda-memory-region 0.1 \
    --lr 0.0002 \
    --max-iterations 160000 \
    --save-dir [your directory path to store checkpoint files] \
    --log-dir [your directory path to store log files] \
    --gpu-id 0,1,2,3,4,5,6,7 \
    --teacher-pretrained [your teacher weights path]/segformer_MiT_B4_citys_best_model.pth \
    --student-pretrained [your pretrained-backbone path]/mit_b0.pth

The reason I use 320K training iterations is that I use a batch size of 4.

I only have a single RTX 4090 GPU and cannot run with a batch size of 16 following the original CIRKD-SegFormer setting, so I want to use bs4/320K iterations as a setting comparable to bs8/160K. Is that the right way to reproduce it?
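
To be explicit, the equivalence I am assuming is just that the total number of training samples seen stays constant (a linear-scaling assumption on my part; it ignores batch-dependent effects such as the contrastive memory sampling):

# My assumption: keep (batch size x iterations) constant.
# Note: this ignores batch-dependent behavior (e.g. the pixel/region
# memory updates), so it may not match the original setting exactly.
orig_bs, orig_iters = 8, 160_000
my_bs, my_iters = 4, 320_000
assert orig_bs * orig_iters == my_bs * my_iters  # both 1,280,000 samples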

Could you please share the log file from training train_cirkd_segformer.py?
I'd also like to hear your advice on whether any of my parameters are set incorrectly.

Looking forward to your reply. Thanks!

Best Regards!
Jun
