Reproduce train_cirkd_segformer.py #32
Open
@GiLonn

Hi, thanks for your excellent work!

I reproduced your code and found that the training time is very long.

The estimated training time shown on my server is about 25 days (a quick arithmetic check follows the script). My script is as follows:

CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch --nproc_per_node=1 ./CIRKD-main/train_cirkd_segformer.py \
    --teacher-model segformer \
    --student-model segformer \
    --teacher-backbone MiT_B4 \
    --student-backbone MiT_B0 \
    --dataset citys \
    --data ./CIRKD-main/dataset/cityscapes/ \
    --optimizer-type adamw \
    --pixel-memory-size 2000 \
    --lr 0.0002 \
    --batch-size 4 \
    --workers 16 \
    --crop-size 512 1024 \
    --max-iterations 320000 \
    --save-dir ./CIRKD-main/work_dirs/mit_b0_OnlyKD \
    --log-dir ./CIRKD-main/work_dirs/mit_b0_OnlyKD \
    --teacher-pretrained ./segformers/mit_b4.pth \
    --student-pretrained ./segformers/mit_b0.pth
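
As a quick sanity check on that 25-day estimate, here is some back-of-the-envelope Python arithmetic using only the numbers above, assuming a roughly constant time per iteration:

# Rough check of the ~25-day ETA reported by the trainer.
# Assumes time per iteration stays roughly constant over the run.
max_iterations = 320_000
estimated_days = 25
seconds_per_iter = estimated_days * 24 * 3600 / max_iterations
print(f"~{seconds_per_iter:.2f} s/iter")  # ~6.75 s/iter

So if the ETA is accurate, the run is spending about 6.75 s per iteration on my single GPU.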

And the original script from the repository is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node=8 --master_addr 127.5.0.4 --master_port 26501 \
    train_cirkd_segformer.py \
    --teacher-model segformer \
    --student-model segformer \
    --teacher-backbone MiT_B4 \
    --student-backbone MiT_B0 \
    --dataset citys \
    --data [your dataset path]/cityscapes/ \
    --batch-size 8 \
    --workers 16 \
    --crop-size 1024 1024 \
    --optimizer-type adamw \
    --pixel-memory-size 2000 \
    --region-contrast-size 1024 \
    --pixel-contrast-size 4096 \
    --kd-temperature 1 \
    --lambda-kd 1.0 \
    --contrast-kd-temperature 1. \
    --lambda-minibatch-pixel 1. \
    --lambda-memory-pixel 0.1 \
    --lambda-memory-region 0.1 \
    --lr 0.0002 \
    --max-iterations 160000 \
    --save-dir [your directory path to store checkpoint files] \
    --log-dir [your directory path to store log files] \
    --gpu-id 0,1,2,3,4,5,6,7 \
    --teacher-pretrained [your teacher weights path]/segformer_MiT_B4_citys_best_model.pth \
    --student-pretrained [your pretrained-backbone path]/mit_b0.pth

The reason I use 320K training iterations is that I use a batch size of 4.

I only have a single RTX 4090 GPU and cannot run with a batch size of 16 following the original CIRKD-SegFormer setting, so I want to use bs4/320K iterations as a setting comparable to bs8/160K. Is that the right way to reproduce it?
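
To be explicit, the equivalence I am assuming is just that the total number of training samples seen stays constant (a linear-scaling assumption on my part; it ignores batch-dependent effects such as the contrastive memory sampling):

# My assumption: keep (batch size x iterations) constant.
# Note: this ignores batch-dependent behavior (e.g. the pixel/region
# memory updates), so it may not match the original setting exactly.
orig_bs, orig_iters = 8, 160_000
my_bs, my_iters = 4, 320_000
assert orig_bs * orig_iters == my_bs * my_iters  # both 1,280,000 samples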

Could you please share the log file from training train_cirkd_segformer.py?
I'd also like to hear your advice on whether any of my parameters are set incorrectly.

Looking forward to your reply. Thanks!

Best Regards!
Jun
