Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images
This work builds on the DiT repository (https://github.com/facebookresearch/DiT) by Facebook Research. We extend its implementation for our DermDiT model; the setup and training workflow are also adapted from the original repository.
First, download and set up the repo:
git clone https://github.com/Munia03/DermDiT.git
cd DermDiT
We provide an environment.yml file that can be used to create a Conda environment.
conda env create -f environment.yml
conda activate DiT
We provide a training script for DiT in train_text_to_image.py. This script can be used to train a text-conditional DermDiT model.
To launch DiT-L/4 (256x256) training with N GPUs on one node:
torchrun --nnodes=1 --nproc_per_node=N train_text_to_image.py --model DiT-L/4 --data-path /path/to/imagenet/train
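As a rough guide to what the model name implies: in the upstream DiT code the suffix in DiT-L/4 is the patch size, and images are first encoded by a VAE that downsamples each side by a factor of 8. The sketch below works out the resulting token count under the assumption that DermDiT keeps both conventions; the function name `num_tokens` and the `vae_downsample` parameter are illustrative and not part of this repository.

```python
def num_tokens(image_size: int, patch_size: int, vae_downsample: int = 8) -> int:
    """Number of patch tokens the transformer sees for a square image.

    Assumes (as in upstream DiT) that the image is encoded to a latent of
    side image_size / vae_downsample, which is then split into square patches.
    """
    if image_size % vae_downsample != 0:
        raise ValueError("image_size must be divisible by the VAE downsample factor")
    latent_size = image_size // vae_downsample
    if latent_size % patch_size != 0:
        raise ValueError("latent size must be divisible by the patch size")
    patches_per_side = latent_size // patch_size
    return patches_per_side ** 2


if __name__ == "__main__":
    # DiT-L/4 at 256x256: 32x32 latent, 8x8 grid of patches
    print(num_tokens(256, 4))  # 64
```

Under these assumptions a smaller patch size (for example /2) quadruples the sequence length, which is the main reason the /4 variants are cheaper to train.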
We include a sample_text2img.py script which samples a large number of images from a DiT model in parallel. This script generates a folder of samples as well as a .npz file which can be directly used with ADM's TensorFlow evaluation suite to compute FID, Inception Score and other metrics. For example, to sample 50K images from a trained model over N GPUs, run:
torchrun --nnodes=1 --nproc_per_node=N sample_text2img.py --model DiT-L/4 --image-size 256 --num-fid-samples 50000 --ckpt /path/to/model.pt
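When sampling is split over several GPUs, the requested count usually has to be rounded up so every process runs the same number of full batches. The upstream DiT sampling script does this; whether sample_text2img.py follows the same scheme is an assumption here, and the names `plan_sampling` and `per_gpu_batch` are hypothetical. A minimal sketch of that arithmetic:

```python
import math
from dataclasses import dataclass


@dataclass
class SamplingPlan:
    total_samples: int       # images generated across all GPUs (>= requested)
    samples_per_gpu: int     # images each process generates
    iterations_per_gpu: int  # batches each process runs


def plan_sampling(num_samples: int, world_size: int, per_gpu_batch: int) -> SamplingPlan:
    """Round num_samples up to a multiple of the global batch and split it evenly."""
    if min(num_samples, world_size, per_gpu_batch) <= 0:
        raise ValueError("all arguments must be positive")
    global_batch = per_gpu_batch * world_size
    total = math.ceil(num_samples / global_batch) * global_batch
    per_gpu = total // world_size
    return SamplingPlan(total, per_gpu, per_gpu // per_gpu_batch)


if __name__ == "__main__":
    # 50K requested images on 4 GPUs with 32 images per GPU per step
    print(plan_sampling(50_000, world_size=4, per_gpu_batch=32))
```

With these illustrative values the plan generates 50,048 images in 391 steps per GPU; any surplus beyond the requested 50,000 would need to be dropped before computing metrics.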
@inproceedings{munia2025prompting,
title={Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images},
author={Munia, Nusrat and Imran, Abdullah Al Zubaer},
booktitle={2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI)},
pages={1--4},
year={2025},
organization={IEEE}
}