Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification
Accepted by AAAI 2025
[Paper] | [Recent News] | [Running the Code] | [Acknowledgments] | [License and Terms of Use] | [Citation]
Abstract: Whole Slide Image (WSI) classification has significant applications in clinical pathology, e.g., tumor identification and cancer diagnosis. Currently, most research attention is focused on Multiple Instance Learning (MIL) using static datasets. An obvious weakness of these methods is that they cannot efficiently preserve and utilize previously learned knowledge: whenever new data arrive, classification models must be re-trained on both the previous and the new data. To overcome this shortcoming and move beyond the traditional vision-only modality, this paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL), specially designed for incremental WSI classification. The framework mainly consists of two information-processing branches: one generates bag-level features by prototype-guided aggregation of instance features, while the other enhances class features through a combination of class ensemble, a tunable vector and a class similarity loss. Experiments on four public WSI datasets demonstrate that our QPMIL-VL framework is effective for incremental WSI classification and often significantly outperforms the compared methods, achieving state-of-the-art (SOTA) performance.
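For intuition only, below is a purely illustrative PyTorch sketch of this two-branch idea (prototype-guided instance aggregation, then cosine matching against class features). It is not the released implementation; the prototype pool size, feature dimension, mean fusion and temperature value are all placeholder assumptions.

```python
# Purely illustrative sketch of the two-branch idea (NOT the released implementation).
import torch
import torch.nn.functional as F

def bag_from_prototypes(instances: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Prototype-guided aggregation: instances (N, D) -> bag-level feature (D,)."""
    sim = F.normalize(instances, dim=-1) @ F.normalize(prototypes, dim=-1).T  # (N, P)
    weights = sim.softmax(dim=0)            # per-prototype attention over instances
    proto_feats = weights.T @ instances     # (P, D): each prototype pools the bag differently
    return proto_feats.mean(dim=0)          # (D,): simple fusion of the prototype views

def classify(bag_feat: torch.Tensor, class_feats: torch.Tensor, temperature: float = 0.07):
    """Match the bag-level feature against (text-derived) class features by cosine similarity."""
    logits = F.normalize(bag_feat, dim=-1) @ F.normalize(class_feats, dim=-1).T  # (C,)
    return (logits / temperature).softmax(dim=-1)

# Toy usage with random tensors (placeholder shapes).
instances = torch.randn(500, 512)   # 500 patch features of one WSI
prototypes = torch.randn(8, 512)    # small queryable prototype pool
class_feats = torch.randn(4, 512)   # 4 classes, e.g. text features plus a tunable vector
print(classify(bag_from_prototypes(instances, prototypes), class_feats))
```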
- 04/11/2025: Our paper has been published in the AAAI 2025 proceedings.
- 01/23/2025: 🎉 One co-authored paper, VLSA, has been accepted by ICLR 2025. Refer to its code & paper for more details. Congratulations to Pei Liu.
- 12/15/2024: Updated our prepared dataset to enable direct online viewing of its file organization and on-demand downloading of specific files.
- 12/10/2024: 🥳 Our QPMIL-VL has been accepted by AAAI 2025, and its code & paper (including Supplementary Material) are now available.
This repository is still being updated. Stay tuned.
First, you can download the pre-trained weights of the pathology VLM CONCH here (official link).
We use CLAM to crop non-overlapping 256 × 256 patches from the segmented tissue at 10× magnification. Then, the pre-trained image encoder in CONCH is used to extract instance features. You can refer to Pipeline-Processing-TCGA-Slides-for-MIL for a detailed tutorial.
Of course, you can also use the dataset we prepared directly (the corresponding compressed file is here).
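If you run the feature extraction yourself, the following is a minimal sketch of embedding the patches of one slide with the CONCH image encoder. It follows the usage shown in the official CONCH repo, but the checkpoint path, patch folder, file pattern and output name are placeholder assumptions; the tutorial linked above remains the authoritative reference.

```python
# Minimal sketch (not the official extraction script), assuming CLAM has already
# saved the cropped 256x256 patches of one slide as PNG files under patch_dir.
import glob
import torch
from PIL import Image
from conch.open_clip_custom import create_model_from_pretrained

device = "cuda" if torch.cuda.is_available() else "cpu"
# "path/to/conch.pt" is a placeholder for the downloaded CONCH checkpoint.
model, preprocess = create_model_from_pretrained("conch_ViT-B-16", "path/to/conch.pt")
model = model.eval().to(device)

features = []
with torch.inference_mode():
    for patch_path in sorted(glob.glob("patch_dir/*.png")):  # hypothetical patch folder
        image = preprocess(Image.open(patch_path).convert("RGB")).unsqueeze(0).to(device)
        # Kwargs mirror the CONCH README example; follow the linked tutorial for the
        # exact settings used to build the MIL instance features.
        features.append(model.encode_image(image, proj_contrast=False, normalize=False).cpu())

bag = torch.cat(features, dim=0)          # (num_instances, feature_dim)
torch.save(bag, "slide_features.pt")      # one feature bag per WSI
```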
All experiments are run on a machine with
- two NVIDIA GeForce RTX 3090 GPUs
- python==3.8 and pytorch==1.11.0+cu113
Detailed package requirements:
- For `pip` or `conda` users, the full requirements are provided in requirements.txt.
- For `docker` users, you can use our base docker image via `docker pull yuukilp/deepath:py38-torch1.11.0-cuda11.3-cudnn8-devel` and then install the additional essential Python packages (see requirements.txt) in the container.
All important arguments are explained in `configs/main.yaml`. You can replace the values of `dataset_root_dir` and `conch_ckpt_path` with the root directory of the dataset and the path of the CONCH pre-trained weights, respectively.

Finally, in the `scripts/` directory of the project, execute the following command (ten-fold cross-validation):

`./main.sh`
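As an optional sanity check before launching the script (not part of this repo), you can quickly confirm that the two paths are set correctly. The snippet below assumes PyYAML is installed and that `dataset_root_dir` and `conch_ckpt_path` are top-level keys in `configs/main.yaml`; adjust it if the config nests them differently.

```python
# Hypothetical pre-flight check: confirm the two paths in configs/main.yaml exist.
import os
import yaml  # PyYAML, assumed to be available in the environment

with open("configs/main.yaml", "r") as f:
    cfg = yaml.safe_load(f)

for key in ("dataset_root_dir", "conch_ckpt_path"):
    path = cfg.get(key)
    status = "found" if path and os.path.exists(path) else "MISSING"
    print(f"{key}: {path} -> {status}")
```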
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant No. 62476049.
Some parts of the code in this repo are adapted from the following amazing works. We thank the authors and developers for their selfless contributions.
- Luping Ji, Pei Liu: Provided detailed guidance.
- CONCH: Our QPMIL-VL is driven by this great pathology VLM.
- AttriCLIP: A CLIP-based prompt tuning approach for incremental learning on natural images.
- L2P: Designs a prompt-query mechanism on top of a pre-trained ViT model to mitigate catastrophic forgetting in incremental learning for natural images.
- TOP: Proposes a two-level prompt learning MIL framework based on GPT-4 and CLIP for the Few-shot WSI Classification (FSWC) problem.
- TaskRes: Proposes an efficient fine-tuning method for VL pre-trained models.
ⓒ UESTC. The models and associated code are released under the CC-BY-NC-ND 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. Any commercial use, sale, or other monetization of the QPMIL-VL model and its derivatives is prohibited and requires prior approval. If you are a commercial entity, please contact the corresponding author (Luping Ji).
If you find this work helpful for your research, please consider citing our paper:
@inproceedings{gou2025queryable,
title={Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification},
author={Gou, Jiaxiang and Ji, Luping and Liu, Pei and Ye, Mao},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={3},
pages={3158--3166},
year={2025}
}
Additionally, another work of ours proposes, for the first time, a new Vision-Language-based Survival Analysis (VLSA) paradigm. If you find VLSA useful, please also consider citing the corresponding paper:
@inproceedings{liu2025interpretable,
title={Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology},
author={Liu, Pei and Ji, Luping and Gou, Jiaxiang and Fu, Bo and Ye, Mao},
booktitle={International Conference on Learning Representations},
year={2025},
url={https://arxiv.org/abs/2409.09369}
}