Official PyTorch implementation of Few-shot Object Localization
Existing object localization methods are tailored to locate a specific class of objects and rely on abundant labeled data for model optimization. However, in many real-world scenarios, acquiring large amounts of labeled data can be arduous, significantly constraining the broader application of localization models. To bridge this gap, this paper proposes the novel task of Few-Shot Object Localization (FSOL), which seeks to achieve precise localization with only limited samples available. The task achieves generalized object localization by leveraging a small number of labeled support samples to query the positional information of objects within corresponding images. To advance this research field, we propose an innovative high-performance baseline model. Our model integrates a dual-path feature augmentation module to enhance shape association and gradient differences between support and query images, alongside a self-query module designed to explore the association between feature maps and query images. Experimental results demonstrate a significant performance improvement of our approach on the FSOL task, establishing an efficient benchmark for further research. The architecture of the model is as follows:
pip install -r requirements.txt
FSC-147 official website: https://github.com/cvlab-stonybrook/LearningToCountEverything
- Copy 'images_384_VarV2' and 'gt_density_map_adaptive_384_VarV2' to data/FSC147_384_V2
- Run gen_gt_density.py. The directory structure should be as follows:
|-- data
    |-- FSC147_384_V2
        |-- images_384_VarV2
        |-- gt_density_map_adaptive_384_VarV2
        |-- train.json
        |-- val.json
        |-- test.json
        |-- gen_gt_density.py
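The gen_gt_density.py script produces the ground-truth density maps from the point annotations. As a rough, hypothetical sketch of what such a conversion typically does (the repo's actual kernel size, sigma, and adaptive-bandwidth logic may differ), each annotated point is rendered as a normalized 2D Gaussian so the map's total mass equals the object count:

```python
import math

def point_to_density(points, h, w, sigma=4.0):
    """Render point annotations (x, y) as an h-by-w density map whose values
    sum to the object count, by placing a per-point-normalized 2D Gaussian at
    each annotation. Hypothetical sketch; not the repo's exact script."""
    density = [[0.0] * w for _ in range(h)]
    for (px, py) in points:
        # accumulate an unnormalized kernel in a 3-sigma window, then
        # normalize per point so each object contributes exactly 1
        kernel, total = {}, 0.0
        r = int(3 * sigma)
        for y in range(max(0, int(py) - r), min(h, int(py) + r + 1)):
            for x in range(max(0, int(px) - r), min(w, int(px) + r + 1)):
                g = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                kernel[(y, x)] = g
                total += g
        for (y, x), g in kernel.items():
            density[y][x] += g / total
    return density

dmap = point_to_density([(10.0, 12.0), (30.0, 7.0)], h=48, w=48)
print(round(sum(map(sum, dmap)), 4))  # 2.0 — one unit of mass per point
```

Normalizing each kernel by its in-bounds total (rather than the analytic Gaussian integral) keeps the count consistent even for points near the image border.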
ShanghaiTech official website: https://github.com/desenzhou/ShanghaiTechDataset
For ShanghaiTech partA:
- Copy 'test_data', 'train_data' to data/ShanghaiTech/part_A
- Run gen_gt_density.py
For ShanghaiTech partB:
- Copy 'test_data', 'train_data' to data/ShanghaiTech/part_B
- Run gen_gt_density.py. The directory structure should be as follows:
|-- data
    |-- ShanghaiTech
        |-- part_A
            |-- train_data
            |-- test_data
            |-- gen_gt_density.py
            |-- train.json
            |-- test.json
            |-- exemplar.json
        |-- part_B
            |-- train_data
            |-- test_data
            |-- gen_gt_density.py
            |-- train.json
            |-- test.json
            |-- exemplar.json
CARPK official website: https://lafi.github.io/LPN/
- Copy 'CARPK/CARPK_devkit/data/Images' to data/CARPK_devkit/
- Run gen_gt_density.py. The directory structure should be as follows:
|-- data
    |-- CARPK_devkit
        |-- Images
        |-- gen_gt_density.py
        |-- train.json
        |-- test.json
        |-- exemplar.json
You can train the FSOL model on different datasets. From the root directory, first enter the corresponding experiment folder:
FSC-147:
cd experiments/FSC147
ShanghaiTech partA:
cd experiments/ShanghaiTech/part_A
ShanghaiTech partB:
cd experiments/ShanghaiTech/part_B
CARPK:
cd experiments/CARPK
Then run `sh train.sh #GPU_NUM #GPU_ID` to train the FSOL model. For example, to train with one GPU of ID 0, run `sh train.sh 1 0`. For the FSC-147 dataset, run `sh eval.sh #GPU_NUM #GPU_ID` for evaluation and `sh test.sh #GPU_NUM #GPU_ID` for testing. For the other datasets, run `sh eval.sh #GPU_NUM #GPU_ID` for testing.
We suggest training the model on a single GPU.
You can download the pretrained weights of the one-shot FSOL model from the following links. Google Drive: here; Baidu Netdisk: here.
You can load pretrained FSOL weights by modifying the config file of each experiment. For example, for the FSC147 experiment, move the model weights to experiments/FSC147/checkpoints, then open experiments/FSC147/config.yaml and modify it as follows:
saver:
  ifload: True
  load_weight: FSOL_Final.tar
  save_dir: checkpoints/
  log_dir: log/
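As a hypothetical illustration of how a trainer might consume this `saver` block (the repo's actual loading code may differ, e.g. it likely calls `torch.load` on the resolved path), `ifload` gates loading and the checkpoint path is `save_dir` joined with `load_weight`:

```python
import os

def resolve_checkpoint(saver_cfg):
    """Return the checkpoint path the trainer would load, or None when
    loading is disabled. Hypothetical helper mirroring the `saver` config
    block above; not the repo's actual code."""
    if not saver_cfg.get("ifload", False):
        return None
    return os.path.join(saver_cfg["save_dir"], saver_cfg["load_weight"])

saver = {"ifload": True, "load_weight": "FSOL_Final.tar",
         "save_dir": "checkpoints/", "log_dir": "log/"}
print(resolve_checkpoint(saver))  # checkpoints/FSOL_Final.tar
```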
All of the following results were obtained on a single NVIDIA RTX 3090 with one support sample provided.
Dataset | F1 | AP | AR | F1 | AP | AR
---|---|---|---|---|---|---
FSC-147 | 53.4 | 55.5 | 51.4 | 70.0 | 72.7 | 67.4
Shanghai A | 52.4 | 58.4 | 47.6 | 69.6 | 77.6 | 63.1
Shanghai B | 67.2 | 75.5 | 60.5 | 78.0 | 88.4 | 70.9
CARPK | 81.84 | 80.9 | 82.8 | 93.46 | 92.38 | 94.56
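For reference, localization F1 scores like those above are typically computed by matching predicted points to ground-truth points within a pixel-distance threshold. Below is a minimal, hypothetical sketch using greedy nearest-neighbor matching; the paper's exact matching rule (e.g. one-to-one Hungarian assignment) and its AP/AR computation may differ:

```python
import math

def localization_f1(preds, gts, thresh):
    """Greedily match each predicted point to the nearest unmatched
    ground-truth point within `thresh` pixels, then report F1 over the
    resulting precision/recall. Hypothetical sketch, not the paper's code."""
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best, best_d = None, thresh
        for g in unmatched:
            d = math.dist(p, g)
            if d <= best_d:
                best, best_d = g, d
        if best is not None:
            unmatched.remove(best)  # enforce one-to-one matching
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# two of three predictions fall within 5 px of distinct ground-truth points
score = localization_f1([(0, 0), (10, 10), (50, 50)], [(1, 0), (11, 10)], thresh=5)
print(round(score, 4))  # 0.8
```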
This code is based on SAFECount and FIDTM. Many thanks for their code implementations.
@article{FSOL,
  title={Few-shot Object Localization},
  author={Yunhan Ren and Bo Li and Chengyang Zhang and Yong Zhang and Baocai Yin},
  journal={ArXiv},
  year={2024},
  volume={abs/2403.12466},
  url={https://api.semanticscholar.org/CorpusID:268531824}
}