This repository contains the official code for Crane, a zero-shot anomaly detection framework built on CLIP.
- Introduction
- Results
- Getting Started
- Installation
- Datasets
- Custom Dataset
- Citation
- Acknowledgements
- Contact
Crane is a zero-shot anomaly detection (ZSAD) framework that leverages a pre-trained vision-language model, CLIP, for robust and generalizable anomaly localization. It introduces two attention refinement modules, E-Attn and D-Attn, inserted into the vision backbone to enhance patch-level alignment and fully exploit the pretrained knowledge for the zero-shot task. For image-level refinement, Crane adjusts the CLS token to improve global anomaly sensitivity and incorporates a context-guided prompt learning strategy to better model finer-grained anomalies. Together, these components strengthen both image-level and pixel-level detection. Extensive experiments across 14 datasets from industrial and medical domains show that Crane achieves state-of-the-art performance with consistent improvements across multiple evaluation metrics.
- Enhancing the sensitivity of the global (CLS) representation to anomalous cues for image-level anomaly detection
- Reinforcing patch-level alignment by extending self-correlation attention through E-Attn
- Further improving patch-level alignment using the similarity of DINO features through D-Attn
- Improving auxiliary training generalization through context-guided prompt learning
To reproduce the results, follow the instructions below to run inference and training:
All required libraries, including the correct PyTorch version, are specified in environment.yaml. Running setup.sh will automatically create the environment and install all dependencies.
git clone https://github.com/AlirezaSalehy/Crane.git && cd Crane
bash setup.sh
conda activate crane_env
The required checkpoints for CLIP and DINO will be downloaded automatically by the code and stored in ~/.cache. However, the ViT-B SAM checkpoint must be downloaded manually. Please download sam_vit_b_01ec64.pth from the official Segment Anything repository and place it in the following directory:
~/.cache/sam/sam_vit_b_01ec64.pth
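For convenience, here is a minimal sketch of the manual download step, assuming the standard checkpoint URL listed in the Segment Anything repository (please verify the URL there before use):

```bash
# Download the ViT-B SAM checkpoint into the directory the code expects.
# URL taken from the official Segment Anything repository; verify before use.
mkdir -p ~/.cache/sam
wget -O ~/.cache/sam/sam_vit_b_01ec64.pth \
  https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```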
You can download the datasets from their official sources and use the utilities in datasets/generate_dataset_json/ to generate a compatible meta.json. Alternatively, you can obtain the datasets from the AdaCLIP repository, which provides them in a compatible format. Place all datasets under DATASETS_ROOT, which is defined in ./__init__.py.
bash test.sh default
bash train.sh default
You can easily use your custom dataset with our model by following the instructions below:
Your dataset must either include a meta.json file at the root directory, or be organized so that one can be automatically generated. The meta.json should follow this format:
- A dictionary with "train" and "test" at the highest level
- Each section contains class names mapped to a list of samples
- Each sample includes:
  - img_path: path to the image relative to the root dir
  - mask_path: path to the mask relative to the root dir (empty for normal samples)
  - cls_name: class name
  - specie_name: subclass or condition (e.g., "good", "fault1")
  - anomaly: anomaly label; 0 (normal) or 1 (anomalous)
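For illustration only, a minimal meta.json following this format might look like the example below (class, file, and specie names are hypothetical placeholders):

```json
{
  "train": {
    "c1": [
      {"img_path": "train/c1/good/000.png", "mask_path": "", "cls_name": "c1", "specie_name": "good", "anomaly": 0}
    ]
  },
  "test": {
    "c1": [
      {"img_path": "test/c1/good/000.png", "mask_path": "", "cls_name": "c1", "specie_name": "good", "anomaly": 0},
      {"img_path": "test/c1/fault1/001.png", "mask_path": "test/c1/masks/001.png", "cls_name": "c1", "specie_name": "fault1", "anomaly": 1}
    ]
  }
}
```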
If your dataset does not include the required meta.json, you can generate it automatically by organizing your data as shown below and running datasets/generate_dataset_json/custom_dataset.py:
datasets/your_dataset/
├── train/
│   ├── c1/
│   │   └── good/
│   │       └── <NAME>.png
│   └── c2/
│       └── good/
│           └── <NAME>.png
├── test/
│   ├── c1/
│   │   ├── good/
│   │   │   └── <NAME>.png
│   │   ├── fault1/
│   │   │   └── <NAME>.png
│   │   ├── fault2/
│   │   │   └── <NAME>.png
│   │   └── masks/
│   │       └── <NAME>.png
│   └── c2/
│       └── good/
...         ...
Once organized, run the script to generate a meta.json automatically at the dataset root.
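As a concrete sketch (the exact command-line arguments, if any, are an assumption; check the script itself):

```bash
# Hypothetical invocation of the provided generator script for the layout above.
# Check datasets/generate_dataset_json/custom_dataset.py for its actual arguments.
python datasets/generate_dataset_json/custom_dataset.py
# Expected result: a meta.json at datasets/your_dataset/meta.json
```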
Then place your dataset under DATASETS_ROOT, specified in datasets/generate_dataset_json/__init__.py, and run inference:
python test.py --dataset YOUR_DATASET --model_name default --epoch 5
This project is licensed under the MIT License. See the LICENSE file for details.
If you find this project helpful for your research, please consider citing our work using the following BibTeX entry.
BibTeX:
@article{salehi2025crane,
title={Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections},
author={Salehi, Alireza and Salehi, Mohammadreza and Hosseini, Reshad and Snoek, Cees GM and Yamada, Makoto and Sabokrou, Mohammad},
journal={arXiv preprint arXiv:2504.11055},
year={2025}
}
This project builds upon prior open-source work, and we thank the authors for their contributions and open-source support.
For questions or collaborations, please contact alireza99salehy@gmail.com.