This repository provides the training- and evaluation-related code for Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings. The code used to create the ILID dataset is available at github.com/kenomo/ilid. We provide the full dataset on request: send an email to 📧 Keno Moenck.
This repository builds upon the code base of Dassl, uses the original CLIP implementation, and contains a devcontainer.json and a Dockerfile, which set up all the necessary dependencies. Training and evaluation were tested on an NVIDIA RTX 4090.
We provide trainers for CLIP-Adapter, CoOp, zero-shot CLIP, and a combination of CoOp and adapters in industrial_clip/trainers.
Use train.py for training and eval.py for evaluation only. You will find example configuration files under configs/; datasets must be placed under data/. For ILID, create the following folder structure:
```
│
├── data/
│   └── ilid/
│       ├── images/
│       │   └── ...        # downloaded images
│       └── ilid.json      # dataset json file
│
├── ...
```
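A quick way to verify this layout before training is a small sanity check in Python; this is only a hypothetical helper, not part of the repository, and it makes no assumptions about the contents of ilid.json:

```python
import json
from pathlib import Path

# Hypothetical sanity check for the expected ILID layout (not part of the repository).
data_root = Path("data/ilid")
assert (data_root / "images").is_dir(), "downloaded images are expected under data/ilid/images/"
assert (data_root / "ilid.json").is_file(), "the dataset json file is expected at data/ilid/ilid.json"

with open(data_root / "ilid.json") as f:
    entries = json.load(f)
print(f"ilid.json loaded, {len(entries)} top-level entries")
```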
You can find example run configurations in train.sh and eval.sh. If you want to use W&B to track your runs, you will need an API key and must export it as WANDB_API_KEY.
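If you launch runs from Python rather than a shell, you can equivalently set the variable before the first W&B call; the value below is a placeholder:

```python
import os

# W&B reads the API key from the WANDB_API_KEY environment variable;
# replace the placeholder with your own key.
os.environ["WANDB_API_KEY"] = "<your-wandb-api-key>"
```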
We extended Dassl's configuration parameters (see /root/dassl/configs and /root/dassl/dassl/config/defaults.py); you will find our extensions in utils.py.
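As a rough sketch of what such an extension looks like (Dassl's config is a yacs CfgNode), the following adds custom nodes to the defaults. Apart from EVAL.SAVE_EMBEDDINGS, which is used by the evaluation notebooks below, the node names and values here are purely illustrative and not taken from the repository:

```python
from yacs.config import CfgNode as CN

def extend_cfg(cfg: CN) -> None:
    """Attach additional options to Dassl's default config (illustrative sketch)."""
    cfg.EVAL = CN()
    cfg.EVAL.SAVE_EMBEDDINGS = False  # save image/text encoder embeddings during evaluation
    # Illustrative trainer-specific option, not taken from the repository:
    cfg.TRAINER.EXAMPLE_ADAPTER_RATIO = 0.2
```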
We provide a set of notebooks for evaluation:
- cross_validation.ipynb: Get cross-validation results of multiple runs from W&B.
- embeddings.ipynb: Generate t-SNE diagrams. Before running the notebook, you have to set the configuration flag EVAL.SAVE_EMBEDDINGS to True to save the image and text encoder embeddings.
- prompting.ipynb: Example for prompting.
- material_prompting.ipynb: Example for prompting for materials.
- samclip.ipynb: Example for language-guided segmentation with SAM. You have to download a checkpoint first (a minimal loading sketch follows this list):
```bash
curl -L -o /root/industrial-clip/evaluation/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```
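As a starting point for samclip.ipynb, a minimal sketch of loading the checkpoint downloaded above with the official segment_anything package could look as follows; the image path is a placeholder, and the language-guided part with CLIP is what the notebook itself adds:

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Load the ViT-H SAM checkpoint downloaded by the curl command above.
sam = sam_model_registry["vit_h"](checkpoint="/root/industrial-clip/evaluation/sam_vit_h_4b8939.pth")
sam.to("cuda")

# Generate class-agnostic masks; ranking/filtering them with CLIP is done in samclip.ipynb.
mask_generator = SamAutomaticMaskGenerator(sam)
image = cv2.cvtColor(cv2.imread("data/ilid/images/example.jpg"), cv2.COLOR_BGR2RGB)  # placeholder image
masks = mask_generator.generate(image)  # list of dicts with "segmentation", "bbox", "area", ...
print(f"SAM proposed {len(masks)} masks")
```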
You are welcome to submit issues, send pull requests, or share ideas with us. If you have any other questions, please contact 📧 Keno Moenck.
If you find ILID or the provided code useful to your work/research, please cite:
```bibtex
@article{Moenck.2024,
  title = {Industrial {{Language-Image Dataset}} ({{ILID}}): {{Adapting Vision Foundation Models}} for {{Industrial Settings}}},
  author = {Moenck, Keno and Thieu, Duc Trung and Koch, Julian and Sch{\"u}ppstuhl, Thorsten},
  year = {2024},
  journal = {Procedia CIRP},
  series = {57th {{CIRP Conference}} on {{Manufacturing Systems}} 2024 ({{CMS}} 2024)},
  volume = {130},
  pages = {250--263},
  issn = {2212-8271},
  doi = {10.1016/j.procir.2024.10.084}
}

@misc{Moenck.14.06.2024,
  author = {Moenck, Keno and Thieu, Duc Trung and Koch, Julian and Sch{\"u}ppstuhl, Thorsten},
  title = {Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings},
  date = {2024-06-14},
  year = {2024},
  url = {http://arxiv.org/pdf/2406.09637},
  doi = {10.48550/arXiv.2406.09637}
}
```
We thank the authors of Dassl, CoOp, and APEX, whose code bases helped during the course of this work.