
Stronger ViTs With Octic Equivariance

David Nordström · Johan Edstedt · Fredrik Kahl · Georg Bökman



Incorporating octic layers into Vision Transformers (ViTs) reduces the computational complexity while maintaining or improving representational power. We provide a PyTorch implementation for easy integration into existing ViT pipelines.

Structure

Octic ViTs

In the octic_vits folder you will find all the components needed to build octic-equivariant Vision Transformers (intended to be compatible with the timm library). For example, to create an octic ViT-H you can run:

from octic_vits import OcticVisionTransformer

model = OcticVisionTransformer(embed_dim=1280, depth=32, num_heads=16)

This defaults to a hybrid model whose first half of blocks is octic and whose remaining blocks are standard (i.e. this model has approx. 40% fewer FLOPs than a regular ViT-H). To instead obtain an invariant model, simply set invariant=True. You can further choose the number of octic blocks k by setting octic_equi_break_layer=k, as in the sketch below.
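
A small sketch of the two variants using the keyword arguments described above (the block counts here are illustrative, not recommended settings):

```python
from octic_vits import OcticVisionTransformer

# Fully invariant ViT-H: features are invariant to the eight square symmetries.
invariant_model = OcticVisionTransformer(
    embed_dim=1280, depth=32, num_heads=16, invariant=True
)

# Hybrid ViT-H with the first 8 of its 32 blocks octic and the rest standard.
hybrid_model = OcticVisionTransformer(
    embed_dim=1280, depth=32, num_heads=16, octic_equi_break_layer=8
)
```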

DeiT III

Code based on the official repo has been placed in the deit folder.

DINOv2

Code based on the official repo has been placed in the dinov2 folder.

Reproducing Results

Code to reproduce the experiments can be found in the experiments folder. Below are general instructions on how to run it and how to obtain pretrained model weights.

Setup

All the code is written to run on a Slurm cluster using submitit, so first set up the cluster settings in utils/cluster.py. If you intend to run with torchrun instead, it should work out of the box. Also, make sure to run export PYTHONPATH=$(pwd) from the repository root so that relative imports work as intended.
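
If you have not used submitit before, a Slurm launch looks roughly like this (a minimal sketch; the partition name and resource numbers are placeholders, and the repo reads its real settings from utils/cluster.py):

```python
import submitit

executor = submitit.AutoExecutor(folder="slurm_logs")
executor.update_parameters(
    nodes=1,
    tasks_per_node=4,                  # one task per GPU
    gpus_per_node=4,
    timeout_min=60,
    slurm_partition="your_partition",  # placeholder
)

def main():
    # A training or evaluation entry point would go here.
    print("hello from the cluster")

job = executor.submit(main)
print(job.job_id)
```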

Environment

For DINOv2 we use the same environment as the original repo, and the same goes for DeiT. Additional packages, e.g. submitit, need to be installed separately.

Since DeiT III is deprecated, we provide some additional guidance on setting up its environment in DEIT_ENV.md.

Data

ImageNet-1K

We follow the DINOv2 IN1K data structure. As such, the root directory of the dataset should hold the following contents:

  • <ROOT>/test/ILSVRC2012_test_00000001.JPEG
  • <ROOT>/test/[..]
  • <ROOT>/test/ILSVRC2012_test_00100000.JPEG
  • <ROOT>/train/n01440764/n01440764_10026.JPEG
  • <ROOT>/train/[...]
  • <ROOT>/train/n15075141/n15075141_9993.JPEG
  • <ROOT>/val/n01440764/ILSVRC2012_val_00000293.JPEG
  • <ROOT>/val/[...]
  • <ROOT>/val/n15075141/ILSVRC2012_val_00049174.JPEG
  • <ROOT>/labels.txt

The provided dataset implementation expects a few additional metadata files to be present under the extra directory:

  • <EXTRA>/class-ids-TRAIN.npy
  • <EXTRA>/class-ids-VAL.npy
  • <EXTRA>/class-names-TRAIN.npy
  • <EXTRA>/class-names-VAL.npy
  • <EXTRA>/entries-TEST.npy
  • <EXTRA>/entries-TRAIN.npy
  • <EXTRA>/entries-VAL.npy

These metadata files can be generated (once) with the following lines of Python code:

from dinov2.data.datasets import ImageNet

for split in ImageNet.Split:
    dataset = ImageNet(split=split, root="<ROOT>", extra="<EXTRA>")
    dataset.dump_extra()

ADE20K / VOC2012

For segmentation evaluation we use the code from capi, whose author has helpfully enabled automatic downloading of the datasets. For more information, consult the original repo.

Weights

Download the weights from here to reproduce the evaluation metrics. The DINOv2 weights only include the teacher backbone. Here is a link to a Google Drive folder that contains all files of interest.
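
As a rough sketch of how a downloaded checkpoint can be loaded (the file name matches the evaluation commands below, but the state-dict layout is an assumption; adjust key handling to the file you actually download):

```python
import torch
from octic_vits import OcticVisionTransformer

# ViT-H-sized model as above; the patch size may need to match the checkpoint.
model = OcticVisionTransformer(embed_dim=1280, depth=32, num_heads=16)

state = torch.load("pretrained_models/hybrid_deit_huge_patch14.pth", map_location="cpu")
state = state.get("model", state)  # some checkpoints nest weights under a "model" key
missing, unexpected = model.load_state_dict(state, strict=False)
print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")
```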

DeiT III

| model | # of params | # of FLOPs | ImageNet Top-1 | weights | logs |
| --- | --- | --- | --- | --- | --- |
| Hybrid ViT-H/14 | 356 M | 102 G | 85.0% | weights | logs |
| Invariant ViT-H/14 | 362 M | 104 G | 84.7% | weights | logs |
| Hybrid ViT-L/16 | 171 M | 38 G | 84.5% | weights | logs |
| Invariant ViT-L/16 | 175 M | 39 G | 84.0% | weights | logs |

DINOv2

| model | # of FLOPs | ImageNet linear | ImageNet knn | weights | logs |
| --- | --- | --- | --- | --- | --- |
| ViT-H/16 | 128 G | 81.7% | 81.0% | weights | logs |
| Hybrid ViT-H/16 | 78 G | 82.2% | 81.4% | weights | logs |
| Invariant ViT-H/16 | 78 G | 81.9% | 80.9% | weights | logs |
| ViT-L/16 | 62 G | 80.9% | 80.5% | weights | logs |
| Hybrid ViT-L/16 | 38 G | 81.3% | 80.8% | weights | logs |
| Invariant ViT-L/16 | 38 G | 81.2% | 80.4% | weights | logs |

Evaluation

We use the Hybrid ViT-H model as an example (as it is the best performing) to show how to run evaluation; replace it with whichever model you want to test.

DeiT III

After downloading the weights you should be able to run the following command:

python experiments/eval_deit.py --model hybrid_deit_huge_patch14 --eval pretrained_models/hybrid_deit_huge_patch14.pth

This should give:

* Acc@1 84.996 Acc@5 96.390 loss 0.799

DINOv2

For classification, run:

python experiments/eval_dinov2_classification.py output_dir --config-file dinov2/configs/eval/hybrid_vith16.yaml --pretrained-weights pretrained_models/hybrid_dinov2_huge_patch16.pth

This should give an accuracy of 82.2% and 81.4% for linear and knn, respectively.

For segmentation, run:

python experiments/eval_dinov2_segmentation.py model_path=dinov2/eval/segmentation/dinov2_loader.py model_loader_kwargs.model_name=dinov2_hybrid_vith16 model_loader_kwargs.weights=pretrained_models/hybrid_dinov2_huge_patch16.pth distributed=True ntasks_per_node=4 account=... gpus-per-node=4 nodes=1 output_dir=./output_dir

This should give an mIoU of 35.1 (linear) and 31.1 (knn) for ADE20K and 70.8 (linear) and 61.7 (knn) for VOC2012.

Training

Per-GPU batch sizes are tuned for A100-40GB GPUs. Feel free to adjust them for your setup, as long as the effective batch size stays the same (2048 for DeiT III and 1024 for DINOv2); see the sketch below.
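
As a quick sanity check when changing the per-GPU batch size (a trivial sketch; the GPU counts are illustrative):

```python
def effective_batch_size(per_gpu: int, gpus_per_node: int, nodes: int, grad_accum: int = 1) -> int:
    """Effective batch size = per-GPU batch x total GPUs x gradient-accumulation steps."""
    return per_gpu * gpus_per_node * nodes * grad_accum

# e.g. hitting the DeiT III target of 2048 with 8 GPUs per node on 4 nodes:
assert effective_batch_size(per_gpu=64, gpus_per_node=8, nodes=4) == 2048
# and the DINOv2 target of 1024 on 2 nodes with 4 GPUs each (matching the command below):
assert effective_batch_size(per_gpu=128, gpus_per_node=4, nodes=2) == 1024
```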

DeiT III

To launch distributed training, run:

python experiments/train_deit.py --model hybrid_deit_huge_patch14

DINOv2

To launch distributed training, run:

python experiments/train_dinov2.py --config-file dinov2/configs/train/hybrid_vith16.yaml --ngpus 4 --nodes 2

Equivariance

We have provided a utility file to verify octic equivariance (and invariance). Simply run:

python experiments/test_equivariance.py
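
Conceptually, the check amounts to the following (a minimal sketch, not the repo's actual test: it assumes an invariant model whose forward returns pooled features and a patch size that divides 224):

```python
import torch
from octic_vits import OcticVisionTransformer

# Invariant model: features should be unchanged under the eight square symmetries.
model = OcticVisionTransformer(
    embed_dim=1280, depth=32, num_heads=16, invariant=True
).eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    ref = model(x)
    for k in range(4):              # 0, 90, 180, 270 degree rotations ...
        for flip in (False, True):  # ... each with and without a horizontal flip
            t = torch.rot90(x, k, dims=(-2, -1))
            if flip:
                t = torch.flip(t, dims=(-1,))
            out = model(t)
            # Invariance should hold up to floating-point error.
            assert torch.allclose(ref, out, atol=1e-4), (k, flip)
```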

Throughput

In the paper we report throughput numbers. To replicate them, run the following command on an A100-80GB:

python experiments/complexity.py --amp --compile

Checklist

  • Release the D8 models + weights
  • Add to timm library

License

Stronger ViTs with Octic Equivariance code is released under the Apache License 2.0. See LICENSE for additional details. Training recipes are taken from DeiT III and DINOv2, and evaluation is taken from capi, all released under the Apache License 2.0.

Credit

Code structure is inspired by capi and RoMa.

Cite

If you find this repository useful, please consider giving a star ⭐ and citation 🐙:

@misc{nordström2025strongervitsocticequivariance,
      title={Stronger ViTs With Octic Equivariance}, 
      author={David Nordström and Johan Edstedt and Fredrik Kahl and Georg Bökman},
      year={2025},
      eprint={2505.15441},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.15441}, 
}
