Walter Simoncini, Andrei Bursuc, Spyros Gidaris, Yuki M. Asano.
This repository contains the experiments for our paper No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations.
Our paper introduces FUNGI: Features from UNsupervised GradIents, a method to enhance the features of vision encoders by leveraging self-supervised gradients. Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These are projected to a lower dimension and then concatenated with the model's embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings. We also show that using FUNGI features can benefit linear classification and image retrieval, and that they significantly improve the retrieval-based in-context scene understanding abilities of pretrained models, for example improving upon DINO by +17% for semantic segmentation — without any training.
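As a rough illustration of the recipe (and not of this repository's actual API), the snippet below sketches FUNGI for a single sample with a stand-in encoder and a toy self-supervised objective: compute a self-supervised loss, take the gradients of one layer, project them to the embedding dimension with a fixed random matrix, and concatenate them with the frozen embedding. All modules, losses and dimensions here are placeholders.
# Minimal FUNGI sketch with a stand-in encoder and a toy self-supervised
# objective. All modules, losses and dimensions are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
encoder = nn.Linear(128, 64)  # stand-in for a frozen backbone layer (e.g. attn.proj)

x = torch.randn(1, 128)
embedding = encoder(x).detach()

# Toy self-supervised objective: pull two noisy "views" of the input together.
view_a = encoder(x + 0.1 * torch.randn_like(x))
view_b = encoder(x + 0.1 * torch.randn_like(x))
loss = 1 - F.cosine_similarity(view_a, view_b).mean()
loss.backward()

# Flatten the layer gradients and project them down with a fixed random matrix.
grads = torch.cat([encoder.weight.grad.flatten(), encoder.bias.grad.flatten()])
projection = torch.randn(grads.numel(), embedding.shape[-1]) / embedding.shape[-1] ** 0.5
projected = (grads @ projection).unsqueeze(0)

# The FUNGI feature is the concatenation of the normalized embedding and gradients.
fungi = torch.cat([F.normalize(embedding, dim=-1), F.normalize(projected, dim=-1)], dim=-1)
print(fungi.shape)  # torch.Size([1, 128])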
The root folder contains the experiments for the vision modality (k-nearest neighbor classification, retrieval and linear classification). Experiments for the other modalities and tasks live in their corresponding folders:
- fungi-text: k-nearest neighbor text classification using FUNGI obtained from text encoders.
- fungi-ssast: k-nearest neighbor audio classification using FUNGI obtained from an SSAST backbone.
- fungi-hummingbird: retrieval-based vision in-context learning evaluation on semantic segmentation tasks.
The goal of this repository is to make the experiments described in the paper reproducible; it is not intended for practical use.
Important
If you simply want to extract FUNGI features for your own dataset, have a look at fungivision, which provides an easy-to-use and customizable model wrapper for producing FUNGI features.
This code was developed using Python 3.10, so to run the experiments we recommend that you create an appropriate environment and install the required dependencies as follows:
conda create -n grady python=3.10
conda activate grady
# Install the required Python libraries
pip install -r requirements.txt
Most evaluation datasets are loaded via torchvision and require no special setup. In this section, we provide instructions on how to download and generate the ones not available out of the box.
Stanford Cars used to be available in torchvision, but the links to the original data files are broken. Use the commands below to download and set up the data, and don't forget to manually download cars_test_annos_withlabels.mat from here.
# Based on https://github.com/pytorch/vision/issues/7545 and
# https://github.com/pytorch/vision/issues/7545#issuecomment-1575410733
mkdir -p /scratch-shared/$USER/cache/cars
kaggle datasets download -p /scratch-shared/$USER/cache/cars jessicali9530/stanford-cars-dataset
# Unpack the dataset and remove recursive structure
cd /scratch-shared/$USER/cache/cars
unzip stanford-cars-dataset.zip
# Move dataset folders to the top level
mv cars_test/ cars_test_tmp
mv cars_test_tmp/cars_test/ .
rm -rf cars_test_tmp
mv cars_train/ cars_train_tmp
mv cars_train_tmp/cars_train/ .
rm -rf cars_train_tmp
wget https://github.com/pytorch/vision/files/11644847/car_devkit.tgz
# Download the test annotations manually from this URL
# https://www.kaggle.com/code/subhangaupadhaya/pytorch-stanfordcars-classification/input?select=cars_test_annos_withlabels+%281%29.mat
# Go back to the home folder
cd $HOME
# Wrap the cars dataset into the proper folder structure, i.e. cars/stanford_cars/...
mv /scratch-shared/$USER/cache/cars /scratch-shared/$USER/cache/stanford_cars
mkdir /scratch-shared/$USER/cache/cars
mv /scratch-shared/$USER/cache/stanford_cars /scratch-shared/$USER/cache/cars
# Unpack the devkit
cd /scratch-shared/$USER/cache/cars/stanford_cars/
tar -xf car_devkit.tgz
Warning
The Pets dataset is no longer available from its original source. If the outage is only temporary, you will be able to use the following commands to download the data and build its corresponding HuggingFace dataset.
mkdir -p /scratch-shared/$USER/cache/pets
# Download the images and annotations
wget -P /scratch-shared/$USER/cache/pets https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz
wget -P /scratch-shared/$USER/cache/pets https://thor.robots.ox.ac.uk/~vgg/data/pets/annotations.tar.gz
# Unpack the images and annotations
cd /scratch-shared/$USER/cache/pets
tar -xf images.tar.gz
tar -xf annotations.tar.gz
# Go back to the home folder (or the folder containing this repository)
cd $HOME/knn
# Generate a HuggingFace dataset version of Pets
sh jobs/datasets/generate_pets.job
Download ImageNet-100 from Kaggle and build its corresponding HuggingFace dataset.
kaggle datasets download -p /scratch-shared/$USER/cache ambityga/imagenet100
# Unpack and structure the dataset
cd /scratch-shared/$USER/cache
mkdir imagenet100 && mv imagenet100.zip imagenet100
cd imagenet100
unzip imagenet100.zip
# Update the script arguments as needed before scheduling this job
sh jobs/datasets/generate_imagenet100.job
Download CUB 200 (2011) and build its corresponding HuggingFace dataset.
WORKING_DIR=$PWD
DATASET_DIR=/scratch-shared/$USER/cache/cub
mkdir $DATASET_DIR
wget -P $DATASET_DIR https://data.caltech.edu/records/65de6-vp158/files/CUB_200_2011.tgz
cd $DATASET_DIR
tar -xf CUB_200_2011.tgz
cd $WORKING_DIR
sh jobs/datasets/generate_cub.job
Download the Oxford and Paris landmarks datasets from Kaggle. These datasets are only used for the retrieval experiments.
BASE_DIR=/scratch-shared/$USER/cache/retrieval-landmarks
mkdir -p $BASE_DIR
kaggle datasets download -p $BASE_DIR qiubit/roxfordparis
cd $BASE_DIR
unzip roxfordparis.zip
Most of the experiments in the paper can be reproduced using the scripts in the jobs directory. The scripts assume you have a scratch folder located at /scratch-shared/$USER; make sure to change the paths as appropriate.
To run the image classification experiments, you first have to extract the embeddings and gradient features for both the training and test splits of a dataset. To extract the model embeddings, use the jobs/extract.job script and customize the DATASETS and MODEL_TYPES arrays as appropriate. The full list of supported models and datasets is available in src/enums.py.
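For context, embedding extraction boils down to a forward pass of the frozen backbone over a dataset, with the features stored in an HDF5 file. The sketch below illustrates this with a stand-in backbone and made-up file and dataset names; it is not the exact format produced by jobs/extract.job.
# Illustrative embedding extraction: forward pass of a frozen backbone, features
# written to an HDF5 file. The backbone, file name and dataset name are made up.
import h5py
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

backbone = nn.Linear(128, 64).eval()  # stand-in for a pretrained encoder
loader = DataLoader(TensorDataset(torch.randn(256, 128)), batch_size=64)

features = []
with torch.no_grad():
    for (batch,) in loader:
        features.append(backbone(batch))

with h5py.File("train_embeddings.h5", "w") as f:
    f.create_dataset("embeddings", data=torch.cat(features).numpy())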
Once you've extracted the embeddings, you can extract the gradient features for each individual loss/objective. To do so, use one of the jobs/grads_xxx_multi.job scripts, where xxx is the objective name. You can customize the objective parameters, datasets and backbones in the scripts.
For each backbone, in addition to its name you must also specify, in the LAYER_PATHS array, the linear layer from which gradients should be extracted and the name of the corresponding dataset in the output HDF5 gradients file. Each entry should be in the format path.to.layer:out_dataset_name, e.g. to extract gradients from the last attention output projection and save them in the attn_proj dataset you should use backbone.blocks.11.attn.proj:attn_proj. The word dataset in this context refers to one of the tensor arrays stored in each HDF5 file, as they are indexed by name.
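If you are unfamiliar with HDF5, the snippet below (with illustrative names and data) shows what this means in practice: the part after the colon in a LAYER_PATHS entry simply names the array the gradients are written to inside the file.
# An HDF5 file stores multiple named arrays ("datasets"); the out_dataset_name
# part of a LAYER_PATHS entry selects which one receives the gradients.
# The file name and contents below are purely illustrative.
import h5py
import numpy as np

layer_path, out_dataset_name = "backbone.blocks.11.attn.proj:attn_proj".split(":")

with h5py.File("gradients.h5", "w") as f:
    f.create_dataset(out_dataset_name, data=np.random.randn(4, 768).astype(np.float32))

with h5py.File("gradients.h5", "r") as f:
    print(list(f.keys()), f[out_dataset_name].shape)  # ['attn_proj'] (4, 768)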
Most models from timm follow the backbone.blocks.xx.attn.proj structure, while torchvision models follow the backbone.encoder.layers.encoder_layer_xx.self_attention.out_proj structure, where xx is the transformer block number.
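To double-check which path to use for your backbone, you can list its module names with PyTorch's named_modules. The snippet below is a generic sketch using a torchvision ViT-B/16 (loaded without pretrained weights), wrapped so that module paths start with backbone. as in the examples above.
# Resolve a dotted layer path on a model and list candidate output projections.
# Generic sketch: adapt the wrapper and path to your own backbone.
import torch.nn as nn
from torchvision.models import vit_b_16

model = nn.Module()
model.backbone = vit_b_16(weights=None)  # wrap so module paths start with "backbone."

modules = dict(model.named_modules())
path = "backbone.encoder.layers.encoder_layer_11.self_attention.out_proj"
print(modules[path])  # the last block's attention output projection (a 768x768 linear layer)

# List all attention output projections, e.g. to find the last block's path.
print([name for name in modules if name.endswith("out_proj")])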
Finally, you will also have to specify, in the PROJECTION_MATRICES array, the path to a random projection matrix, which is used to project the gradients down to a smaller vector so that they remain manageable in terms of storage and compute. You can generate one using the following command:
python generate_projection_matrix.py \
--projection-dim $EMBEDDINGS_DIM \
--gradients-dim $GRADIENTS_DIM \
--output-path /scratch-shared/$USER/path_to_matrix.pth
Here, $EMBEDDINGS_DIM is the dimension of the projected gradients and $GRADIENTS_DIM is the dimension of the flattened gradients. Below we list these dimensions as ($EMBEDDINGS_DIM, $GRADIENTS_DIM) tuples for some models, assuming that gradients are always extracted from the attention output projection; a short sketch of the projection follows the list.
- ViT-S/16: (384, 147840)
- ViT-B/16: (768, 590592)
- ViT-L/16: (1024, 1049600)
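These gradient dimensions are simply the parameter count (weights plus bias) of the chosen layer: for example, the ViT-S/16 attention output projection is a 384 × 384 linear layer, so 384 × 384 + 384 = 147840. The snippet below sketches how such a matrix can be generated and applied; the scaling, file name and storage format are assumptions, and generate_projection_matrix.py may differ in its details.
# Generate a fixed Gaussian random projection and apply it to a flattened
# gradient. Illustrative sketch: generate_projection_matrix.py may use a
# different scaling or storage format.
import torch

embeddings_dim = 384
gradients_dim = 384 * 384 + 384  # ViT-S/16 attn.proj: weights + bias = 147840

torch.manual_seed(0)
projection = torch.randn(gradients_dim, embeddings_dim) / embeddings_dim ** 0.5
torch.save(projection, "projection_vits16.pth")  # path is illustrative

flat_gradient = torch.randn(gradients_dim)  # e.g. [weight.grad, bias.grad] flattened
projected = flat_gradient @ projection
print(projected.shape)  # torch.Size([384])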
Warning
Make sure to use the same projection matrix when extracting gradients for the same dataset from different objectives! Using different projections may lead to significant performance degradation.
Once you've extracted the gradient features and embeddings, you can evaluate their performance in k-nearest neighbor classification using the jobs/concat.job script (or jobs/concat_few_shot.job). You can choose the PCA dimensionality (set it to a large number to effectively disable it), as well as the datasets and backbones to evaluate. Don't forget to update the data paths as appropriate!
The script will evaluate the performance of all possible combinations of gradients and embeddings. You can run the same experiments for linear classification using jobs/concat_linear.job (or its few-shot counterpart), but only incremental combinations will be evaluated (i.e. embeddings, embeddings + kl, ...). To disable this behavior and evaluate every combination, comment out line 172 of the train_linear_concat_grads.py script.
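Conceptually, the evaluation amounts to normalizing and concatenating the stored features, optionally reducing them with PCA, and running a k-nearest neighbor classifier. The sketch below illustrates that recipe with random stand-in data and scikit-learn; it is not the repository's evaluation code, and the hyperparameters are placeholders.
# Conceptual k-NN evaluation on concatenated embedding + gradient features,
# using random stand-in data. Not the repository's evaluation code.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_train, n_test, dim = 500, 100, 384
y_train, y_test = rng.integers(0, 10, n_train), rng.integers(0, 10, n_test)

def concat_features(n):
    # Stand-ins for the embeddings and projected gradients loaded from HDF5.
    embeddings, gradients = rng.normal(size=(n, dim)), rng.normal(size=(n, dim))
    return np.concatenate([normalize(embeddings), normalize(gradients)], axis=1)

train, test = concat_features(n_train), concat_features(n_test)

pca = PCA(n_components=256).fit(train)  # optional dimensionality reduction
train, test = pca.transform(train), pca.transform(test)

knn = KNeighborsClassifier(n_neighbors=20, metric="cosine").fit(train, y_train)
print("top-1 accuracy:", knn.score(test, y_test))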
You can replicate the retrieval experiments by following the same steps as for image classification, but using the scripts in the jobs/retrieval folder instead.
We provide automated tests that validate the gradient extraction and check whether the manually downloaded datasets were processed correctly. You can run them using the following commands:
mkdir -p cache/models
pytest
The gradient sanity-check tests should be run on CPU, as running them on a GPU may produce slightly different gradients depending on the batch size and cause the tests to fail.
If you found our work useful, please cite us as follows:
@inproceedings{simoncini2024fungi,
title={No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations},
author={Walter Simoncini and Spyros Gidaris and Andrei Bursuc and Yuki M. Asano},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=PRBsEz8rnV}
}