This project evaluates the performance of Vision Transformer (ViT) models on a small dataset with limited training time. ViT models have a reputation for strong performance on a range of computer vision tasks, but they require significant computational resources to train. The primary goal is to investigate whether ViT models can still deliver competitive results under more constrained conditions: a budget of 50 training epochs on a relatively small dataset of about 37K images.
Animals with Attributes 2 (AwA2) is a dataset for the comparative evaluation of transfer learning algorithms, such as attribute-based classification and zero-shot learning. AwA2 is a direct replacement for the original Animals with Attributes (AwA) dataset, with more images published for each category. AwA2 also provides a category-attribute matrix, which contains an 85-dimensional attribute vector (e.g. colour, stripes, fur, size and habitat) for each category.

- 37322 images
- 50 animal categories
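
To make the shape of the category-attribute matrix concrete, here is a minimal sketch in plain Python. It assumes the matrix is stored as 50 rows (one per animal category) of 85 values each and that images are labelled with a class index from 0 to 49. The values are random placeholders, not real AwA2 data, and the helper names `make_placeholder_matrix` and `attributes_for_labels` are hypothetical.

```python
import random

NUM_CLASSES = 50      # animal categories in AwA2 (from the description above)
NUM_ATTRIBUTES = 85   # attribute dimensions per category (from the description above)


def make_placeholder_matrix(seed=0):
    """Build a 50 x 85 matrix of random 0/1 values (placeholder, NOT real AwA2 data)."""
    rng = random.Random(seed)
    return [
        [rng.randint(0, 1) for _ in range(NUM_ATTRIBUTES)]
        for _ in range(NUM_CLASSES)
    ]


def attributes_for_labels(matrix, labels):
    """Look up the attribute vector for each class index in `labels`."""
    return [matrix[label] for label in labels]


category_attributes = make_placeholder_matrix()

# Hypothetical class indices for a batch of three images.
batch_labels = [0, 7, 49]
batch_attributes = attributes_for_labels(category_attributes, batch_labels)

# One 85-dimensional attribute vector per image in the batch.
print(len(batch_attributes), len(batch_attributes[0]))  # 3 85
```

Because the attributes are defined per category rather than per image, every image of the same class shares the same row of the matrix.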