Vision Transformer from Scratch in PyTorch

This is a simplified from-scratch PyTorch implementation of the Vision Transformer (ViT) with detailed steps (refer to model.py).

Overview:

  • The default network is a scaled-down version of the original ViT architecture from the ViT paper.
  • It has only 200k-800k parameters depending on the embedding dimension (the original ViT-Base has 86 million).
  • Tested on MNIST, FashionMNIST, SVHN, CIFAR10, and CIFAR100 datasets.
  • Uses a smaller patch size of 4 (see the patch-embedding sketch after this list).
  • Can be scaled to larger datasets by increasing the model parameters and the patch size.
  • Includes an option to use PyTorch's built-in transformer layers in place of the from-scratch implementation when defining the ViT.
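
The patch-embedding step is easiest to follow as a shape walk-through. The sketch below is a minimal, hypothetical illustration (the class and argument names are chosen for this example and are not necessarily those used in model.py) of how a 28x28 MNIST image becomes a sequence of 49 patch tokens plus a class token at patch size 4:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each to embed_dim.

    A Conv2d with kernel_size == stride == patch_size is equivalent to flattening
    each patch and applying a shared linear projection.
    """
    def __init__(self, n_channels=1, image_size=28, patch_size=4, embed_dim=64):
        super().__init__()
        self.proj = nn.Conv2d(n_channels, embed_dim, kernel_size=patch_size, stride=patch_size)
        n_patches = (image_size // patch_size) ** 2        # 49 for 28x28, 64 for 32x32
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, embed_dim))

    def forward(self, x):                                  # x: (B, C, H, W)
        x = self.proj(x)                                   # (B, embed_dim, H/4, W/4)
        x = x.flatten(2).transpose(1, 2)                   # (B, n_patches, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)     # one class token per sample
        x = torch.cat([cls, x], dim=1)                     # (B, n_patches + 1, embed_dim)
        return x + self.pos_embed                          # add learned positional embedding

# Quick shape check on an MNIST-sized input
tokens = PatchEmbedding()(torch.randn(2, 1, 28, 28))
print(tokens.shape)  # torch.Size([2, 50, 64]) -> 49 patch tokens + 1 class token
```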

Run commands (also available in scripts.sh):

| Dataset | Run command | Test Acc |
|---|---|---|
| MNIST | python main.py --dataset mnist --epochs 100 | 99.5 |
| Fashion MNIST | python main.py --dataset fmnist | 92.3 |
| SVHN | python main.py --dataset svhn --n_channels 3 --image_size 32 --embed_dim 128 | 96.2 |
| CIFAR10 | python main.py --dataset cifar10 --n_channels 3 --image_size 32 --embed_dim 128 | 86.3 (82.5 w/o RandAug) |
| CIFAR100 | python main.py --dataset cifar100 --n_channels 3 --image_size 32 --embed_dim 128 | 59.6 (55.8 w/o RandAug) |

The use_torch_transformer_layers argument (in main.py) switches between PyTorch's built-in transformer layers and the from-scratch implementation when defining the Vision Transformer's encoder and its layers (code in model.py).
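
A rough sketch of what that switch might look like (the function and class names below are illustrative, not the ones in model.py; the repo's from-scratch path implements attention by hand, whereas this sketch falls back on nn.MultiheadAttention to stay short):

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """A minimal pre-norm transformer encoder block, standing in for the hand-written one."""
    def __init__(self, embed_dim, n_heads, forward_mul, dropout):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, forward_mul * embed_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(forward_mul * embed_dim, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]   # self-attention + residual
        x = x + self.mlp(self.norm2(x))                      # feed-forward + residual
        return x

def build_encoder(embed_dim=64, n_layers=6, n_heads=4, forward_mul=2,
                  dropout=0.1, use_torch_transformer_layers=False):
    """Return either PyTorch's built-in encoder stack or a stack of the block above."""
    if use_torch_transformer_layers:
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads,
            dim_feedforward=forward_mul * embed_dim,
            dropout=dropout, batch_first=True)               # tokens are (batch, seq, embed_dim)
        return nn.TransformerEncoder(layer, num_layers=n_layers)
    return nn.Sequential(*[EncoderBlock(embed_dim, n_heads, forward_mul, dropout)
                           for _ in range(n_layers)])
```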

Transformer Config:

| Config | MNIST and FMNIST | SVHN and CIFAR |
|---|---|---|
| Input Size | 1 x 28 x 28 | 3 x 32 x 32 |
| Patch Size | 4 | 4 |
| Sequence Length | 7*7 = 49 | 8*8 = 64 |
| Embedding Size | 64 | 128 |
| Parameters | 210k | 820k |
| Num of Layers | 6 | 6 |
| Num of Heads | 4 | 4 |
| Forward Multiplier | 2 | 2 |
| Dropout | 0.1 | 0.1 |
Further optimizing the configuration can provide additional performance gains.
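
The sequence-length row follows directly from the input and patch sizes in the table, as a quick check shows:

```python
# Each image is cut into (image_size // patch_size) ** 2 non-overlapping patches.
for name, image_size, patch_size in [("MNIST/FMNIST", 28, 4), ("SVHN/CIFAR", 32, 4)]:
    seq_len = (image_size // patch_size) ** 2
    print(f"{name}: ({image_size} // {patch_size})^2 = {seq_len} patches")
# MNIST/FMNIST: (28 // 4)^2 = 49 patches
# SVHN/CIFAR: (32 // 4)^2 = 64 patches
```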
