Welcome to the 100 Days of CUDA challenge! This repository documents my journey of learning and mastering CUDA (Compute Unified Device Architecture) for GPU programming over the next 100 days.
Day | Topic | Notes |
---|---|---|
01 | Hello CUDA! | Setup and writing my first CUDA kernel |
02 | Vector Addition | Implementing vector addition on both CPU and GPU |
03 | RGB to Grayscale | Converting an RGB image to grayscale on the GPU |
04 | Image Blurring | Blurring an image using CUDA |
05 | Matrix Multiplication | Matrix multiplication on GPU and CPU and verifying the results |
06 | Streaming Multiprocessors (Theory) | PyTorch C++/CUDA extension setup |
07 | PyTorch C++/CUDA extension | Implementing a custom PyTorch operation using CUDA |
08 | Memory Tiling | Matrix multiplication with memory tiling |
09 | cuBLAS | Comparing cuBLAS with simple vector addition kernel |
10 | GEMM | Matrix multiplication with GEMM |
11 | Activation Kernels | Implementing tanh |
12 | Comparing Frameworks | Comparing the performance of TensorFlow, PyTorch, cuDNN, and a custom CUDA implementation for tanh activation |
13 | 2D Convolution and Max Pooling | Implementing basic 2D convolution and max pooling kernels |
15 | cuDNN Conv 2D Kernel | Switched Day 14 and Day 15 due to some work |
16 | Batch Normalization | Implementing batch normalization, will do Day 14 on Tuesday (18th Feb 25) |
17 | ReLU Gradient | Implementing ReLU gradient, will do Day 14 on Tuesday (18th Feb 25) |
18 | Bias Addition | Implementing bias addition, will do Day 14 on Tuesday (19th Feb 25) |
19 | Dropout | Implementing dropout, will do Day 14 on Tuesday (21st Feb 25) |
20 | Gradient Accumulation | Implementing gradient accumulation, will do Day 14 on Tuesday (22nd and 23rd Feb 25) |
21 | MNIST Classifier | Implementing a simple MNIST classifier using CUDA with matmul kernel |
22 | MNIST Partial Forward Pass | Implementing a partial forward pass for MNIST using CUDA matmul and ReLU kernels, with batch size 32 and hidden layer size 128 |
(Will be updated daily)
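
To give a flavor of the early entries (Days 01 and 02), here is a minimal sketch of GPU vector addition. It is not the code from this repository: the names `vectorAddKernel` and `vectorAddGpu` are hypothetical, the block size of 256 is an arbitrary choice, and the wrapper assumes `n > 0`.

```cuda
#include <cuda_runtime.h>

// Each thread adds one pair of elements; the bounds check covers the
// last block when n is not a multiple of the block size.
__global__ void vectorAddKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper (assumes n > 0): copies the inputs to the
// device, launches the kernel, and copies the result back to the host.
// Returns cudaSuccess, or the first CUDA error encountered.
cudaError_t vectorAddGpu(const float* hA, const float* hB, float* hC, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int threads = 256;                         // assumed block size
        const int blocks = (n + threads - 1) / threads;  // round up
        vectorAddKernel<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();                        // catches launch errors
    }

    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree accepts null pointers, so cleanup is safe on any error path.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling `vectorAddGpu(a, b, c, n)` from a `main` function and comparing `c` against a plain CPU loop is one way to verify the result, in the spirit of the CPU/GPU comparison on Day 02. A file like this would typically be compiled with `nvcc`.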
The repository will be structured as follows:
100-Days-of-CUDA/
├── Day01_Hello_CUDA/
│   ├── hello_cuda.cu
│   ├── README.md
│
├── Day02_Vector_Addition/
│   ├── vector_add.cu
│   ├── README.md
│
...
└── README.md