A comprehensive collection of CUDA programming examples for teaching parallel computing concepts using NVIDIA GPUs. This repository is designed for students learning CUDA programming in Windows environments with Visual Studio.
Each tutorial is a self-contained project with full CMake support, detailed documentation, and batch files for easy building and execution:
- 01-cuda-basics - Introduction to CUDA fundamentals and device querying
- 02-vector-addition - Basic vector addition operations
- 03-memory-types - Global, shared, constant, and texture memory types
- 04-thread-organization - Thread, block, and grid organizations
- 05-matrix-multiplication - Different matrix multiplication strategies
- 06-reduction-operations - Parallel reduction operations
- 07-atomic-operations - Atomic operations and thread safety
- 08-stream-processing - Asynchronous operations and CUDA streams
- 09-texture-processing - Image processing and texture memory usage
- 10-dynamic-parallelism - Launching kernels from device code (dynamic parallelism)
- 11-cuda-libraries - Using the CUDA libraries
- 12-multi-gpu - Multi-GPU programming
- 13-unified-memory - Unified memory model
- 14-optimization - Performance optimization techniques
- 15-debugging - Debugging CUDA applications
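To build and run the tutorials you need:
- Windows 10/11
- Visual Studio 2022 Community Edition
- CUDA Toolkit 12.0 or newer
- CMake 3.20 or higher
- A C++17-compatible compiler (the MSVC compiler in Visual Studio 2022 supports C++17)
- An NVIDIA GPU with Compute Capability 5.0 or higher (CUDA 12 removed support for the older Compute Capability 3.x GPUs)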
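To set up the development environment:
1. Install Visual Studio 2022 Community Edition (do this before installing the CUDA Toolkit, so the toolkit installer can add its Visual Studio integration)
   - During installation, select the "Desktop development with C++" workload
   - Ensure the MSBuild tools are installed
2. Install the CUDA Toolkit
   - Download it from the NVIDIA CUDA Toolkit download page
   - Select your version of Windows
   - Follow the installation instructions
   - Verify the installation by running `nvcc --version` in a command prompt
3. Install CMake
   - Download it from the CMake website
   - Add CMake to the system PATH during installation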
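`nvcc --version` only confirms that the compiler is installed. A short program that actually talks to the GPU also checks that the driver and runtime work together. The sketch below is a minimal illustration using the CUDA runtime API (`cudaGetDeviceCount`, `cudaGetDeviceProperties`, `cudaDriverGetVersion`, `cudaRuntimeGetVersion`); it is not the code from 01-cuda-basics, and the file name `device_query.cu` and the helper `decodeVersion` are hypothetical.

```cuda
// device_query.cu (hypothetical file name): check that driver, runtime, and GPU are usable.
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (e.g. 12020 means 12.2).
static void decodeVersion(int encoded, int& major, int& minor) {
    major = encoded / 1000;
    minor = (encoded % 1000) / 10;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // Typical causes: missing or outdated driver, or no usable NVIDIA GPU.
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (count == 0) {
        std::fprintf(stderr, "No CUDA-capable device found\n");
        return 1;
    }

    int driverVersion = 0, runtimeVersion = 0, major = 0, minor = 0;
    cudaDriverGetVersion(&driverVersion);    // newest CUDA version the driver supports
    cudaRuntimeGetVersion(&runtimeVersion);  // CUDA runtime this program was built against
    decodeVersion(driverVersion, major, minor);
    std::printf("Driver supports CUDA %d.%d\n", major, minor);
    decodeVersion(runtimeVersion, major, minor);
    std::printf("Runtime version: CUDA %d.%d\n", major, minor);

    // Print name, compute capability, and memory size of every visible GPU.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s, compute capability %d.%d, %zu MiB global memory\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```

Compile it with `nvcc device_query.cu -o device_query.exe` from the "x64 Native Tools Command Prompt for VS 2022" (nvcc needs the MSVC host compiler on the PATH), then compare the reported compute capability with the prerequisites listed above.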
Each project follows the same pattern for building and execution:
1. Navigate to the project directory: `cd 01-cuda-basics`
2. Configure the project: `configure.bat`
3. Build the project: `build_all.bat`
4. Run the example: `run.bat`
Most examples support additional command-line parameters:
- `run.bat --debug`: run in debug mode with additional information
- `run.bat --release`: run in release mode (optimized)
- `run.bat --benchmark`: run performance benchmarks
- `run.bat --threads N`: set a specific thread configuration
- `run.bat --help`: show all available options
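How each tutorial implements `--benchmark` and `--threads` is specific to that tutorial and is not described here. As an illustration only, the sketch below shows two common ingredients of such options: reading a threads-per-block value from the command line and timing a kernel with CUDA events (`cudaEventRecord`, `cudaEventElapsedTime`). The option handling, the `vecAdd` kernel, and the helpers `blocksFor` and `parseThreads` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements (rounds up).
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical option parsing: value after "--threads" if it is in 1..1024, else fallback.
static int parseThreads(int argc, char** argv, int fallback) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--threads") == 0) {
            int t = std::atoi(argv[i + 1]);
            if (t > 0 && t <= 1024) return t;  // 1024 is the per-block thread limit
        }
    }
    return fallback;
}

__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard: the last block may be only partly used
}

int main(int argc, char** argv) {
    const int n = 1 << 20;
    const int threads = parseThreads(argc, argv, 256);
    const int blocks = blocksFor(n, threads);
    const size_t bytes = n * sizeof(float);

    std::vector<float> hA(n, 1.0f), hB(n, 2.0f), hC(n);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    // Events are recorded in the GPU's stream, so they time the kernel itself.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel and the stop event complete
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    std::printf("%d blocks x %d threads, kernel time %.3f ms, c[0] = %.1f\n",
                blocks, threads, ms, hC[0]);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return 0;
}
```

The grid size is rounded up so every element gets a thread, which is why the kernel needs the `i < n` guard. The first launch in a process also pays one-time initialization costs, so real benchmarks usually perform a warm-up launch and average several timed runs.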
For the best learning experience, follow the tutorials in numerical order:
- Start with 01-cuda-basics to understand CUDA fundamentals
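- Work through vector addition, memory types, thread organization, and matrix multiplication (02 through 05)
- Learn parallel reduction and atomic operations (06 and 07)
- Explore advanced features and performance optimization: streams, textures, dynamic parallelism, libraries, multi-GPU programming, unified memory, and optimization techniques (08 through 14)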
- Finally, learn debugging techniques with 15-debugging
Each directory contains its own detailed README.md with:
- Explanation of the key concepts
- Code walkthrough
- Build and run instructions
- Expected output and interpretation
- Exercises and further exploration ideas
If CMake or Visual Studio cannot find CUDA:
- Ensure your NVIDIA driver is up to date
- Make sure you installed the CUDA Toolkit for your system architecture (x64)
- Verify the installation with `nvcc --version`
- Check whether CMake can find CUDA by running `cmake .. -G "Visual Studio 17 2022" -A x64` from a build directory inside the project

If the build fails:
- Ensure your GPU has the required Compute Capability
- Check that Visual Studio has the necessary C++ components installed

If an example fails at runtime:
- Verify your GPU driver is up to date
- Check Windows Device Manager to ensure your GPU is recognized
- Run the example in debug mode for more information: `run.bat --debug`
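Runtime diagnostics are most useful when every CUDA call is checked. The sketch below shows a common error-checking pattern; the `CUDA_CHECK` macro and the `fill` kernel are hypothetical names, not taken from the tutorials. Kernel launches do not return an error code, so the launch is followed by `cudaGetLastError` (launch errors such as an invalid configuration) and `cudaDeviceSynchronize` (errors raised while the kernel runs).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with the error name, file, and line if a CUDA call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",         \
                         cudaGetErrorName(err_), __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                      \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

__global__ void fill(int* out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

int main() {
    const int n = 1000;
    int* d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(d, n, 7);
    CUDA_CHECK(cudaGetLastError());       // reports launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised during execution

    int first = 0;
    CUDA_CHECK(cudaMemcpy(&first, d, sizeof(int), cudaMemcpyDeviceToHost));
    std::printf("first element = %d\n", first);
    CUDA_CHECK(cudaFree(d));
    return 0;
}
```

For memory errors such as out-of-bounds accesses, the `compute-sanitizer` tool that ships with the CUDA Toolkit can run the program unchanged (for example `compute-sanitizer app.exe`, where `app.exe` stands for your executable), and compiling with `nvcc -G` adds device-side debug information for source-level GPU debugging.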
Additional resources:
- NVIDIA CUDA Documentation
- CUDA C++ Programming Guide
- CUDA C++ Best Practices Guide
- NVIDIA Developer Blog
- CUDA Samples on GitHub
Contributions to improve the tutorials are welcome! Please feel free to submit pull requests or open issues to suggest improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- NVIDIA for the CUDA platform and documentation
- The parallel computing community for sharing knowledge and best practices
- All contributors who help improve these teaching materials
Happy CUDA Programming!